All this does not seem related to the use or declaration of variable "c", but only to the way "l_getc(f)" is declared (it is internally remapped to getc_unlocked, which some C libraries implement as a macro with unsafe side effects, something that does not occur with a classic "getc(f)" macro from the native standard library headers). So l_getc() seems to be broken if it is declared internally as a macro using additional local temporary variables with unsafe declarations.
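For reference, the remapping in question looks roughly like this in Lua's liolib.c (paraphrased from memory, so the exact guard macros may differ between Lua versions):

/* Sketch of the l_getc() remapping discussed above (paraphrased from
 * Lua's liolib.c).  On POSIX builds the unlocked variant is selected,
 * and some C libraries implement getc_unlocked() itself as a macro
 * with internal temporaries. */
#if defined(LUA_USE_POSIX)
#define l_getc(f)        getc_unlocked(f)
#define l_lockfile(f)    flockfile(f)
#define l_unlockfile(f)  funlockfile(f)
#else
#define l_getc(f)        getc(f)
#define l_lockfile(f)    ((void)0)
#define l_unlockfile(f)  ((void)0)
#endif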
Note that there is also the possible effect of a "Spectre" mitigation workaround, if there is an internal condition on the buffer length before reading from the internal buffer or calling I/O to refill it. Recent C/C++ compilers now implement "Spectre" mitigations (for x86_32, x86_64/x64/amd64, arm32 and arm64, possibly for ia64 as well as some other RISC SMT architectures using branch prediction with pipelining on multilevel caches), where an additional intrinsic "fence" instruction may be inserted in a conditional branch performing indexed access to the buffer, to avoid modifying or reading an external cache line for a predicted branch that has not yet been confirmed by completion of the prior test.
The problem with these mitigations is that they sometimes slow down operations considerably, as the "fences" force pipelines to stall with wait cycles, notably on modern RISC processors that have deep pipelines. And if the CPU has no microcode update, it is up to the software to insert the fences: some compilers will try to "guess" where they are needed, but they have no real clue about the isolation levels or whether they are actually required, and programmers can just as easily forget many of them. At least modern compilers can now emit warnings for places that may be sensitive, and propose a way for programmers to explicitly place a fence.
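As an illustration, here is a minimal sketch of such an explicitly placed fence on x86, using the SSE2 intrinsic _mm_lfence(); the function checked_read and its parameters are hypothetical, not code taken from Lua:

#include <stddef.h>
#include <emmintrin.h>   /* _mm_lfence(), x86/SSE2 only */

/* Hypothetical example of a hand-placed Spectre v1 barrier: the load of
 * buf[i] cannot be executed speculatively ahead of the bounds check,
 * because the lfence serializes the instruction stream at that point. */
static int checked_read(const unsigned char *buf, size_t len, size_t i) {
    if (i < len) {
        _mm_lfence();    /* speculation barrier: wait until the test retires */
        return buf[i];   /* safe: index is architecturally in range */
    }
    return -1;           /* out-of-range request */
}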
A simple standard library API like "strncpy" is concerned by Spectre (but not "memcpy"), and your example is also a candidate, as there is a branch misprediction when "i" reaches "LUAL_BUFFERSIZE", so that "l_getc(f)" may still attempt to access past the end of the buffer before the assertion "i<LUAL_BUFFERSIZE" is checked. One way to avoid this would be to allocate a larger buffer than "LUAL_BUFFERSIZE" (e.g. add a couple of extra words, or as many words as needed by a compiler that tries to optimize your loop by partially unrolling it) and fill the extra space with zeroes, without marking this extra area with any read/write protection or any condition that could cause an incorrect prediction to start flushing a cache line or anticipating an exception.
Note that ARM does not have a native "fence" instruction, so the workaround is much more costly; however it stalls natively (with wait cycles) much sooner than Intel, which uses much deeper pipelines and anticipates execution in predicted branches. This is more critical in Xeon/E3 processors, which have very large caches for branch prediction and make massive use of speculative execution to feed the stages of their pipelines without stalling any of them with wait cycles, and on all CPU/GPU/APU/SPU processors allowing multiple threads per core, because of the large width of their internal micro-op scheduler and the presence of more execution units of different types. Similar issues also exist in chipsets (notably in north bridges, fast PCI bridges, and external ports like FireWire, USB 3 or higher, Wifi AC, Gigabit Ethernet and fiber links): here also the cache-eviction policy is very weak and is only optimized for global performance, not for security.
----
Since Spectre and Meltdown, tons of new similar issues have been found in SMT systems, notably "PortSmash", which does not concern the memory cache but the cache of the microinstruction scheduler in processors that convert an ISA to a very wide microinstruction format over a large internal bus, in order to allocate and schedule the execution units. This also occurs in GPUs, which have a complex internal scheduler with many ports for execution units, plus more specialized ports, depending on the kind of vectorisation performed by the GPU scheduler. GPUs also need their own microcode patches now that they are used for OS-critical features like OpenCL, CUDA, DirectCompute, Intel Security (TXT, TDT), Microsoft Defender... notably integrated GPUs, which are typically not used for display on servers, where display is preferably externalized on discrete boards or remotely on other servers.
All these issues are caused by the difficulty of managing branch prediction and ensuring coherence of caches (which are everywhere in all modern CPUs, but also in network services like DNS, DHCP, RDP, and in intermediate routers that generally use simplistic cache-eviction policies without strong separation between the many clients operating in parallel on the same shared resources but within different security realms: this is a very hard problem now for cloud providers that want to scale their platforms to support more clients at lower prices)...