lua-users home
lua-l archive


Microcode fixes in the CPU can only act on the L1 caches; they do nothing for any other caches or for external ones (including application caches in user-mode code, bus registers/latches and fast adapters, caches in external routers/gateways/file servers, and remote application servers used for example through a REST API)...
There are MANY caches everywhere, and the fix for them lies not in the silicon or microcode but in the software itself (the hypervisor, the OS, the drivers, the applications, the remote services, the various backend servers). Even an SQL database or a filesystem can be attacked through time-based side channels.

All caching designs should include their own cache eviction policy and allow segregating caching levels according to the security profile of the clients they want to isolate, and from the inner service itself, which must be protected and not attackable by any client. This requires extended "tagging" for each usage, plus slow and randomized reclamation of unused tagged areas so that they can later be reused, at a *really unpredictable* time, by other applications/clients/services.

But adding tags means you have to secure the quotas allowed for use by anyone, making sure that no one (not even the hypervisor) can alter the quota assigned to another. This means the cache may have to contain "duplicate" entries for the same data, but with different tags and different eviction policies.
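A minimal sketch of that tagging idea (all names, sizes and the fixed quota here are hypothetical, not an existing API): entries carry a client tag, lookups only ever match the caller's own tag, and inserts fail once the tag's quota is exhausted, so a noisy client fills only its own partition and can never evict another client's entries:

```c
#include <stddef.h>

#define NENTRIES 64
#define NTAGS 16
#define QUOTA_PER_TAG 8          /* hypothetical fixed per-client quota */

struct entry { int used; unsigned tag; int key; int value; };
static struct entry cache[NENTRIES];
static int tag_count[NTAGS];     /* entries currently held by each tag */

/* Lookup only matches entries carrying the caller's own tag. */
int cache_get_or(unsigned tag, int key, int dflt) {
  for (size_t i = 0; i < NENTRIES; i++)
    if (cache[i].used && cache[i].tag == tag && cache[i].key == key)
      return cache[i].value;
  return dflt;
}

/* Insert fails once the tag's quota is exhausted: one client cannot
   force the eviction of another client's entries. */
int cache_put(unsigned tag, int key, int value) {
  if (tag >= NTAGS || tag_count[tag] >= QUOTA_PER_TAG) return 0;
  for (size_t i = 0; i < NENTRIES; i++)
    if (!cache[i].used) {
      cache[i].used = 1; cache[i].tag = tag;
      cache[i].key = key; cache[i].value = value;
      tag_count[tag]++;
      return 1;
    }
  return 0;                      /* cache full */
}
```

Note that two clients caching the same key end up with two separate entries, which is exactly the "duplicate entries for the same data" cost described above.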

And this can be a severe problem for application servers that need to serve many clients. Even a search engine like Google, serving millions of requests each second, cannot create that many segregated caching areas, one per client, without forcing all of them onto an extremely low usage quota: the caches would then exhibit very high miss rates for everyone, while many segregated parts sat idle (but kept in their state for a very long time to protect them from third-party attacks).

If this is implemented, Google will have to sell services with guaranteed response time/performance and guaranteed available resources. Such a service cannot work with the "free/no-cost" model paid for only by advertising. The Google farms would also not be enough to support the load: centralized architectures, and clouds in general, will no longer work, unless Google starts selling services built on distributed peer-to-peer devices (its clients buy a specific device reserved for their exclusive use and deployed on their own premises, or rent one at a high price in the few colocation areas, where the client also pays for servicing and for the energy used by a device that sits idle most of the time but cannot be reused by any other Google client).

Dedicated servers are for now the only solution for serious web servers, and they should have their own hardware, memory, storage, backup solutions, firewall... This will be much more costly than the "cloud hosting" proposed now.

If customers need to deploy their own device (sold preconfigured by Google for immediate use) or a colocation area to host it, the billing won't be the same, and resources for such sales will be scarce (many more colocation areas will have to be deployed around the world, and Google will need to employ many more people to service them; it may be good for job offers, however!). But the energy-efficiency gains will be lost, except for very big organizations that want to connect their own private "supercomputer" and will probably not need Google as a third party at all, on a device from which Google will not even be allowed to collect user profiles or distribute advertising.

"Spectre" for me means this is the end (agin) of centralized computing, its reintroduction as "clouds" instead of the former "mainframes" was a myth, it won't survive long. And probably the whole concept of Internet (the way we know it today) is dead. and may be it's a good time to reintroduce "true-life" socialisation, with direct man-to-man interactions and limited delegations of trusts to small circles (do you remember that Google introduced "circles" than decided to kill it by first tweaking it for advertizers, then closing Google+ because of third party abuses?)

Anyway, we need to reinforce privacy rules: the GDPR was just phase 1. We must go further, introducing proof of delegation and forbidding transitive delegations of trust. Anonymity can be preserved outside the allowed "circles", but circles cannot work without proof of identity; circles must not be freely extensible by any one of their members, and have to remain private to each user. No intruders allowed means the end of the "open" Internet (in reality open only to big-data players, who choose their own contracts unilaterally without giving any choice).

Now look at what Intel does: it proposes OSS solutions, but then restricts them further with exclusive patent rights, so much so that it does not even allow users to publish any discovery of what Intel got wrong, or to publish any benchmark of Intel services. They are building a legal wall of lies allowing Intel to say and sell what they want without any form of liability (Intel just says "buy it; if it breaks, it's your fault; if it breaks my Intel service, you'll pay me unlimited damages; if you talk about it, Intel will seize you, including your data, your OSS licence will be voided as well, and you'll have to pay Intel for every third party to whom you've distributed the OSS solution, whom Intel will also prosecute to force them to pay Intel").

Intel does all that because it knows it is in severe trouble and may rapidly lose the commercial battle against AMD, or against Chinese, Korean and Russian foundries. Intel would then have no choice but to abandon the x86 architecture and convert to ARM only, or buy licences from AMD or the Chinese foundries... but for now, as there are issues in AMD and ARM too, Intel thinks it can resist. It won't be long before there's a huge attack, though, and I'm convinced that Google, Facebook, Apple, IBM and Samsung are already working on their own architectures using a very different paradigm (we know Google and IBM are working on quantum computers; others may be working on peer-to-peer distributed architectures, notably Amazon for its B2C sales, with their own P2P networking protocol).

Le dim. 24 nov. 2019 à 01:48, rarchimedes <> a écrit :
If one depends on compilers and/or programmers to consistently and correctly work around these processor holes, failure is almost certain. However slow they may be, microcode fixes are the only real safety.

Everett L Williams 

On Sat, Nov 23, 2019, 18:12 Philippe Verdy <> wrote:
All this does not seem related to the use or declaration of variable "c", but only to the way "l_getc(f)" is declared (it is internally remapped to getc_unlocked via some extra macros with unsafe side effects, which does not happen with the classic "getc(f)" macro from the native standard library headers). So l_getc() seems to be broken if it's internally declared as a macro using additional internal local temporary variables with unsafe declarations.
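The classic hazard with such macros is multiple evaluation of the argument. A hypothetical illustration (this is not Lua's actual l_getc definition): if a macro expands its argument several times and that argument has side effects, each expansion re-runs them:

```c
/* Hypothetical unsafe macro: expands (c) three times on the taken path. */
#define UNSAFE_TOUPPER(c) ((c) >= 'a' && (c) <= 'z' ? (c) - ('a' - 'A') : (c))

static int calls;                   /* counts the side effects */
static int next(void) {             /* stands in for getc(f) */
  static const char s[] = "abc";
  return s[calls++];
}

int demo_result(void) {             /* what value comes out? */
  calls = 0;
  return UNSAFE_TOUPPER(next());    /* consumes 'a', 'b' AND 'c'! */
}

int demo_evals(void) {              /* how many reads happened? */
  calls = 0;
  (void)UNSAFE_TOUPPER(next());
  return calls;
}
```

Here UNSAFE_TOUPPER(next()) returns 'C' after consuming three characters instead of one. This is exactly the breakage that a getc-wrapping macro with internal temporaries is supposed to avoid, by arranging to evaluate its stream argument exactly once.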

Note that there's also the possible effect of a "Spectre" mitigation workaround, if there's an internal condition on buffer length before reading from the internal buffer or calling I/O to refill it. Recent C/C++ compilers now implement "Spectre" mitigation (for x86_32, x86_64/x64/amd64, arm32 and arm64, possibly for ia64 as well as some other RISC SMT architectures using branch prediction with pipelining over multilevel caches), whereby an additional intrinsic "fence" instruction may be inserted in the conditional branch doing the indexed access to the buffer, to avoid modifying or reading a cache line for a predicted branch that has not yet been confirmed by completion of the prior test.
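A sketch of that mitigation pattern (the function name is mine; on x86 the barrier is LFENCE via the _mm_lfence intrinsic, and the non-x86 fallback shown here is only a compiler barrier, which does NOT actually stop hardware speculation):

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>
#define SPECULATION_BARRIER() _mm_lfence()  /* serializes preceding loads */
#else
#define SPECULATION_BARRIER() __asm__ volatile("" ::: "memory") /* weaker */
#endif

static const unsigned char table[16] =
  {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

/* Spectre-v1 style pattern: the fence keeps a mispredicted "i < 16"
   branch from speculatively loading table[i] out of bounds. */
int safe_lookup(size_t i) {
  if (i < sizeof table) {
    SPECULATION_BARRIER();   /* bounds check retires before the load */
    return table[i];
  }
  return -1;
}
```

The cost is visible immediately: every in-bounds access now waits for the check to retire, which is why compilers try to insert such fences only on branches they consider attacker-controllable.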

The problem with these mitigations is that they sometimes slow some operations down a lot, as the "fences" force the pipelines to stall with wait cycles, notably on modern RISC processors that have deep pipelines. And if the CPU has no microcode update, it's up to the software to insert the fences: some compilers try to "guess" where they are needed, but they have no real clue about the isolation levels or whether they are actually required, and programmers themselves frequently forget many of them. At least modern compilers can now emit warnings for places that may be sensitive and propose a way for programmers to insert a fence explicitly.

A simple standard library API like "strncpy" is affected by Spectre (but not "memcpy"), and your example is a candidate too: there is a branch misprediction when "i" reaches "LUAL_BUFFERSIZE", so "l_getc(f)" may still attempt to access past the end of the buffer before the assertion "i<LUAL_BUFFERSIZE" is checked. One way to avoid it would be to allocate a buffer larger than "LUAL_BUFFERSIZE" (e.g. add a couple of words, or as many words as needed by a compiler that partially unrolls your loop) and fill the extra space with zeroes, without marking it with any read/write protection or any condition that could cause the incorrect prediction to start flushing a cache line or anticipating an exception.
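An alternative to fences that costs no pipeline stall is branchless index masking, in the style of the Linux kernel's array_index_nospec(): even under misprediction, an out-of-range index is forced to 0 rather than reaching past the end of the buffer. A sketch (names are mine; the arithmetic right shift of a negative value is implementation-defined in ISO C but behaves as sign extension on gcc/clang):

```c
#include <stddef.h>
#include <stdint.h>

/* All-ones mask when i < size, all-zeroes when i >= size, computed
   without a branch (same trick as Linux's array_index_mask_nospec). */
static size_t mask_nospec(size_t i, size_t size) {
  return (size_t)(~(intptr_t)(i | (size - 1 - i))
                  >> (sizeof(intptr_t) * 8 - 1));
}

/* Even a mispredicted bounds check can then only ever touch index 0,
   which is always inside the buffer. */
static size_t clamp_index(size_t i, size_t size) {
  return i & mask_nospec(i, size);
}
```

In the read_line loop, indexing the buffer with clamp_index(i, LUAL_BUFFERSIZE) would make the speculative access harmless without over-allocating, at the cost of a couple of ALU operations per iteration.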

Note that ARM has no direct equivalent of x86's single "fence" instruction (its mitigation sequence, a conditional select followed by a CSDB barrier, is more costly); however, it natively stalls (with wait cycles) much sooner than Intel, which uses much deeper pipelines and anticipates execution along predicted branches. This is more critical in Xeon/E3 processors, which have very large branch-prediction caches and make massive use of speculative execution to feed their pipeline stages rather than stalling one of them with wait cycles, and on all CPU/GPU/APU/SPU processors allowing multiple threads per core, because of the large width of their internal micro-op scheduler and their greater number of execution units of different types. Similar issues also exist in chipsets (notably in north bridges, fast PCI bridges, and external ports like FireWire, USB 3 or higher, Wifi AC, Gigabit Ethernet and fiber links: here too the cache eviction policy is very weak and optimized only for global performance, not for security).

Since Spectre and Meltdown, tons of similar new issues have been found in SMT systems, notably "PortSmash", which concerns not the memory cache but the cache of the microinstruction scheduler, in processors that convert from an ISA to a very wide microinstruction format over a large internal bus in order to allocate and schedule the execution units. This also occurs in GPUs, which have a complex internal scheduler with many ports for execution units, plus more specialized ports, depending on the kind of vectorisation performed by the GPU scheduler. GPUs also need their own microcode patches now that they are used for OS-critical features like OpenCL, CUDA, DirectCompute, Intel Security (TXT, TDT), Microsoft Defender... notably integrated GPUs, which on servers are typically unused for display (display being preferably externalized to discrete boards or remoted to other servers).

All these issues come from the difficulty of managing branch prediction and ensuring cache coherence (caches are everywhere in all modern CPUs, but also in network services like DNS, DHCP, RDP, and in intermediate routers, which generally use simplistic cache-eviction policies without strong separation between the many clients operating in parallel on the same shared resources but in different security realms: this is a very hard problem now for cloud providers that want to scale their platforms to support more clients at lower prices)...

Le sam. 23 nov. 2019 à 21:21, Mike <> a écrit :
November 22, 2019 3:59 PM, "Roberto Ierusalimschy" <> wrote:

>> MemorySanitizer: use-of-uninitialized-value /home/mpech/lua-5.3.5/src/liolib.c:490:58 in read_line
>> Here is the line in question:
>> while (i < LUAL_BUFFERSIZE && (c = l_getc(f)) != EOF && c != '\n')
>> The tool seems to think that c is uninitialized, which is clearly
>> wrong given this line just before the loop:
>> int c = '\0';
>> What am I missing?
> Might it be some problem inside macro 'l_getc' (which can be either
> getc or getc_unlocked)?

0. my Linux distros to play with: Void and Arch, latest glibc 2.30, clang/LLVM 9.0
(my park of distros and platforms is big, I even have Owl Linux installed)

1. create data file mike.txt (two lines):
abc mike

2. create test code mike.lua (two lines):
print(, "l") -- problem here

3. recompile lua under the memory sanitizer; my CC line in the Makefile:
CC= clang -g -fsanitize=memory -std=gnu99

4. any of the combinations below is OK and doesn't trigger a fatal sanitizer warning:
(1, 1)
(1, 128)
("l", "l") -- L and l are the same meaning here
("l", 1)
("l", "a")
(1, "a")
("a", "a")

5. so the problem is in the (1, "l") combination

6. ok

7. after read_chars(), read_line() gets the correct FILE stream and cursor position inside g_read()

8. the sanitizer quits *immediately* on the first touch of l_getc(); no while() loop iteration occurs:
while (i < LUAL_BUFFERSIZE && (c = l_getc(f)) != EOF && c != '\n') {

9. if I replace l_getc (getc_unlocked) with plain getc, *NO* error

10. looks like a false alarm caused by the combination of LLVM and glibc.
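That diagnosis fits how MemorySanitizer works: glibc's getc_unlocked reads the FILE buffer internals directly, and since glibc itself is not instrumented, MSan never saw that buffer being initialized and reports a (false) use-of-uninitialized-value. A minimal standalone reproducer of the construct (without MSan it behaves exactly like getc):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Reads one char the way l_getc does when mapped to getc_unlocked. */
int msan_demo(void) {
  FILE *f = tmpfile();               /* scratch file, deleted on close */
  if (f == NULL) return -1;
  fputs("abc mike\n", f);
  rewind(f);
  int c = getc_unlocked(f);          /* MSan+glibc flags this read */
  fclose(f);
  return c;
}
```

Practical workarounds are to build Lua so that l_getc falls back to plain getc (which, as observed in step 9, is not flagged), or to add an MSan suppression for the call site.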