One drawback of caching techniques is that every cache we add can increase our exposure to time-based side-channel attacks.
I recently discussed with another user the possibility of relying on Lua's collision lists in its keyed tables if we did not care about using a strong enough hashing function that correctly flattens the distribution of keys across hash buckets.

For short strings this is good enough, because such texts will still be hashed in full. But with long texts, the hashing itself can become significantly attackable.
So a good approach would be to hash only the beginning of long texts, and then make sure that the resulting distribution across hash buckets is still within acceptable limits. If it is not, we can select another short segment of the text: the start offsets of these fragments would be derived from the total string length using modular arithmetic, which makes it easy to create a reproducible sequence of start indexes, for the same length, from which the hash is computed.
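
For illustration, here is a minimal C sketch of that idea (the names, FRAGMENT_LEN, MAX_FRAGMENTS and the constants are arbitrary choices for the example, not Lua's actual implementation):

    #include <stddef.h>
    #include <stdint.h>

    #define FRAGMENT_LEN  32   /* bytes hashed per fragment (arbitrary) */
    #define MAX_FRAGMENTS  4   /* fragments sampled from a long text (arbitrary) */

    /* Derive a reproducible sequence of start offsets from the total
       length alone, using modular arithmetic: the same length always
       yields the same offsets, so the hash stays deterministic. */
    static size_t fragment_start(size_t len, unsigned i)
    {
        /* 2654435761 is Knuth's multiplicative constant, used here only
           to scatter the offsets; any fixed mixing constant would do. */
        return (size_t)((2654435761u * (i + 1)) % (len - FRAGMENT_LEN + 1));
    }

    /* Hash a string, reading only a few fixed-size fragments when it is
       long; the total length and a per-state seed are mixed in as well. */
    static uint32_t hash_long(const char *s, size_t len, uint32_t seed)
    {
        uint32_t h = seed ^ (uint32_t)len;
        size_t i;
        if (len <= (size_t)FRAGMENT_LEN * MAX_FRAGMENTS) {
            for (i = 0; i < len; i++)        /* short text: hash it all */
                h = h * 31 + (uint8_t)s[i];
            return h;
        }
        for (unsigned f = 0; f < MAX_FRAGMENTS; f++) {
            size_t start = fragment_start(len, f);
            for (i = 0; i < FRAGMENT_LEN; i++)
                h = h * 31 + (uint8_t)s[start + i];
        }
        return h;
    }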

And to protect against time-based attacks, the hashing function should of course be changed regularly, forcing keys to be rehashed. Since this affects the whole table, reindexing all the long texts in full would have a huge impact, which is why hashing only a fragment would be very useful.
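
A sketch of what that periodic re-seeding could look like, reusing the hypothetical hash_long above (the Node/Table types are invented for the example, and error handling is omitted):

    #include <stdlib.h>

    /* Minimal illustrative node and table types, not Lua's real structures. */
    typedef struct Node {
        const char  *key;
        size_t       klen;
        struct Node *next;       /* collision list */
    } Node;

    typedef struct {
        Node   **buckets;
        size_t   nbuckets;
        uint32_t seed;
    } Table;

    /* Install a new seed and recompute every key's bucket.  With the
       fragment hashing above, each key costs only a few fragment reads,
       so periodic re-seeding stays affordable even for long texts. */
    static void rehash_all(Table *t, uint32_t new_seed)
    {
        size_t n = t->nbuckets, b;
        Node **fresh = calloc(n, sizeof(Node *));
        t->seed = new_seed;
        for (b = 0; b < n; b++) {
            Node *nd = t->buckets[b], *nx;
            for (; nd != NULL; nd = nx) {
                nx = nd->next;
                size_t nb = hash_long(nd->key, nd->klen, new_seed) % n;
                nd->next = fresh[nb];
                fresh[nb] = nd;
            }
        }
        free(t->buckets);
        t->buckets = fresh;
    }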

Besides that, I don't think we should cache the full comparison of texts when performing lookups in the hash table: here too we can use the reproducible pseudo-random sequence of start indexes to compare only fragments (for fast elimination of non-matching keys) until the whole text has been compared, in a pseudo-random order of fragments.
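
Continuing the same hypothetical sketch, a fragment-first comparison could look like this (the caller is assumed to have already checked that both keys have the same length):

    #include <string.h>

    /* Compare two same-length keys fragment by fragment, visiting the
       fragments in the same reproducible pseudo-random order used for
       hashing, so that most non-matching keys are rejected after
       reading only a few bytes. */
    static int keys_equal(const char *a, const char *b, size_t len)
    {
        if (len <= (size_t)FRAGMENT_LEN * MAX_FRAGMENTS)
            return memcmp(a, b, len) == 0;
        for (unsigned f = 0; f < MAX_FRAGMENTS; f++) {   /* fast elimination */
            size_t start = fragment_start(len, f);
            if (memcmp(a + start, b + start, FRAGMENT_LEN) != 0)
                return 0;
        }
        return memcmp(a, b, len) == 0;   /* probes matched: confirm fully */
    }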

Note that this applies independently of the choice of the hashing function (it could be simple, like Knuth's modular hashing using multiplication by a large enough prime, or cryptographically strong like SHA-1 or even SHA-2, at lower performance): even with a cryptographically strong hash, the performance effect of long texts would be large enough to be measurable and attackable.
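
For reference, the Knuth-style multiplicative hash mentioned above fits in one line of C (2654435761 is a prime close to 2^32/phi):

    /* Knuth-style multiplicative hashing: multiply by the constant and
       keep the top bits, which spreads nearby key values across
       buckets.  bucket_bits must be in 1..32. */
    static uint32_t knuth_hash(uint32_t key, unsigned bucket_bits)
    {
        return (key * 2654435761u) >> (32 - bucket_bits);
    }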

As a general rule, however, secure data should not have widely variable sizes that are identifiable in time-based attacks. So hashing short items like user names, file names or passwords/passphrases should not be affected by this risk, as they should almost all have the same length (and because they are short, they can easily be padded before hashing, so that they all take roughly the same time to hash).
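
As a sketch of that padding idea (PAD_LEN is an arbitrary choice; a real implementation would reject oversize input rather than truncate it):

    #include <string.h>

    #define PAD_LEN 64   /* fixed padded length for short secret keys (arbitrary) */

    /* Copy a short key into a fixed-size zero-padded buffer and always
       hash all PAD_LEN bytes, so every short key takes about the same
       time to hash and its real length does not leak through timing. */
    static uint32_t hash_padded(const char *key, size_t len, uint32_t seed)
    {
        char buf[PAD_LEN];
        uint32_t h;
        size_t i;
        memset(buf, 0, sizeof buf);
        if (len > PAD_LEN)
            len = PAD_LEN;             /* example only: real code would reject */
        memcpy(buf, key, len);
        h = seed ^ (uint32_t)len;      /* the real length still participates */
        for (i = 0; i < PAD_LEN; i++)
            h = h * 31 + (uint8_t)buf[i];
        return h;
    }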

Hashing large contents, however, should be performed "offline" (e.g. when hashing file contents), before there is any attempt to exchange that content or fragments of it, and before allowing others to detect that you are accessing a specific large file (which will then be known externally only by its identifier: the precomputed hash value if the file content is read-only, or its short "filename"), independently of its content size. This still allows using Lua's tables to index not the file contents themselves but their short identifiers (a 512-bit identifier is long enough if it is generated by a cryptographic hash function like SHA-2, and it needs no additional complex rehashing in Lua tables thanks to its existing distribution properties).
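
For example, such an identifier could be precomputed with a one-shot SHA-512 call (this assumes OpenSSL's legacy SHA512() helper, linked with -lcrypto; any SHA-2 implementation would do):

    #include <stdio.h>
    #include <openssl/sha.h>

    /* Compute the short, fixed-size identifier of a large content
       "offline"; the hex digest, not the content itself, is then used
       as the table key.  SHA-512 gives the 512-bit identifier
       mentioned above. */
    static void content_id(const unsigned char *data, size_t len,
                           char out_hex[2 * SHA512_DIGEST_LENGTH + 1])
    {
        unsigned char md[SHA512_DIGEST_LENGTH];
        int i;
        SHA512(data, len, md);
        for (i = 0; i < SHA512_DIGEST_LENGTH; i++)
            sprintf(out_hex + 2 * i, "%02x", md[i]);
    }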

If Lua introduces a "longstring" type, it MUST NOT be usable as a valid key type in tables: you would have to use alternate keys (the unique identifier of its content, or a URL/path generated externally). Lua should not even be forced to keep such values in memory; they would be viewed just like externally stored "blobs", accessed through some I/O API when needed but not kept permanently in memory. Likewise, the caches used by this I/O API should be carefully designed to be immune to time-based side-channel attacks.


On Mon, Jun 17, 2019 at 16:19, 云风 <cloudwu@gmail.com> wrote:


>
> I disagree that the parser stage is going to be the main source of string objects in general. Almost any program that has to read data from a file is going to use a lot of small strings. (It is, of course, possible to do it with a single long string and then only process it using numeric data types, but that technique only makes sense for packed binary data.)
>

A cache in the GC can merge most identical strings: when we mark a string black, we put it into the cache or merge it with the one already there.

The cache can have a fixed size, so it is immune to hash-DoS attacks.
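
For illustration, a fixed-size merge cache of this kind could look like the following C sketch (the types and the cache size are assumptions for the example, not Lua's actual code):

    #include <stdint.h>
    #include <string.h>

    #define CACHE_SIZE 53   /* small fixed size: it cannot grow under attack */

    typedef struct GCString {
        const char *data;
        size_t      len;
        uint32_t    hash;
    } GCString;

    static GCString *strcache[CACHE_SIZE];

    /* When a string is marked black, probe one fixed slot: if an equal
       string is already cached, merge with (reuse) it; otherwise replace
       the slot.  Both memory use and probe time are bounded by the
       fixed size, so colliding keys cannot degrade it into long chains. */
    static GCString *mark_and_merge(GCString *s)
    {
        GCString **slot = &strcache[s->hash % CACHE_SIZE];
        GCString  *c = *slot;
        if (c != NULL && c->len == s->len &&
            memcmp(c->data, s->data, s->len) == 0)
            return c;    /* duplicate found: reuse the cached string */
        *slot = s;       /* otherwise remember this string */
        return s;
    }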