Note that I'm experimenting with a new alternate mechanism for tables, based on B-trees with fixed page sizes (a page may contain up to 8 nodes for ranges of integer keys, or 8 nodes for strings). The B-tree would be natively ordered first by basic type (integer, other number, string, table, ...) and then within each type: integers and other numbers are ordered by their value (NaNs sort after all other values, -infinity and +infinity sort correctly; I wonder if I need to give a special order to denormals, i.e. near-zero values, and to minus 0.0 if it is distinguished from integer 0 == +0.0); strings are ordered only by their hash; tables are ordered by a sequence number which is kept even if table references are moved in memory.
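To make that ordering concrete, here is a minimal C sketch of such a comparison function; the type tags, struct layout and field names are hypothetical illustrations of mine, not anything from the Lua sources:

/* Hypothetical sketch of the key ordering described above: keys sort
 * first by a coarse type rank, then by a per-type rule.  Type names,
 * fields and ranks are illustrative only. */
#include <stdint.h>
#include <math.h>

typedef enum { K_INT, K_NUM, K_STR, K_TAB } KeyType;

typedef struct {
    KeyType  type;
    union {
        int64_t  i;     /* integer key                       */
        double   n;     /* other number                      */
        uint32_t hash;  /* string keys compared by hash only */
        uint64_t seq;   /* per-table sequence number         */
    } u;
} BKey;

static int cmp_key(const BKey *a, const BKey *b) {
    if (a->type != b->type)               /* order by basic type first */
        return (a->type < b->type) ? -1 : 1;
    switch (a->type) {
    case K_INT:
        return (a->u.i < b->u.i) ? -1 : (a->u.i > b->u.i);
    case K_NUM:
        if (isnan(a->u.n)) return isnan(b->u.n) ? 0 : 1;  /* NaN sorts last */
        if (isnan(b->u.n)) return -1;
        return (a->u.n < b->u.n) ? -1 : (a->u.n > b->u.n);
    case K_STR:
        return (a->u.hash < b->u.hash) ? -1 : (a->u.hash > b->u.hash);
    default: /* K_TAB */
        return (a->u.seq < b->u.seq) ? -1 : (a->u.seq > b->u.seq);
    }
}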
This goes together with a new way to manage allocation of objects, in a datastore that also uses pages... It would look a bit like the MFT in an NTFS volume, with extended attributes to store extensions. All objects would be movable in that datastore, where there would be various page sizes, allocated from pools of small pages, themselves allocated in pools of larger pages. Only very large objects above some threshold would be allocated outside the page space using normal malloc()/free().
There would be multiple B-trees in the same datastore. All memory pools in the datastore would also be movable without breaking references (references would be inodes in the MFT-like structure, which is itself a B-tree). The datastore would also contain pages organized not as B-trees but as simple vectors, to be used for storage of large objects whose excessive physical splitting could create performance issues for data locality.
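As a rough illustration of that indirection, here is a hedged C sketch where code holds a stable inode handle and the datastore resolves it to the object's current location; the threshold, struct names and fields are all assumptions of mine:

/* Hypothetical sketch of the indirection that keeps objects movable:
 * code holds a stable handle (an "inode" index), and the datastore maps
 * it to the object's current page and offset.  All names are illustrative. */
#include <stdint.h>

#define LARGE_OBJECT_THRESHOLD  (16 * 1024)  /* assumed cutoff: above this, use malloc() */

typedef struct Page Page;          /* fixed-size page taken from a pool */

typedef struct {
    Page    *page;                 /* NULL if the object was allocated with malloc() */
    void    *raw;                  /* direct pointer for such large objects          */
    uint32_t offset;               /* offset of the object inside its page           */
    uint32_t size;
} INode;

typedef struct {
    INode   *inodes;               /* the MFT-like table, itself relocatable */
    uint32_t count;
} Datastore;

/* Resolve a handle to the object's current address; the object may have
 * been moved by compaction since the handle was created. */
static void *ds_resolve(const Datastore *ds, uint32_t handle) {
    const INode *in = &ds->inodes[handle];
    if (in->page == NULL)
        return in->raw;                          /* large object, never moved   */
    return (char *)in->page + in->offset;        /* small object, inside a page */
}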
But the first step is to change radically how tables are represented. The current representation is clearly not optimal: it has severe worst cases and wastes a lot of memory. We can reduce this storage space a lot while also improving data locality for performance, and without sacrificing the generality of the language. The garbage collector would still work, but it could perform additional operations on the datastore in order to group used pages together and reduce the footprint when much data has been freed.
B-trees have the most interesting properties in terms of worst-case access time, and they can still be very fast if page sizes are not too small. For Lua, or similar languages like PHP or JavaScript, this would be very interesting both for small devices with limited memory (because B-trees can reduce the fragmentation of free space) and for server apps with many clients and restricted quotas per client. This in-memory datastore could also be backed by external storage on disk, via the common paging mechanism and some caching strategy (notably, the cache eviction policy would not necessarily be LRU: it would support balanced quotas per security realm and priorities for more privileged clients, which other clients could not easily evict completely, allowing a server to offer strong resistance to DoS attacks by a malicious/abusive client trying to spy on privileged data used by other clients or the system).
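Just to illustrate what a non-LRU, quota-aware eviction could look like, here is a very rough C sketch; the realms, quotas and the scoring rule are assumptions for illustration, not an actual design:

/* Hypothetical eviction policy with per-realm quotas: the victim page is
 * taken from the realm that most exceeds its quota, and only then by age.
 * Everything here is an assumption made for the sake of the example. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t realm;      /* security realm owning this cached page */
    uint64_t last_use;   /* logical clock of last access           */
} CachedPage;

typedef struct {
    uint64_t used;       /* pages currently cached for this realm  */
    uint64_t quota;      /* pages this realm is allowed to cache   */
} Realm;

/* Pick the index of the page to evict, or -1 if the cache is empty. */
static long pick_victim(const CachedPage *pages, size_t npages,
                        const Realm *realms) {
    long best = -1;
    double best_over = -1.0;
    uint64_t best_age = 0;
    for (size_t i = 0; i < npages; i++) {
        const Realm *r = &realms[pages[i].realm];
        double over = (double)r->used / (double)(r->quota ? r->quota : 1);
        /* prefer realms furthest over quota; break ties with the oldest page */
        if (over > best_over ||
            (over == best_over && pages[i].last_use < best_age)) {
            best = (long)i;
            best_over = over;
            best_age = pages[i].last_use;
        }
    }
    return best;
}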

B-trees would completely replace hashed tables, but not the hashing functions for the keys they store. They would also support sparse tables with an arbitrary number of dimensions (i.e. "multikeys" associated with values, including multikeys for 2D/3D data like geographic data, whose coordinates are hashed into "quads" by interleaving the bits of the individual keys).
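For the 2D case, that bit interleaving is essentially a Morton/Z-order code; here is a minimal C sketch of the technique (an illustration only, not code from any Lua implementation):

/* Interleave two 32-bit coordinates into one 64-bit "quad" key, so nearby
 * 2D points get nearby keys in the tree. */
#include <stdint.h>

/* Spread the bits of x so they occupy the even bit positions of a 64-bit word. */
static uint64_t spread_bits(uint32_t x) {
    uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFull;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFull;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0Full;
    v = (v | (v << 2))  & 0x3333333333333333ull;
    v = (v | (v << 1))  & 0x5555555555555555ull;
    return v;
}

/* Combine (x, y) into a single sortable key by interleaving their bits. */
static uint64_t quad_key(uint32_t x, uint32_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}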

I've made some tests on parts of these concepts, but not for Lua. For Lua the immediate need is to propose a replacement for its tables, which are severely broken and easily DoS-attackable. Conventional hashed tables using a single linear array of nodes would disappear completely, replaced by a self-organized tree with fixed-size pages of a good enough degree. From my early tests, tree pages with a degree of 8 (fast 32-bit systems) or 16 (64-bit systems with gigabytes allowed for the VM) seem to be the way to go, but for smaller devices a degree of 4 could save some memory at a modest performance cost (negligible if the small device is multicore but limited by its internal RAM and a slow external flash or hard-disk store, or a slow network for network storage).
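For reference, a page of such a tree could look like the following C sketch, with the degree fixed at build time; BKey is reduced to a plain integer stand-in here so the snippet stands alone, and all names are illustrative:

/* Hypothetical layout of a fixed-degree B-tree page (degree 4, 8 or 16 as
 * discussed above). */
#include <stdint.h>

#ifndef BTREE_DEGREE
#define BTREE_DEGREE 8            /* 8 for 32-bit builds, 16 for large 64-bit VMs */
#endif

typedef uint64_t BKey;            /* stand-in for the typed key sketched earlier */

typedef struct BNode BNode;

struct BNode {
    uint8_t nkeys;                          /* keys currently stored in this page   */
    uint8_t is_leaf;                        /* leaf pages hold values, not children */
    BKey    keys[BTREE_DEGREE];             /* kept sorted by the key ordering      */
    union {
        BNode *children[BTREE_DEGREE + 1];  /* interior page: child page pointers   */
        void  *values[BTREE_DEGREE];        /* leaf page: associated values         */
    } u;
};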

An NTFS-like datastore structure, but without all the garbage... even though it could be used as well to implement a filesystem with journaling, logs, snapshots, directories, extended attributes, hardlinks and symbolic links. (This could even be tested today on Windows by using the existing NTFS driver to create a volume in memory, and then using classic I/O with caches or a memory-mapped file on that volume to handle quotas, security, cache eviction policies and resistance to DoS attacks; however a native NTFS driver does too many things, and we cannot tune the page sizes as we want: 1KB pages only for the MFT, 4KB or 1MB for the rest of the volume, and complex management of the allocation bitmaps.)

But part of that would be very useful for DoS-resistant VMs and for scalability across various platform types.

I know that Microsoft tried to do that for SQL Server and wanted to make it a feature of NTFS itself, which would then also have natively supported the Windows registry, the Windows event logs and other extensible datastores; for now that development has been stalled for years, as it faces compatibility problems with a lot of legacy applications for its OSes. It is also strongly related to the memory manager (which has tricky relations with hardware device drivers with strong requirements, like audio/video interfaces, codecs, 3D engines, and fast network or bus interfaces like USB3, Ethernet and FireWire, plus some hardware limitations with the supported CPU types), so it was not made for the core OS installation, but only for additional external storage not needed for booting the system, except via a hypervisor virtualizing all I/O and memory accesses.
I won't try to emulate NTFS: this is too much work, there are lots of details I don't want to invest time on, and there is the risk of infringing existing patents. In fact it can only inspire some concepts, which were in fact used elsewhere long before Microsoft implemented its own solution. And all the pieces needed to build a true filesystem are still not there (and I could not do that alone).

But a B-tree approach for tables is feasible for a single developer. B-trees have been well documented for a long time and are used in many places, including all SQL engines. SQL engines now also implement "hashstores" for RDF-like data and to collect big data with very weak structure, not easily representable with a hierarchical or relational model, which requires working on very large sets of items with basic operations like union/intersection/difference. There, a hashtable is not enough to represent and use these collections efficiently: they need sparse matrices with an arbitrary number of dimensions and efficient sparse bitsets, and sorting sets stored in arbitrary order is a separate problem, meaning that a B-tree is not necessarily the best solution to organize all the indexing keys of large sparse sets (keys are then actually part of the data, which may be allowed some level of redundancy, and keys are not necessarily unique like they are in associative tables).
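As an example of the kind of basic set operation involved, here is a small generic C sketch of intersecting two sorted runs of keys (such as a B-tree scan would produce); it is not tied to any particular engine:

/* Write the keys present in both sorted arrays to 'out' and return the count.
 * 'out' must have room for at least min(na, nb) entries. */
#include <stddef.h>
#include <stdint.h>

static size_t intersect_sorted(const uint64_t *a, size_t na,
                               const uint64_t *b, size_t nb,
                               uint64_t *out) {
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j])       i++;          /* only in a: skip it          */
        else if (b[j] < a[i])  j++;          /* only in b: skip it          */
        else { out[n++] = a[i]; i++; j++; }  /* in both: keep one copy      */
    }
    return n;
}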