- Subject: Indexed String Lib for very restricted RAM applications (<30-60 kB RAM)
- From: Flyer31 Test <flyer31@...>
- Date: Sun, 7 Nov 2021 17:28:59 +0100
As I also described in some earlier posts over the last few weeks, I am working
on implementing Lua for an IoT environment based on the STM32G431 family with
512kB ROM and 128kB RAM, conveniently available in a tiny QFN48 package.
The dynamic string handling of Lua appears to be too heavy for such a
RAM-restricted application. I ran into severe problems as soon as the garbage
collector ran for the second time, although I tried several GC settings. So I
removed everything in my software that is "prone to produce strings" at runtime.
The main culprit here is the string library: even a single concat (str1..str2)
of a few strings, any call to string.sub, or especially string.format
starts "really allocating" memory heavily, even when used with very small
strings.
Typically after startup, my memory allocator shows a max alloc pointer
at about 15kB RAM, but as soon as there has been some string traffic it
quickly rises to 25kB and then keeps "increasing joyfully" as long as any
string handling is done.
The limit of my heap is set to 60kB. I could set it a bit higher, as my
controller has 128kB RAM, but going to the "RAM limit" is always a bad idea in
controller programming, I think. Typically I would NOT use dynamic
memory allocation at all for such controller software, but I accepted
that for Lua it is important and too useful to give up.
My first "target example" is a small "Lua snippet" which re-flashes the
controller to install a new Lua text file. For this programming, a Lua
text file of typically 10-30kB size needs to be cut down into 8-byte pieces
(this is the programming unit size for STM32G431 flash programming).
Doing this with string handling and the string library was impossible with this
restricted RAM: it typically ran into memory errors after the first
1kB of programming events (typically when the GC ran for the second time).
I first suspected an error in my Lua program or in Lua, but on a PC
under Windows the same Lua program runs perfectly well for very
large text files without any problems.
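To make the 8-byte slicing described above concrete, here is a minimal C sketch of the arithmetic only. It is not the code from this post: the names flash_word_count and flash_word_get are invented for the example, and padding the last piece with 0xFF (the erased state of flash) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_WORD 8u   /* programming unit size named in the post */

/* Number of 8-byte pieces needed to hold len bytes. */
size_t flash_word_count(size_t len)
{
    return (len + FLASH_WORD - 1u) / FLASH_WORD;
}

/* Copy piece number `index` of src[0..len) into out[8].
 * Bytes past the end of src are filled with 0xFF (assumed padding).
 * Returns the number of real data bytes in the piece. */
size_t flash_word_get(const uint8_t *src, size_t len, size_t index,
                      uint8_t out[FLASH_WORD])
{
    size_t offset = index * FLASH_WORD;
    size_t n = 0;

    assert(src != NULL && out != NULL);
    memset(out, 0xFF, FLASH_WORD);
    if (offset < len) {
        n = len - offset;
        if (n > FLASH_WORD)
            n = FLASH_WORD;
        memcpy(out, src + offset, n);
    }
    return n;
}
```

A caller would loop index from 0 to flash_word_count(len) - 1 and hand each 8-byte piece to the flash driver. Nothing is allocated per piece, which is the property the Lua string functions could not provide here.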
So I decided that I need a sort of "indexed quasi-static strings" in
my Lua. I did this with a metatable construction similar to the one described
for the "Bit array" example in Roberto's "Programming in Lua" book (thank you
for the nice and clear description there...). Of course I also looked at the
code of lstrlib.c in the Lua sources, which also proved very instructive.
A main feature of Lua when you come from C (as when you arrive in any
"interpreter language realm") is the absence of pointers. Pointers are the hell
and heaven of C string handling, depending strongly on the point of view and the
application. And it is clear that in any "easy-to-use language" pointers
should NOT be used (this is generally solved nicely in Lua...).
But to allow a "restricted but helpful pointer replacement" for cutting and
stripping strings, I insisted on using a string "byte count index"
for all my string buffer functions. For this I could fortunately use
the [..] table index syntax of Lua. For such cutting/stripping
it is often very useful to refer not only to one index in a string, but to
a "substring range" in a string. To allow this I slightly misused the
distinction between integers and float numbers - please forgive me this somehow
"terrific invention": the float number 5.004 denotes a char range starting at
character 5 with a length of 4 bytes, so Buf[5.004] gives this substring.
I kept the cute feature of string.sub that negative indices index a string from
the end, so Buf[-1.005] is the last 5 bytes of Buf. I limited the max
buf size to 250 bytes; this should be fine for any controller app I can think
of, as text lines can really be restricted to 80 chars without any problem.
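The range encoding above can be decoded roughly as in the following C sketch. The function name sb_resolve and the struct are hypothetical, and two points are assumptions rather than statements from this post: a plain integer index (fraction 0) is taken to select one byte, and a negative position is taken to mark the end of the range, which is the reading under which Buf[-1.005] yields the last 5 bytes.

```c
#include <assert.h>

#define SB_MAX 250   /* maximum buffer size stated in the post */

typedef struct {
    int start;   /* 1-based first byte, 0 if the index is invalid */
    int len;     /* number of bytes, 0 if the index is invalid */
} sb_range;

/* Resolve a range index such as 5.004 or -1.005 against a buffer that
 * currently holds buflen bytes. */
sb_range sb_resolve(double index, int buflen)
{
    sb_range r = {0, 0};
    double a = (index < 0) ? -index : index;
    int pos, len, first, last;

    assert(buflen >= 0 && buflen <= SB_MAX);
    if (!(a < SB_MAX + 1.0))          /* also rejects NaN */
        return r;
    pos = (int)a;
    /* 0.004 is not exact in binary, so round instead of truncating */
    len = (int)((a - pos) * 1000.0 + 0.5);
    if (len == 0)
        len = 1;                      /* assumed: integer index = one byte */
    if (pos == 0 || len > SB_MAX)
        return r;
    if (index > 0) {
        first = pos;
        last = pos + len - 1;
    } else {                          /* assumed: range ends at the position */
        last = buflen - pos + 1;
        first = last - len + 1;
    }
    if (first < 1 || last > buflen)
        return r;
    r.start = first;
    r.len = len;
    return r;
}
```

With a 20-byte buffer, sb_resolve(5.004, 20) gives start 5 and length 4, and sb_resolve(-1.005, 20) gives start 16 and length 5. The rounding step matters because the fractional part of 5.004 is only approximately 0.004 in floating point.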
Quite similar to the Lua string library, I defined the following functions:
sb.format, sb.pack, sb.unpack
Via metamethods, I support the following Lua operators for my sb class:
__newindex for assignments (e. g. Buf[1.005]=..., or Buf[]= ...),
__index for references (e. g. Buf[1.005], or Buf[-1.008]),
__concat for Buf..str, or str..Buf or Buf1..Buf2,
__mul for duplication of strings/ char sequences (also byte inversion),
__len to get the length, #Buf,
__tostring to convert a buffer to a string (but also sb.string for this),
__eq/__lt/__le for the obviously necessary comparison things... .
And I needed/wanted these 5 additional functions:
sb.new (of course necessary to define a new sb element, specifying the buffer
size, limited to 250 bytes).
sb.string (to get the Lua string representation of a buffer or buffer index
segment, e.g. sb.string(Buf) or Buf:string() or sb.string(Buf[1.004])).
sb.scan ("anti-function" to sb.format: converts a string back to variables,
using "if possible" the identical format string).
sb.b (to find the byte range for a substring search, or to convert utf8
char index ranges to byte index numbers).
sb.u (to find the utf8 char range for a substring search, or to convert byte
ranges to utf8 index spec numbers).
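The character-to-byte conversion that sb.b is described as performing could look roughly like this in C. This is only a sketch with invented names (utf8_seq_len, utf8_char_to_byte), not the sb library's code. How malformed input is counted is an assumption: here every invalid or truncated sequence counts as one single-byte character, and overlong encodings are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence at s[0], reading at most n bytes.
 * Malformed or truncated sequences count as a single byte so that
 * corrupted text can still be indexed (assumed policy). */
size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t need, i;

    if (n == 0) return 0;
    if (s[0] < 0x80) return 1;
    else if ((s[0] & 0xE0) == 0xC0) need = 2;
    else if ((s[0] & 0xF0) == 0xE0) need = 3;
    else if ((s[0] & 0xF8) == 0xF0) need = 4;
    else return 1;        /* stray continuation or invalid lead byte */
    if (need > n) return 1;
    for (i = 1; i < need; i++)
        if ((s[i] & 0xC0) != 0x80) return 1;
    return need;
}

/* Convert a 1-based character index into a 1-based byte index.
 * Returns 0 if the buffer holds fewer than char_index characters. */
size_t utf8_char_to_byte(const unsigned char *s, size_t n, size_t char_index)
{
    size_t byte = 0, ch = 1;

    assert(s != NULL || n == 0);
    if (char_index == 0) return 0;
    while (byte < n) {
        if (ch == char_index) return byte + 1;
        byte += utf8_seq_len(s + byte, n - byte);
        ch++;
    }
    return 0;
}
```

The reverse direction (byte index to character index) walks the buffer the same way and counts sequences until the byte offset is reached. With a 250-byte limit the linear scan is cheap.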
With this quite restricted range of functions, this all works very nicely and
perfectly thread-safe, including really FULL UTF8 support (also supporting
corrupted UTF8 strings, or UTF8 strings with "strange encoding").
My lib is about 13kByte, about the same size as lstrlib.c + lutf8lib.c, which
add up to 13.5kB (for my Lua32 on STM32, Keil ARMCC, opt level 1). My sb lib is
programmed in a very greedy style, the typical "small controller programming
style", concentrating very much on avoiding ANY duplicate code. It does NOT
support search functionality with patterns / SQL style etc.
But it has some nice advantages compared to the "standard Lua string handling":
- the indexed string subsegment handling should be much faster and more
efficient than the "alloc substring handling" of Lua.
- my Lua "max alloc pointer" now stays nicely at max. 15kB alloc RAM, even if I
handle 100kB text files.
- format has some nice new features:
for floats I introduced 2 new specifiers:
[..] to exchange the decimal point (e. g. %[,]f gives floats with commas
as required for French+German (and unfortunately also for CSV files in France
and Germany)).
{..} to allow number grouping, e. g. %{3'}f will write 1000000 as 1'000'000.
for floats I introduced a new specifier %m which uses the technical
standard suffixes "milli, kilo, ..." (a,f,p,n,u,m, ,k,M,G,T,P), and %M, which
is similar but uses the UTF8 micro char instead of u.
for floats I introduced a new specifier %r for percent numbers
(%.2r will write 0.00534 as '0.53%' - %R does the same with the UTF8 permille
char).
for floats I introduced a new specifier %t for seconds (typically day-seconds)
(%.2t will write 3662.53 in ISO style as '01:01:02.53'). Float years with
'decimal year days' can be nicely presented with the year-days format
'%[YD].3f', e. g. Nov 7, 2021 is day number 311, and 2021.311 then shows as
2021YD311. Using such year-day numbers and day-second floats is a nice
"calendar function" replacement for "restricted calendar" functionality
in restricted controllers: it avoids the 2038/2106 problem of 32bit time ints,
it allows counting the day seconds in a float32 with a resolution of about
10 msec (a float32 has a 24bit mantissa, so the step near the end of a day is
1/128 s), and the only calendar functionality needed in the software is to
check at the end of day 365 whether the year was a leap year, which is
a really very easy check.
for floats (and others) the '*' in %*.*f is supported as in C printf.
for integers/chars I use %c for byte code, %C for utf8 code, %U for Unicode.
for integers I introduced a new specifier %b for 1-4bit packages:
%8b to show 8 bits of i as bits 0/1, e. g. 0x45 as 01000101
(here also e. g. {4'} can be used, to get 0100'0101,
or e. g. {1,} can be used, to get 0,1,0,0,0,1,0,1)
(also e. g. [_1], to get _1___1_1, or [_,01234567] to get _6___2_0)
%8.2b to show 16 bits of i as 2-bit-parts 0/1/2/3, e. g. 0x45 as 000001012
for CSV "variables" I introduced a new specifier %v: this automatically
converts strings, integers and float numbers into the coding required for
CSV files. E. g. %5v converts 5 variables into a CSV string segment, including
commas as list delimiters; %[,]5v does the same in French/German CSV format,
with semicolons as delimiters and a comma as the float decimal point. In
strings the " char is automatically doubled as required by CSV. %V does the
same in "lazy style" concerning string " encapsulation, as defined by MS-Excel
CSV outputs.
- having the sb.scan function as "anti-function" to format is often very
useful, e. g. when handling CSV file lines. Such CSV file lines are very
useful for presenting report data to any "higher system", and also for
configuration files.
- my sb.b/sb.u functions for string search support some nice options, as you
would expect from a typical more advanced editor search function
(no "reg-search-chars", but word search, ignore case, replace, replace
all, count, also "approximate search" or "number range search", and optional
support of the search chars "*" and "?").
- sb.pack/unpack are a bit stripped down compared to Lua's pack/unpack,
but from Python I took the "q" specifier for i8, and I support floats
f1/f2/f4/f8 (so also 1-byte minifloats and 2-byte short floats; such "short
floats" are VERY useful for configuration parameters or "rough number
displays" like min/max in statistics data).
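As an illustration of the %b bit output with {..} grouping from the list above, here is a small C sketch. It is not the sb.format implementation: the function name fmt_bits and its parameters are made up for the example, and it covers only the 1-bit case (%8b, %{4'}8b, %{1,}8b), not the 2-bit parts or the [..] character substitution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the lowest nbits of value as '0'/'1', most significant bit first.
 * If group > 0, sep is inserted between groups of `group` digits, counted
 * from the right. Returns the string length, or 0 if out is too small
 * (the contents of out are then unspecified). */
size_t fmt_bits(char *out, size_t outsize, uint32_t value,
                unsigned nbits, unsigned group, char sep)
{
    size_t pos = 0;
    unsigned bit;

    assert(out != NULL);
    if (nbits == 0 || nbits > 32)
        return 0;
    for (bit = nbits; bit-- > 0; ) {
        if (pos + 2 > outsize)        /* room for one digit plus NUL */
            return 0;
        out[pos++] = ((value >> bit) & 1u) ? '1' : '0';
        if (group > 0 && bit > 0 && bit % group == 0) {
            if (pos + 2 > outsize)    /* room for separator plus NUL */
                return 0;
            out[pos++] = sep;
        }
    }
    out[pos] = '\0';
    return pos;
}
```

For the value 0x45 and 8 bits this produces 01000101 without grouping, 0100'0101 with group 4 and separator ', and 0,1,0,0,0,1,0,1 with group 1 and separator , which are the outputs quoted in the list. The caller supplies the output buffer, so nothing is allocated.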
... sorry for this very lengthy "report only confession" post without any
question so far. It all worked really nicely, and I am very happy that I can
now hopefully live well with my very restricted 20kB heap RAM for Lua
(and the max of 60kB then clearly seems fine, also for any "larger" text file
handling applications). I hope to report any problems within a week at the
latest. Maybe you have some nasty or not so nasty comments concerning this
approach until then... (you are welcome in any case...).
For thinking about future "Lua language enhancements" I would have the following
TWO proposals/ "wishes by heart" after all this:
- it generally would be VERY nice if Lua would encode short strings (max. 4
chars of course in Lua32, max. 8 chars in Lua64) with NO string alloc, somehow
similar to ints. I am very sure this would save lots of allocs in MANY cases.
As the use of defines for numbers is clearly impossible in such an interpreter
language without a pre-compiler, short strings tend to be very important for
all options in any configuration data. I think this is very common in programs
written in interpreter languages, and also nice for the user (e. g. to define
search options, you would use 'w' for word search, 'r' for replace, 'a' for
all etc. - in C you would possibly do this more compactly with pre-defined bit
combinations).
- it would be VERY nice if Lua would allow a further metamethod, __new or
similar, which is invoked if somebody writes Buf='hallo' for one of my string
buffer sb objects. Now the user has to write Buf[]= 'hallo', and if the user
accidentally writes Buf= 'hallo', then Buf is annoyingly killed and from then
on is a string; this really is a bit dangerous for any "lazy user programming
style". As such assignments are heavily used in any Lua software, it would be
perfectly fine for me to apply this only to LUA_TUSERDATA types, to avoid
performance penalties.
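To illustrate the first wish, this C sketch shows one possible way to hold a string of up to 4 bytes inside a single 32-bit value, so that comparing two option strings becomes an integer comparison. It is purely hypothetical and does not describe how Lua represents strings; the names short_str_pack and short_str_unpack and the byte order are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a string of at most 4 bytes into one 32-bit value, first byte in
 * the lowest 8 bits, unused bytes zero. Returns 0 on success, -1 if the
 * string is too long or contains a NUL byte (which would be ambiguous). */
int short_str_pack(const char *s, size_t len, uint32_t *out)
{
    uint32_t v = 0;
    size_t i;

    assert(s != NULL && out != NULL);
    if (len > 4)
        return -1;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == 0)
            return -1;
        v |= (uint32_t)c << (8 * i);
    }
    *out = v;
    return 0;
}

/* Unpack into buf (at least 5 bytes); returns the string length. */
size_t short_str_unpack(uint32_t v, char buf[5])
{
    size_t n = 0;

    assert(buf != NULL);
    while (n < 4 && ((v >> (8 * n)) & 0xFFu) != 0) {
        buf[n] = (char)((v >> (8 * n)) & 0xFFu);
        n++;
    }
    buf[n] = '\0';
    return n;
}
```

An option such as 'w' or 'all' would then live entirely in the value slot, with no heap object behind it. The cost is a second string representation that every string operation has to recognise.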
For string lib enhancements:
- The new format specifiers %m, %r, %b, %v, %[]f, %{}f, %C, %U, %t as described
above are really VERY useful for many applications, I think, if only because I
assume that many Lua applications offer some sort of serial output for "higher
system information", with a terminal program like e. g. TeraTerm running on
such a higher system. Easy formatting of text output is always nice there.
- I am quite sure that at some point you should think about adding some sort of
"enhanced/larger string" library with full utf8 support for search and
format. UTF8 has meanwhile become too important to handle it just in
a "small helper lib"... you can skip it for English language applications,
but as practically ANY other language uses UTF8 more or less heavily, and
all these crazy new nice emoticons are in UTF8 as well, you will not get
around this in the future.
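Of the format specifiers listed above, %t is simple enough to sketch in a few lines of C. The function below is an illustration under stated assumptions, not the sb.format code: the name fmt_day_seconds is invented, the precision is fixed at two decimals (as in %.2t), and rounding to centiseconds before splitting is a choice made here so that 59.999 does not print as 60.00 seconds.

```c
#include <assert.h>
#include <stdio.h>

/* Format day-seconds as hh:mm:ss.cc with two decimals.
 * The value is rounded to whole centiseconds first, then split.
 * Returns the snprintf result (11 on success for values below 100 h). */
int fmt_day_seconds(char *out, size_t outsize, double seconds)
{
    long cs, h, m, s;

    assert(out != NULL && seconds >= 0.0 && seconds < 360000.0);
    cs = (long)(seconds * 100.0 + 0.5);   /* total centiseconds, rounded */
    h = cs / 360000L;
    cs %= 360000L;
    m = cs / 6000L;
    cs %= 6000L;
    s = cs / 100L;
    cs %= 100L;
    return snprintf(out, outsize, "%02ld:%02ld:%02ld.%02ld", h, m, s, cs);
}
```

For 3662.53 this yields 01:01:02.53, the example given for %.2t. Working in integer centiseconds keeps the carry between seconds, minutes and hours exact.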