Indexed String Lib for very restricted RAM applications (<30-60 kB RAM)

lua-l archive
Subject: Indexed String Lib for very restricted RAM applications (<30-60 kB RAM)
From: Flyer31 Test <flyer31@...>
Date: Sun, 7 Nov 2021 17:28:59 +0100

As I described in some earlier posts over the last few weeks, I am working on
implementing Lua for an IoT environment based on the STM32G431 family, with
512kB ROM and 128kB RAM, nicely available in a tiny QFN48 package.

The dynamic string handling of Lua looks too heavy for such a RAM-restricted
application. I ran into severe problems as soon as the garbage collector ran
for the second time, and I tried several GC settings. So I removed anything in
my software that is "prone to produce strings" at runtime.

The main culprit is the string library: a single concat (str1..str2) of a few
strings, any call to string.sub, or especially string.format starts allocating
memory heavily, even when used with very small strings.

Typically after startup my memory allocator shows a max alloc pointer at about
15kB RAM, but as soon as there has been some string traffic it quickly rises to
25kB and then keeps "increasing joyfully" for as long as any string handling
is done.

The limit of my heap is set to 60kB. I could set it a bit higher, as my
controller has 128kB RAM, but going to the RAM limit is always a bad idea in
controller programming, I think. Typically I would NOT use dynamic memory
allocation at all for such controller software, but I accepted that for Lua it
is important and too useful to give up.

My first "target example" is a small Lua snippet which re-flashes the
controller to swap in a new Lua text file. For this, a Lua text file of
typically 10-30kB needs to be cut into 8-byte pieces (this is the programming
unit size for STM32G431 flash programming). Doing this with Lua strings and the
string library was impossible with this restricted RAM: it ran into memory
errors, typically after the first 1kB of programming events (typically when
the GC had run twice). I first suspected an error in my Lua program or in Lua,
but on a PC under Windows the same Lua software runs perfectly well on very
large text files without any problems.
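
For illustration, the chunking step itself needs no dynamic memory when it is done on the C side. The following is only a sketch under assumptions, not the code used in this project: `flash_program_text`, `flash_write_fn` and `ram_flash_write` are hypothetical names, and the callback stands in for whatever flash driver the target provides. Only the 8-byte unit size comes from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_UNIT 8u   /* programming unit size named in the post */

/* Hypothetical callback: programs one 8-byte unit at byte offset `off`.
   Returns 0 on success. On hardware it would wrap the flash driver. */
typedef int (*flash_write_fn)(uint32_t off, const uint8_t unit[FLASH_UNIT],
                              void *ctx);

/* Feeds `len` bytes of `src` to `prog` in 8-byte units, using one fixed
   stack buffer. Returns the number of units written, or -1 on error. */
static long flash_program_text(const uint8_t *src, size_t len,
                               flash_write_fn prog, void *ctx)
{
    uint8_t unit[FLASH_UNIT];
    long units = 0;

    assert(src != NULL && prog != NULL);
    for (size_t off = 0; off < len; off += FLASH_UNIT) {
        size_t n = (len - off < FLASH_UNIT) ? len - off : FLASH_UNIT;
        memset(unit, 0xFF, sizeof unit);  /* pad tail with erased-flash value */
        memcpy(unit, src + off, n);
        if (prog((uint32_t)off, unit, ctx) != 0)
            return -1;
        units++;
    }
    return units;
}

/* Stand-in for real flash: copies each unit into a RAM image (for testing). */
static int ram_flash_write(uint32_t off, const uint8_t unit[FLASH_UNIT],
                           void *ctx)
{
    memcpy((uint8_t *)ctx + off, unit, FLASH_UNIT);
    return 0;
}
```

Calling `flash_program_text(text, 12, ram_flash_write, image)` on a 12-byte text writes two units, the second padded with four 0xFF bytes.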

So I decided that I need a sort of "indexed quasi-static strings" in my Lua. I
did this with a metatable construction similar to the one described for the
"bit array" example in Roberto's book "Programming in Lua" (thank you for the
nice and clear description there). Of course I also looked at the code of
lstrlib.c in the Lua source, which proved very instructive.

A main feature of Lua when you come from C (as when you arrive in any
"interpreter language realm") is the absence of pointers. Pointers are the hell
and heaven of C string handling, depending strongly on the point of view and
the application. And it is clear that in any easy-to-use language pointers
should NOT be used (Lua generally solves this well).

But to allow a "restricted but helpful pointer replacement" for string cutting
and stripping, I insisted on using a byte-count index for all my string buffer
functions. For this I could fortunately use the [..] table index syntax of Lua.
For such cutting/stripping it is often very useful to refer not only to one
index in a string, but to a substring range. To allow this I slightly abused
the concept of integer and float numbers. Please forgive me this somewhat
"terrific invention": I use the float number 5.004 to denote a char range
starting at character 5 with a length of 4 bytes, so Buf[5.004] gives this
substring. I kept the cute feature of string.sub that negative indices index a
string from the end, so Buf[-1.005] is the last 5 bytes in Buf. I limited the
max buffer size to 250 bytes. This should be fine for any controller app I can
think of, as text lines can really be restricted to 80 chars without any
problem.
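
The range encoding described above can be sketched in plain C. This is only an illustration, not the library's actual code: `sb_range` and `sb_decode_range` are made-up names, and two details are assumptions: a plain integer index selects a single byte, and a negative index marks the last byte of the range (so -1.005 means the 5 bytes ending at the last byte).

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int start;  /* 1-based index of the first byte */
    int len;    /* number of bytes */
} sb_range;

/* Splits idx = <position>.<3-digit length> into a byte range inside a
   buffer of `buflen` bytes. Returns 0 on success, -1 if out of range. */
static int sb_decode_range(double idx, int buflen, sb_range *out)
{
    int pos = (int)idx;                  /* truncates toward zero */
    double frac = idx - (double)pos;
    int len;

    assert(out != NULL);
    if (frac < 0.0)
        frac = -frac;
    len = (int)(frac * 1000.0 + 0.5);    /* round to absorb float error */
    if (len == 0)
        len = 1;                         /* assumption: integer = one byte */
    if (pos < 0) {
        int last = buflen + pos + 1;     /* -1 refers to the last byte */
        pos = last - len + 1;            /* assumption: range ends there */
    }
    if (pos < 1 || pos + len - 1 > buflen)
        return -1;
    out->start = pos;
    out->len = len;
    return 0;
}
```

With a 20-byte buffer, 5.004 decodes to bytes 5..8 and -1.005 to bytes 16..20. Three decimal digits are enough for the length because the buffer size is capped at 250 bytes.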

Quite similar to the Lua string library, I defined the following functions:
sb.format, sb.pack, sb.unpack

Via metamethods, I support the following Lua operators for my sb class:
__newindex for assignments (e. g. Buf[1.005]=..., or Buf[]= ...),
__index for references (e. g. Buf[1.005], or Buf[-1.008]),
__concat for Buf..str, or str..Buf, or Buf1..Buf2,
__mul for duplication of strings / char sequences (also byte inversion),
__len to get the length, #Buf,
__tostring to convert a buffer to a string (sb.string also does this),
__eq/__lt/__le for the obviously necessary comparisons.

And I needed/wanted these 5 additional functions:
sb.new (of course necessary to create a new sb element, specifying the buffer
        size, limited to 250 bytes).
sb.string (to get the Lua string representation of a buffer or buffer index
         segment, e.g. sb.string(Buf) or Buf:string() or sb.string(Buf[1.004])).
sb.scan (anti-function to sb.format, so it converts a string back to variables,
         using "if possible" the identical format string).
sb.b (to find the byte range for a substring search, or to convert utf8
      char index ranges to byte index numbers).
sb.u (to find the utf8 char range for a substring search, or to convert byte
      ranges to utf8 index spec numbers).
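
The conversion between UTF-8 character indices and byte indices that sb.b and sb.u perform can be sketched as below. This is a minimal illustration with hypothetical function names, not the sb implementation. It simply treats every byte that is not a continuation byte (10xxxxxx) as the start of a character, which also gives a defined result for corrupted input.

```c
#include <assert.h>
#include <stddef.h>

/* A byte starts a character unless it is a continuation byte (10xxxxxx). */
static int utf8_is_lead(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* 1-based byte index at which the n-th (1-based) character starts,
   or 0 if the string has fewer than n characters. */
static size_t utf8_char_to_byte(const unsigned char *s, size_t len, size_t n)
{
    size_t chars = 0;

    assert(s != NULL);
    for (size_t i = 0; i < len; i++)
        if (utf8_is_lead(s[i]) && ++chars == n)
            return i + 1;
    return 0;
}

/* 1-based character index of the character that contains byte `pos`
   (1-based), or 0 if `pos` is outside the string. */
static size_t utf8_byte_to_char(const unsigned char *s, size_t len, size_t pos)
{
    size_t chars = 0;

    assert(s != NULL);
    if (pos == 0 || pos > len)
        return 0;
    for (size_t i = 0; i < pos; i++)
        if (utf8_is_lead(s[i]))
            chars++;
    return chars;
}
```

For the 4-byte string "a", "ä" (0xC3 0xA4), "b", the third character starts at byte 4, and byte 3 belongs to the second character.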

With this quite restricted range of functions, this all works very nicely and
is perfectly thread-safe, including really FULL UTF8 support (also handling
corrupted UTF8 strings, or UTF8 strings with "strange encoding").

At 13kB my lib is about the same size as lstrlib.c + lutf8lib.c, which add up
to 13.5kB (for my Lua32 on STM32, Keil ARMCC, opt level 1). My sb lib is
programmed in a very frugal, typical "small controller programming style", very
much concentrated on avoiding ANY duplicate code. It does NOT support search
with patterns / SQL style etc.

But it has some nice advantages compared to the "standard Lua string handling":
- the indexed string subsegment handling should be much faster and more
  efficient than the "alloc substring handling" of Lua.
- my Lua "max alloc pointer" now stays nicely at max. 15kB alloc RAM, even if I
  handle 100kB text files.
- format has some nice new features:
  for floats I introduced 2 new specifiers:
    [..] to exchange the decimal point (e. g. %[,]f gives floats with commas,
         as required for French and German, and unfortunately also for CSV
         files in France and Germany).
    {..} to allow number grouping, e. g. %{3'}f will write 1000000 as 1'000'000.
  for floats I introduced a new specifier %m, which uses the technical standard
    suffixes "milli, kilo, ..." (a,f,p,n,u,m, ,k,M,G,T,P), and %M, which is
    similar but uses the UTF8 micro char instead of u.
  for floats I introduced a new specifier %r for percent numbers
    (%.2r will write 0.00534 as '0.53%'; %R does the same with the UTF8
    permille char).
  for floats I introduced a new specifier %t for seconds (typically day-seconds)
    (%.2t will write 3662.53 in ISO style as '01:01:02.53').
    Float years with 'decimal year days' can be nicely presented with the
    year-day format '%[YD].3f', e. g. Nov 7, 2021 is day number 311, and
    2021.311 would then show as 2021YD311. Using such year-day numbers and a
    day-second float is a nice replacement for "restricted calendar"
    functionality in restricted controllers: it avoids the 2038/2106 problem
    of 32bit time ints, it allows counting the day seconds with msec
    resolution without problems in a float32, and the only calendar
    functionality then needed in the software is to check, when counting past
    the end of day 365, whether the year was a leap year, which is a really
    very easy check.
  for floats (and others) the '*' in %*.*f is supported like in C printf.
  for integers/chars I use %c for byte code, %C for utf8 code, %U for Unicode.
  for integers I introduced a new specifier %b for 1-4 bit packages:
    %8b to show 8 bits of i as bits 0/1, e. g. 0x45 as 01000101
        (here also e. g. {4'} can be used, to get 0100'0101,
         or e. g. {1,} can be used, to get 0,1,0,0,0,1,0,1)
        (also e. g. [_1], to get _1___1_1, or [_,01234567] to get _6___2_0)
    %8.2b to show 16 bits of i as 2-bit parts 0/1/2/3, e. g. 0x45 as 00001011
  for CSV "variables" I introduced a new specifier %v: this automatically
    converts strings, integers and float numbers into the coding required for
    CSV files. %5v e. g. converts 5 variables into a CSV string segment,
    including commas as list delimiters (%[,]5v does the same in the
    French/German CSV format, with semicolon as delimiter and comma as the
    float decimal point). In strings the " char is automatically doubled as
    required by CSV. %V does the same in "lazy style" concerning string "
    encapsulation, as defined by MS-Excel CSV outputs.
- having the sb.scan function as "anti-function" to format is often very
  useful, e. g. when handling CSV file lines. Such CSV file lines are very
  useful to present report data to any "higher system", or also for
  configuration files.
- my sb.b / sb.u functions for string search support some nice options, as you
  would expect from a typical more advanced editor search function
  (no "reg-search-chars", but word search, ignore case, replace, replace all,
  count, also "approximate search" or "number range search" and optional
  support of the search chars "*" and "?").
- sb.pack/unpack are a bit stripped down compared to Lua pack/unpack, but from
  Python I also took the "q" specifier for i8, and I support floats
  f1/f2/f4/f8 (so also 1-byte minifloats and 2-byte short floats; such "short
  floats" are VERY useful for configuration parameters or "rough number
  displays" like min/max in statistics data).
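
To illustrate the day-seconds idea behind %t and the leap year check mentioned in the list above, here is a small C sketch. It is not the sb.format code: the function names are hypothetical, and the fixed two-digit fraction (matching the %.2t example) as well as the integer centisecond rounding are choices made only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes `seconds` (0 <= seconds < 86400) as hh:mm:ss.cc into `out`.
   Returns the snprintf result, or -1 if `seconds` is out of range. */
static int format_day_seconds(double seconds, char *out, size_t outsz)
{
    long cs, h, m;

    assert(out != NULL);
    if (seconds < 0.0 || seconds >= 86400.0)
        return -1;
    cs = (long)(seconds * 100.0 + 0.5);   /* centiseconds, rounded */
    if (cs > 8639999L)
        cs = 8639999L;                    /* never roll over into 24:00:00 */
    h = cs / 360000L;
    cs %= 360000L;
    m = cs / 6000L;
    cs %= 6000L;
    return snprintf(out, outsz, "%02ld:%02ld:%02ld.%02ld",
                    h, m, cs / 100L, cs % 100L);
}

/* Gregorian leap year rule, the only calendar logic the scheme needs. */
static int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}
```

`format_day_seconds(3662.53, buf, sizeof buf)` reproduces the '01:01:02.53' example from the list. Working in integer centiseconds avoids a result like "00:00:60.00" when the fractional part rounds up.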


Sorry for this very lengthy "report only confession" post without any question.
So far it all worked very nicely, and I am very happy that I can now hopefully
live well with my very restricted 20kB heap RAM for Lua (and max 60kB then
clearly seems fine, also for any "larger" text file handling applications). I
will report any problems within 1 week at the latest, I hope. Maybe you have
some nasty or not so nasty comments concerning this approach until then (you
are welcome in any case).

Thinking about future "Lua language enhancements", I would have the following
TWO proposals / "wishes by heart" after all this:
- it would generally be VERY nice if Lua encoded short strings (max. 4 chars of
  course in Lua32, max. 8 chars in Lua64) with NO string alloc, somehow similar
  to ints. This would save lots of allocs for MANY things, I am very sure. As
  in such an interpreter language without a pre-compiler the use of defines for
  numbers is clearly impossible, short strings can be very important for all
  options in any configuration data. I think this is very common in programs
  written for interpreter languages, and also nice for the user (e. g. to
  define search options you would use 'w' for word search, 'r' for replace,
  'a' for all etc.; in C you would possibly do this more compactly by using
  pre-defined bit combinations).
- it would be VERY nice if Lua allowed a further metamethod __new or similar,
  called if somebody writes Buf='hallo' for my string buffer sb objects. Now
  the user has to write Buf[]= 'hallo', and if by some accident the user writes
  Buf= 'hallo', then Buf is annoyingly killed and from then on is a string.
  This is a bit dangerous for any "lazy user programming style". As such
  assignments are heavily used in any Lua software, to avoid performance
  penalties it would be perfectly fine for me to apply this only to
  LUA_TUSERDATA types.
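
The first wish, short strings stored like ints, can be pictured with the following C sketch. It only shows the packing idea for the 32-bit case and says nothing about how Lua's value representation would have to change. The function name and the byte order are hypothetical, and the sketch assumes the strings contain no NUL bytes, since the length is not stored separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs a string of up to 4 bytes into one 32-bit value, first character
   in the lowest byte. Returns 0 on success, -1 if the string is too long.
   Assumption: no embedded NUL bytes, so unused bytes stay 0. */
static int short_str_pack(const char *s, size_t len, uint32_t *out)
{
    uint32_t v = 0;

    assert(s != NULL && out != NULL);
    if (len > 4)
        return -1;
    for (size_t i = 0; i < len; i++)
        v |= (uint32_t)(unsigned char)s[i] << (8 * i);
    *out = v;
    return 0;
}
```

With such a packing, an option like 'w' or 'all' fits into the value itself, and comparing two short strings becomes a single integer comparison.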

For string lib enhancements:
- The new format specifiers %m, %r, %b, %v, %[]f, %{}f, %C, %U, %t described
  above are really VERY useful for many applications, I think, if only because
  I assume that many Lua applications will offer some sort of serial output for
  "higher system information", with some terminal program like e. g. TeraTerm
  running on such a higher system. Easy formatting of text output is always
  nice then.
- I am quite sure that at some point you should think about adding some sort of
  "enhanced/larger string" library with full UTF8 support for search and
  format. UTF8 has meanwhile become too important to handle it just in a
  "small helper lib". You can skip it for English language applications, but as
  practically ANY other language uses UTF8 more or less heavily, and all these
  crazy nice new emoticons are also in UTF8, you will not get around this in
  the future.
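
As an illustration of how little code such a format extension needs, digit grouping in the spirit of the {..} specifier can be sketched in C as follows. This is not the sb.format implementation: `format_grouped` is a hypothetical helper, and it handles only unsigned integers, whereas the specifier described in this post applies to floats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes `value` in decimal with `sep` inserted every `group` digits,
   counted from the right. Returns the length written, or -1 on error. */
static int format_grouped(unsigned long value, int group, char sep,
                          char *out, size_t outsz)
{
    char digits[32];
    int n, seps, o = 0;

    assert(out != NULL);
    if (group <= 0)
        return -1;
    n = snprintf(digits, sizeof digits, "%lu", value);
    seps = (n - 1) / group;               /* number of separators needed */
    if ((size_t)(n + seps + 1) > outsz)
        return -1;
    for (int i = 0; i < n; i++) {
        if (i > 0 && (n - i) % group == 0)
            out[o++] = sep;               /* a full group follows */
        out[o++] = digits[i];
    }
    out[o] = '\0';
    return o;
}
```

`format_grouped(1000000, 3, '\'', buf, sizeof buf)` yields 1'000'000, the same output as the %{3'}f example given earlier.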