- Subject: Re: Pooling of strings is good
- From: Sean Conner <sean@...>
- Date: Sat, 23 Aug 2014 03:11:20 -0400
It was thus said that the Great Coroutines once stated:
> On Fri, Aug 22, 2014 at 2:13 PM, Sean Conner <sean@conman.org> wrote:
>
> >> I play with networking.
>
> So then you know what a pain it is to not be able to represent a
> simple ring buffer, or provide DMA access to the NIC's ring buffer
> hosted through userdata to Lua -- without having to copy it out and
> promote it to a string?
It doesn't bother me, because the convenience it affords is worth the
price (in my opinion).
> I wish userdata could be used in the same
> place a string is expected. I would love an "invisible" memcmp() if
> it's a userdata-string comparison (==) or to be able to directly use
> the string.*() functions on them.
I swear, the way you are talking, it's as if you equate "userdata" with
"Lua's internal string representation" and it jars me every time. I have
plenty of userdata types that have nothing to do with strings; where
"userdata == string" just does not make any semantic sense whatsoever [1].
As an example, some of the datafiles we use at work are quite large [2].
Because of this (and an internal binary structure) I mmap() the files into
memory and wrap a userdata around it, providing a __index method to return
records based on ordinal position or based on a key (the files are
specialized key/value stores) [3][4]. The userdata here represents a huge
amount of data, not just one "thing".
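
A minimal sketch of that arrangement, in plain C with the Lua binding left
out. The real file layout isn't described here, so this assumes fixed-size
records sorted by key; record_t, datafile_t and the datafile_*() functions
are hypothetical names invented for illustration only. An __index
metamethod could call datafile_at() for a numeric index and datafile_find()
for a key.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-size record; the actual on-disk format is not given. */
typedef struct {
    uint64_t key;      /* lookup key, e.g. a packed phone number */
    uint32_t flags;    /* feature bits */
    uint32_t reserved; /* pads the record to 16 bytes */
} record_t;

/* One handle for the whole mapping: this is what a userdata would wrap. */
typedef struct {
    const record_t *recs;  /* start of the mapped file */
    size_t          count; /* number of whole records in the file */
    size_t          bytes; /* mapped length, needed for munmap() */
} datafile_t;

/* Map the file read-only. Returns 0 on success, -1 on failure. */
int datafile_open(datafile_t *df, const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) { close(fd); return -1; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;
    df->recs  = p;
    df->bytes = (size_t)st.st_size;
    df->count = df->bytes / sizeof(record_t);
    return 0;
}

/* Lookup by ordinal position (1-based, as Lua would index it). */
const record_t *datafile_at(const datafile_t *df, size_t n)
{
    return (n >= 1 && n <= df->count) ? &df->recs[n - 1] : NULL;
}

/* Lookup by key: binary search, assuming records are sorted by key. */
const record_t *datafile_find(const datafile_t *df, uint64_t key)
{
    size_t lo = 0, hi = df->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (df->recs[mid].key < key) lo = mid + 1;
        else hi = mid;
    }
    return (lo < df->count && df->recs[lo].key == key) ? &df->recs[lo] : NULL;
}

/* Unmap the file; what a __gc metamethod would do. */
void datafile_close(datafile_t *df)
{
    munmap((void *)df->recs, df->bytes);
    df->recs = NULL;
    df->count = 0;
    df->bytes = 0;
}
```

Nothing in datafile_t looks like a string: it is a pointer and two sizes
standing in for however many records the file holds.
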
Another userdata is the wrapper for iconv. Heck, the header file,
iconv.h, doesn't even define the structure:

	typedef void *iconv_t;

so again, I'm at a loss as to what "userdata == string" even means in that
case.
> Strings are sealed userdata -- if
> you know you need a fixed buffer whose contents are changed quite
> often then I wish userdata could conform to this use. Generally
> duplicating them into a string is okay because we don't handle very
> large data, but lots of frequent reallocation frustrates me. This
> stuff doesn't need to be in Lua's string pool for memory or comparison
> efficiency :p All I really need is read access to it -- and I don't
> want to have to rewrite what's in the string library for userdata.
So really, what you are after is a buffer type userdata that can be used,
invisibly, as an immutable string. You could hack Lua to do that, as a
proof-of-concept, get some feedback, do some benchmarking and possibly get
enough users to convince PUC to officially add it.
> > But now you are exposing what has been an implementation detail. The
> > tname just has to be unique, it doesn't have to have any meaning (so I guess
> > the days of #define MY_TYPE "\200\201\202\203\344\377\300" are over) and we
> > run headlong into the namespace problem we (potentially) have with modules,
> > but now with types.
>
> I think -- like module names -- people will have to put some thought
> into their typenames. I would not expect people to pull in libraries
> introducing a lot of userdata types anyway, I feel like it would be a
> small concern.
I use 22 userdata types at work (just counted). It's not a small concern.
> > Other unintended consequences---why did I create a new function typeof()?
> > Why not extend type() to return a second value? Because there might be
> > code that expects type() to only return a single value (calling type()
> > within the definition of an array)?
>
> Disagreement here, I would avoid a typeof() and just add the 2nd
> return to type() -- or have a type() and rawtype(). Doesn't really
> matter ~
If it doesn't matter, why not typeof()?
> > Hmmm ... this is bringing up an interesting problem---do we need to check
> > for mutable strings (buffers?) and immutable strings? What happens to C
> > code that does:
> >
> > const char *name = lua_tostring(L,idx);
>
> I imagine nothing changes, as you are referencing const char.. :p
>
> > char buffer[65536uL];
> > bytes = recvfrom(sock->fh,buffer,sizeof(buffer),0,&remaddr->sa,&remsize);
> > lua_pushlstring(L,buffer,bytes);
>
> The only problem I have with this is while I know of no foolproof way
> to multithread an embedded Lua process, it seems like a bad idea to
> have a "shared" static buffer in a library? :>
I never said it was static. That "buffer" there is an auto---declared on
the stack of the socklua_recv() function [5]. No static buffer here.
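
A rough sketch of the distinction, without the Lua API. recv_copy() is a
hypothetical stand-in, not the actual socklua_recv(): malloc() plus memcpy()
plays the role of lua_pushlstring(), which makes its own internal copy of
the bytes it is given. Because the buffer is automatic, every call (and so
every thread) gets its own.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Receive one datagram into an automatic (stack) buffer and return a heap
 * copy sized to the data actually received. Returns NULL on error; on
 * success *len holds the number of bytes received.
 */
char *recv_copy(int fd, size_t *len)
{
    char    buffer[65536]; /* automatic storage: private to this call */
    ssize_t bytes = recvfrom(fd, buffer, sizeof(buffer), 0, NULL, NULL);

    if (bytes < 0)
        return NULL;

    /* Copy out before returning, since buffer dies with this stack frame. */
    char *copy = malloc((size_t)bytes + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, buffer, (size_t)bytes);
    copy[bytes] = '\0'; /* convenience terminator; the data may hold NULs */
    *len = (size_t)bytes;
    return copy;
}
```

Had buffer been declared static, two threads calling recv_copy() at the
same time would share it; as written they cannot.
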
And the way I handle a multithreaded process with Lua embedded is to give
each operating system thread its own Lua state. Yeah, I could provide the
lua_lock() and lua_unlock() implementations, but I'd rather avoid the loss
of speed that a global interpreter lock imposes [6].
> > Okay, nothing stopping anybody from doing that. But I suspect only a
> > small subset of the standard Lua library could be done in pure Lua.
>
> Mostly I just thought it would be clearer to see how args are accepted
> and parsed and it would be easier to notice issues/bugs. The other
> bonus would be it'd be easier for upstream and the community to
> prototype new functions to add to the string or table or whatever
> library. The standard libraries themselves could be released as less
> of a 'standard' and more of a 'go add to it' example. I think Lua
> would grow more quickly because of it -- and in ways the community
> could regard and come to a consensus on?
You could always try that approach.
[1] That's not to say I don't have a __tostring() method attached to
them---I do. But mostly for debugging purposes, not for any real use.
[2] Relatively speaking. I mean, 100,000,000 records does take up some
space.
[3] In this case, when you do an index reference, you get back a plain
Lua table with the data converted to more natural data types. For
instance, one such file maps phone numbers (the key) to
functionality (this phone number can do A, B and C). When
referenced:

	x = mapping['5555551212']

x is a Lua table:

	{
	  number = '5555551212',
	  A = true,
	  B = true,
	  C = false
	}

The reason I include the number is because you can also reference it
like:

	x = mapping[456]

The number is stored in a compressed binary format (to save space)
and the feature indicators are individual bits.
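
As a sketch of the bit-to-boolean step only: the bit positions below are
made up for illustration (the real assignments aren't given), and
features_t and features_decode() are hypothetical names. The compressed
number format is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real layout is not specified. */
#define FEATURE_A (UINT32_C(1) << 0)
#define FEATURE_B (UINT32_C(1) << 1)
#define FEATURE_C (UINT32_C(1) << 2)

/* The "more natural" form handed back to the caller. */
typedef struct {
    bool A;
    bool B;
    bool C;
} features_t;

/* Expand the packed feature bits into one boolean per feature. */
features_t features_decode(uint32_t bits)
{
    features_t f;
    f.A = (bits & FEATURE_A) != 0;
    f.B = (bits & FEATURE_B) != 0;
    f.C = (bits & FEATURE_C) != 0;
    return f;
}
```

With these assumed positions, a stored value of 0x3 decodes to A and B set
and C clear, which is the shape of the table shown above.
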
[4] Why not a real database, or some other key/value store? Database
engines weren't fast enough (we're literally in the "call chain" of
a phone call) and the storage medium is ... interesting in the
Chinese definition of "interesting".
[5] https://github.com/spc476/lua-conmanorg/blob/b7f2414391375c4d872d217fb9b297695a1dac11/src/net.c#L1471
[6] I'm looking at you, Python.