- Subject: Re: upcoming changes in Lua 5.2 [was Re: Location of a package]
- From: Mike Pall <mikelu-0802@...>
- Date: Tue, 19 Feb 2008 15:36:58 +0100
Roberto Ierusalimschy wrote:
> - ephemeron tables (tables with weak keys only visit value when
> key is accessible)
Would this allow for weak string keys?
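For reference, the case ephemerons are meant to handle (table keys shown
here; the string-key question is the part I'm unsure about):

  local cache = setmetatable({}, { __mode = "k" })   -- weak keys
  do
    local key = {}
    cache[key] = function() return key end   -- value references its key
  end
  collectgarbage("collect")
  -- With plain weak keys the value keeps its key reachable, so the pair
  -- is never collected. With ephemeron semantics both become collectable
  -- once nothing else references the key.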
> - tables and strings respect __len metamethod
This makes sense for tables, sure.
But I'm strictly against __len for strings. This has a negative
impact on software composability.
The standard behaviour is to return the number of bytes in the
string object. This is well-defined, easily described behaviour,
and it is generally relied upon by code using the # operator on
strings (this is NOT just text processing). It's consistent with
string.len() and lua_tolstring(). In fact, the number of bytes is
the only sensible generic definition for it.
If one overrides __len for strings, this will impact _all_
modules, no matter what they are using # for. Ok, so you want to
get the length in UTF-8 codepoints (or glyphs or whatever). Then
by all means use utf8.len() or glyph.len() and don't override #.
Overriding the behaviour of # means that another module, trying
to load an image file from disk and doing some operations on it,
may fail. If you go one step further, you'll realize you have to
change string.sub() and lots of other string.* functions to be
consistent with #. This will in turn break more and more modules.
This is not the way to go -- simple rule: put extra functionality
for a certain type which just happens to be _represented_ by
strings into an extra module.
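Something along these lines, i.e. a rough sketch where utf8 is a
hypothetical extra module and # keeps its byte semantics:

  local utf8 = {}

  -- Count codepoints by counting the bytes that are not UTF-8
  -- continuation bytes (0x80..0xBF):
  function utf8.len(s)
    local n = 0
    for _ in s:gmatch("[^\128-\191]") do n = n + 1 end
    return n
  end

  local s = "na\195\175ve"   -- "naive" with U+00EF, encoded as UTF-8
  print(#s)                  --> 6 bytes (unchanged # semantics)
  print(utf8.len(s))         --> 5 codepoints (via the extra module)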
A string is an opaque container of 8-bit quantities. The Lua core
should never deal with it as if it were the representation of
anything other than that (e.g. ASCII, UTF-8, UTF-32 or whatever).
And it should not encourage anyone to change this basic assumption.
[Maybe you've followed the discussions about JS1/ES3, charAt,
UTF-16 and the backwards-compatibility lockup. Or the story about
Py3K and Unicode. For me, these are all big warning signs that
you do NOT want to mess up the basic language definition with
reliance upon individual character representations. This belongs
in libraries, where compatibility issues can be dealt with much
more easily.]
> - arguments for function called through xpcall
I.e. xpcall(f, err, args...) ?
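If so, something along these lines (a sketch, assuming the extra
arguments are simply forwarded to f):

  local function add(a, b) return a + b end
  local ok, sum = xpcall(add, debug.traceback, 40, 2)
  -- ok == true, sum == 42; on error, the message is passed through
  -- debug.traceback before xpcall returns false plus the traceback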
> We are also considering the following changes:
>
> - string.pack/string.unpack (along the lines of struct/lpack)
Sure, this would be very useful.
One thing to consider is the heritage of these structure
definitions: either they come from C struct definitions, in which
case you'll want the host-specific type sizes and endianness, or
they come from some network protocol definition or file format, in
which case you'll want to specify the sizes and endianness
independently of the host.
A structure definition syntax has to cater for both. So far, all
attempts at this in other languages have grown ugly and
inconsistent because this need was not anticipated in the design.
There's also the problem of variable-length elements, where the
specs for pack and unpack may need to diverge.
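For illustration, with made-up format strings loosely following
struct/lpack ('>' forcing big-endian here, no prefix meaning
host-native):

  -- Network/file format heritage: sizes and byte order given
  -- explicitly, independent of the host:
  local hdr = string.pack(">I4I2", 0x89504E47, 13)
  local magic, len = string.unpack(">I4I2", hdr)
  -- C struct heritage: host-native sizes and byte order instead:
  local rec = string.pack("il", 42, os.time())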
> - Mike Pall's implementation for yield (using longjmp), allowing yields
> in several places not allowed currently (inside pcall, metamethods, etc.)
The current lua_yield() in LuaJIT 2.x actually never returns to
the caller. It unwinds the C stack back to the last resume (i.e.
a longjmp) and exits there with LUA_YIELD. So far, I've not
noticed any bad side-effects on existing modules. None of them
seem to rely on lua_yield() actually returning to the caller
before passing the -1 return value to the Lua core.
> - some form of bit operations. (We are not very happy with any
> known implementation. Maybe just incorporate bitlib?)
I've used the same names for the bit.* functions, but not the same
implementation. I've also opted to put the module into
package.preload rather than pollute the globals with "bit" (which
seems to be a popular variable name), so a local bit = require("bit")
is needed before use.
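The equivalent idea, sketched from the Lua side (the real module is
registered from C, but the effect on users is the same):

  -- Register a loader under package.preload instead of creating a global:
  package.preload["bit"] = function()
    local M = {}
    function M.band(a, b) --[[ stub; real code lives in C ]] end
    -- ... bor, bxor, bnot, shifts ...
    return M
  end

  local bit = require("bit")   -- no global 'bit' is ever created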
Note that most implementations out there are broken in some
respect. While it's easy to get this right when lua_Number is an
integer, there are some pitfalls with doubles. You want to allow
both signed and unsigned 32-bit numbers as valid inputs and
produce a consistent format on output (I've opted for signed, but
may revise this decision later, based on user feedback).
0xffffffff either parses as 4294967295 (with lua_Number = double)
or as -1 (with lua_Number = int). Conversely you'll want
bit.band(0xffffffff, -1) to return either -1 or 0xffffffff, but
not an error or any other value (some implementations return
0x80000000 :-) ).
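In other words, with the signed output convention I'm currently using:

  local bit = require("bit")
  -- 0xffffffff and -1 denote the same 32 bit pattern; both are accepted
  -- as input, and the output comes back in one consistent (signed) form:
  print(bit.band(0xffffffff, -1))   --> -1
  print(bit.bor(0, 0xffffffff))     --> -1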
The conversions to and from double are tricky to get right. I'm
always using the d+6755399441055744.0 cast. It yields correct
results for all numbers in the range -2147483648 .. +4294967295
(look twice) and it's very fast. It needs to know the endianness
of the host at compile time (not much of an issue) but is
otherwise completely portable across IEEE 754 implementations.
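For the curious, the arithmetic behind that constant:
6755399441055744 = 2^52 + 2^51. For any d in -2^31 .. 2^32-1 the
sum d + 2^52 + 2^51 stays below 2^53, where doubles are spaced
exactly 1 apart, so the FPU rounds it to an integer and the low 32
bits of the mantissa end up holding (the rounded) d as a two's
complement pattern:

  d = -1 or 4294967295  -->  low 32 mantissa bits are 0xffffffff
  d = 2.5               -->  rounds to nearest even, low 32 bits are
                             0x00000002

Then you just read the low word of the double, which is where the
compile-time endianness check comes in.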
[And you really want to avoid going through 64 bit integers as
intermediates or (worse yet) doing FP modulos (*argh*).]
> - there is already a new function luaL_tolstring (along the lines of
> the 'tostring' function). Maybe we should define a lua_rawtostring (no
> coercions from numbers) and then use luaL_tolstring ("full" coercion
> from other types) when we want to allow coercions. The point is where
> to use one and where to use the other. (The current lua_tostring behavior
> would be deprecated in the future...)
Independent of this change, I'd welcome it if there were _fewer_
automatic coercions going on in the standard libraries. I'd ditch
all of the string-to-number auto-coercions once and for all
(ditto for the arithmetic operators).
[OTOH the number-to-string auto-coercions make sense in many
cases, e.g. io.write.]
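Concretely (current 5.1 behaviour):

  print("10" + 1)        --> 11   string-to-number in arithmetic: drop it
  print(("x"):rep("3"))  --> xxx  ditto in the string library
  io.write(42, "\n")     --> 42   number-to-string: convenient, keep it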
--Mike