- Subject: Re: [ANN] Lua 5.3.0 (work2) now available
- From: Sean Bolton <sean@...>
- Date: Sun, 23 Mar 2014 10:11:06 -0700
This is shaping up to be a really exciting release! I've been playing
with the utf8 library:
$ lua-5.3.0-work2/src/lua
Lua 5.3.0 (work2) Copyright (C) 1994-2014 Lua.org, PUC-Rio
> utf8.len('')
stdin:1: bad argument #1 to 'len' (initial position out of string)
stack traceback:
        [C]: in function 'len'
        stdin:1: in main chunk
        [C]: in ?
That's certainly a surprise. Is it intentional that this raises an
error, instead of returning 0 for the empty string like string.len?
> utf8.offset('a', 0, 3)
stdin:1: bad argument #3 to 'offset' (position out of range)
stack traceback:
        [C]: in function 'offset'
        stdin:1: in main chunk
        [C]: in ?
> utf8.offset('a', 0, 2)
2
I would expect start offsets of 2 and 3 both to raise errors, since the
string is only one byte long.
There is a typo in the manual for utf8.len: 'sufix' should be 'suffix'.
As an exercise, I tried to write a function using the new utf8 library
that would scan a string and replace any invalid UTF-8 sequences with a
substitution character. I failed to come up with any solution that did
not either create many small strings (one for each codepoint in the
target string) or involve scanning the string byte-by-byte (which could
as easily be done with Lua 5.2). How much easier it would be if
utf8.len, upon encountering invalid UTF-8, returned nil plus the
byte offset of the offending sequence!
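To make the wish concrete, here is a rough sketch of the function I was
trying to write. It assumes that utf8.len(s, i) returns nil plus the byte
position of the first invalid byte when it hits bad input, which is the
proposed behaviour and not something I have confirmed for work2. The
name sanitize and the default substitution character (U+FFFD) are my own
choices for illustration.

```lua
-- Sketch only. ASSUMPTION: utf8.len(s, i) returns nil plus the byte
-- position of the first invalid byte instead of raising an error.
-- 'sanitize' is a hypothetical helper name, not part of the utf8 library.
local function sanitize(s, repl)
  repl = repl or "\xEF\xBF\xBD"   -- U+FFFD encoded as UTF-8
  local parts = {}
  local pos = 1
  while pos <= #s do
    local n, bad = utf8.len(s, pos)
    if n then
      -- The rest of the string is valid: copy it in one piece and stop.
      parts[#parts + 1] = s:sub(pos)
      break
    end
    -- Copy the valid run before the offending byte, if there is one.
    if bad > pos then
      parts[#parts + 1] = s:sub(pos, bad - 1)
    end
    -- Substitute the offending byte and resume scanning just after it.
    parts[#parts + 1] = repl
    pos = bad + 1
  end
  return table.concat(parts)
end

print(sanitize("a\xFFb", "?"))
```

This creates one substring per valid run rather than one per codepoint,
and the byte-level scanning stays inside utf8.len. It substitutes once
per invalid byte, so a malformed multi-byte sequence may produce several
substitution characters; merging those would need a little more logic.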
-Sean