On Mon, Dec 20, 2010 at 06:02:31PM +0200, Greg Falcon wrote:
> Your point about multibyte characters is well taken, but:
> 
> On Sun, Dec 19, 2010 at 5:32 PM, Tony Finch <dot@dotat.at> wrote:
> > On 19 Dec 2010, at 22:19, Greg Falcon <veloso@verylowsodium.com> wrote:
> >>
> >> A subtle point here:  This snippet from the manual is talking about
> >> the *character* at s[1], and Lua doesn't have a character type.
> >
> > It says character but it means octet.
> 
> It probably means character in the C "char" sense.  "Octet" is not an
> appropriate word to use for this concept in portable C programs, since
> chars/bytes in standard C are allowed to be wider than 8 bits in
> standard-conforming implementations.
> 

Definition:

    A string is a Lua value consisting of a sequence of bytes but 
    having no other structure, mainly used to represent other values 
    in a human-readable way.

In particular, a string is not a table, therefore also not an array,
and its entries are bytes, not characters or anything else.  (Although
one tacitly assumes that there exist useful mappings between strings and
sequences of characters, e.g. between the four-character sequence "\\" 
and the one-byte string consisting of the byte encoding of a backslash.) 
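A small sketch of that mapping in plain Lua (the variable name is mine, and the value 92 assumes an ASCII-based system): the four characters typed in the source denote a string that holds a single byte.

```lua
-- The source text "\\" is four characters: quote, backslash, backslash, quote.
local s = "\\"
print(#s)              -- length of the string in bytes
print(string.byte(s))  -- numeric value of that one byte (92 in ASCII)
```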

The k-th byte of a string is just that, a byte.  The notion of "the k-th 
character of a string" is useful in text processing applications, but 
it is not a Lua notion.  Lua does not have a type "character".  It is 
therefore quite impossible to make s[k] mean "the k-th character of s".
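To illustrate the difference, here is a sketch that assumes the text is encoded as UTF-8. That is purely my assumption for the example; Lua itself neither knows nor cares about the encoding. The bytes are written as decimal escapes so the example does not depend on how this message is encoded.

```lua
-- Assumption: UTF-8 text. "\195\169" is the two-byte UTF-8 encoding of e-acute.
local s = "\195\169t\195\169"   -- three characters in UTF-8, five bytes

print(#s)                       -- the length operator counts bytes
print(string.byte(s, 1, 2))     -- the two bytes that encode the first character
print(string.sub(s, 1, 1) == "\195")  -- the "first byte" is only half a character
```

Everything the string library reports here is in terms of bytes; "the first character" would have to be defined by some library that knows the encoding.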

One could, though, make s[k] mean "a one-byte string consisting of the 
k-th byte of s", i.e.
    s[k] == string.char(string.byte(s,k,k))
Then it is obvious that "s[3]='c'" is nonsense, since
    string.char(string.byte(s,3,3)) = 'c'
is nonsense: the result of a function call is a value, not something 
one can assign to.
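For what it is worth, the read-only meaning of s[k] can be sketched with the metatable that all strings share. This is only an illustration of the idea, not a recommendation: it changes string indexing for the whole Lua state, and it assumes getmetatable("") is available (a sandbox may hide it).

```lua
-- Sketch only: modifies the metatable shared by every string in this state.
local mt = getmetatable("")
mt.__index = function(s, k)
  if type(k) == "number" then
    -- s[k]: a one-byte string, or "" when k is out of range
    return string.char(string.byte(s, k, k))
  end
  -- anything else falls through to the string library, so s:upper() still works
  return string[k]
end

local s = "abc"
print(s[3])        -- a one-byte string holding the third byte
print(s:upper())   -- method calls are unaffected

-- Assignment is still rejected: strings have no __newindex metamethod,
-- so Lua raises an "attempt to index" error at run time.
local ok = pcall(function() s[3] = 'c' end)
print(ok)
```

Reading works because __index may compute a fresh value; writing cannot work because there is nothing to write into.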

Dirk