On Mon, Dec 20, 2010 at 11:21,  <jgiors@threeeyessoftware.com> wrote:
>> ------------------------------
>> Date: Mon, 20 Dec 2010 15:58:49 +0200
>> From: Dirk Laurie <dpl@sun.ac.za>
>> Subject: Re: String indexing again
>> To: Lua mailing list <lua-l@lists.lua.org>
>> Message-ID: <20101220135849.GA7839@rondloper>
>> Content-Type: text/plain; charset=us-ascii
>>
>> On Mon, Dec 20, 2010 at 02:35:09PM +0200, Randy Kramer wrote:
>> > For those of us (maybe just me) just starting to watch from the peanut
>> > gallery, without the long lecture, can you tell me what s[3] does
>> > indicate--I mean what does the s function (is it a function?) do?
>> >
>> > On Sunday 19 December 2010 11:57:19 pm Dirk Laurie wrote:
>> > > Except for the long lecture you need to explain why you get this:
>> > > > s='hello'; print(s[3])
>> > >
>> > > nil
>>
>> Indexing for strings was not defined at all in Lua 5.0:
>> --
>> Lua 5.0.3  Copyright (C) 1994-2006 Tecgraf, PUC-Rio
>> > s='hello'; print(s[3])
>> stdin:1: attempt to index global `s' (a string value)
>> stack traceback:
>>     stdin:1: in main chunk
>>     [C]: ?
>> --
>> It is still not defined in Lua 5.1 and 5.2.  But in the meantime,
>> strings have acquired a metatable, in order to allow you to write
>> s:sub(3,3) instead of string.sub(s,3,3).  Since indexing for
>> strings is not defined, s[3] falls through to the metatable,
>> which does not have an entry with key value 3.  Therefore, as
>> with all table references, nil is returned.
>
> This is a good point. The behavior could be modified by changing __index
> to a function which generates an error when the index is a number (and
> otherwise falls back to the current behavior), but I'm not certain it
> would be worthwhile.
>
>> In my opinion, this sort of non-intuitive behaviour, which needs
>> very careful reading of the reference manual to understand, is
>> a much greater evil than the alleged illogic and Cobol-likeness
>> of making s[3] mean the third character of the string s.  But
>> I promised to stop ranting about this.
>>
>> Dirk
>
> I think an argument against indexing of characters in a string is that
> it implies indexing is acceptable on the left-hand side of an
> assignment:
>
> s = "ABxDEF"
> s[3] = "C"    -- Oh-oh...
>
> If I am not mistaken, this cannot work in standard Lua (i.e. there is no
> way to make s contain "ABCDEF" after the assignment). The failure of
> assigning to a character in a string would probably be just as confusing
> as what you've mentioned above, especially if reading a character from a
> string (with an array index) is allowed.
>
> John Giors
> Independent Programmer
> Three Eyes Software
> jgiors@ThreeEyesSoftware.com
> http://ThreeEyesSoftware.com
>
>
>
>

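That assignment can at least be intercepted with __newindex, but the
handler cannot change the value held in s, since Lua strings are
immutable. If it could, it might also allow s[3] = 65 (assigning by
character value instead of string).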
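
A minimal sketch of what such a handler can do in practice, assuming
Lua 5.1 or later. set_char is a hypothetical helper of my own, not a
standard function:

```lua
-- All strings share one metatable; fetch it from any string value.
local string_mt = getmetatable("")

-- Strings are immutable values, so this handler cannot change what the
-- caller's variable holds. The most useful thing it can do is raise a
-- descriptive error instead of the generic "attempt to index" message.
string_mt.__newindex = function(s, key, value)
  error("strings are immutable; cannot assign to index " .. tostring(key), 2)
end

local s = "ABxDEF"
local ok, err = pcall(function() s[3] = "C" end)
print(ok, err)   -- false, plus the error message
print(s)         -- ABxDEF (unchanged)

-- Hypothetical helper: build a new string with one character replaced.
local function set_char(str, i, ch)
  return str:sub(1, i - 1) .. ch .. str:sub(i + 1)
end

s = set_char(s, 3, "C")
print(s)         -- ABCDEF
```

Note that this changes the metatable for every string in the program,
so it is a global decision rather than a local one.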

Regarding the question of whether s[n] should return a substring ('A')
or a byte value (65), consider what happens if you want the opposite:

-- s[n] returns a string, you want the value:
val = s[n]:byte()
-- s[n] returns a value, you want the string:
str = string.char(s[n])

The former case seems more convenient and cleaner; returning a string
lets you add on additional string operations without having to wrap
the expression in additional parentheses and specify 'string' again.
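
For anyone who wants to try the string-returning variant, here is a
rough sketch, assuming Lua 5.1 or later. It replaces __index on the
shared string metatable with a function, which is my own illustration
and not something standard Lua does:

```lua
local string_mt = getmetatable("")

string_mt.__index = function(s, key)
  if type(key) == "number" then
    -- Numeric key: return the one-character substring at that position.
    -- An out-of-range position yields "" here, not nil.
    return string.sub(s, key, key)
  end
  -- Any other key: fall back to the string library, so that
  -- s:sub(), s:byte() and friends keep working.
  return string[key]
end

local s = "ABCDEF"
local n = 3
local ch  = s[n]          -- "C"
local val = s[n]:byte()   -- 67
print(ch, val)
```

Since s[n] is itself a string, the method call chains directly onto it.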

As for character encoding, that's a whole other can of worms. IIRC,
PHP made the mistake of converting all strings to some particular
encoding, which meant they weren't binary-safe anymore. In Lua, too,
a string will sometimes not contain text at all, but just function as
an array of bytes read from or written to a binary file. (This is
perhaps another argument for allowing indexing?) If strings undergo
any kind of automatic text-based transformation, binary I/O becomes
much more difficult.
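
A quick illustration of the byte-array point, using only standard
string functions (the variable name is just for the example):

```lua
-- A string holding arbitrary bytes, including an embedded zero byte.
local data = "\0\255A\n"

print(#data)              -- 4: the length counts bytes, zero byte included
print(data:byte(1, -1))   -- 0  255  65  10
```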

Also maybe worth considering: Should files allow numeric indexing,
treating them as giant on-disk byte arrays? I usually wrap mine in a
table whose metatable does just that, e.g. f[3] is the 3rd byte of
the file.
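
Roughly like this. It is a sketch rather than a finished wrapper:
byte_view is a hypothetical name, I am assuming 1-based positions and
numeric byte values, and there is no error handling on the seeks:

```lua
-- Hypothetical wrapper: byte_view(f)[i] reads or writes the i-th byte.
local function byte_view(file)
  local mt = {}
  mt.__index = function(_, i)
    file:seek("set", i - 1)          -- file offsets are 0-based
    local c = file:read(1)
    return c and c:byte() or nil     -- nil when reading past end of file
  end
  mt.__newindex = function(_, i, value)
    file:seek("set", i - 1)
    file:write(string.char(value))
  end
  -- The proxy table stays empty, so every access goes through the metatable.
  return setmetatable({}, mt)
end

local path = os.tmpname()
local f = assert(io.open(path, "w+b"))
f:write("hello")

local view = byte_view(f)
local third = view[3]
print(third)              -- 108, the byte value of "l"

view[1] = 72              -- overwrite "h" with "H"
f:seek("set", 0)
local contents = f:read("*a")
print(contents)           -- Hello

f:close()
os.remove(path)
```

Every access costs a seek plus a one-byte read or write, so this is
about convenience, not speed.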

-- 
Sent from my toaster.