On 26 Apr 2007, at 13:35, David Kastrup wrote:


> It may also be considered somewhat counterintuitive that the call
>
> unicode.utf8.byte(unicode.utf8.char(5000))
>
> returns 5000, something which naive people like myself would not
> exactly choose to call a "byte".

string.char and string.byte are inverses of each other, and it seems sensible to extend this inverse relationship into the unicode.utf8 domain.
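
To make the inverse relationship concrete, here is a minimal sketch. It assumes Lua 5.3 or later and uses the standard utf8 library (utf8.char and utf8.codepoint) as a stand-in for the unicode.utf8 module discussed in this thread, so it illustrates the idea rather than that module's exact behaviour. The variable names are only illustrative.

```lua
-- string.char and string.byte are inverses over the range 0..255.
for code = 0, 255 do
  assert(string.byte(string.char(code)) == code)
end

-- Assumption: Lua 5.3+ utf8 library, standing in for unicode.utf8.
local encoded = utf8.char(5000)             -- UTF-8 encoding of code point 5000
local roundtrip = utf8.codepoint(encoded)   -- decode the first character again
local length_in_bytes = #encoded            -- number of 8-bit bytes in the encoding
local first_byte = string.byte(encoded)     -- plain string.byte sees only one octet

print(roundtrip, length_in_bytes, first_byte)
```

The round trip yields the code point, not an octet: the encoded string is several bytes long, and the plain string.byte only ever sees one of them at a time.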

When I implemented Lua in Java, strings were implemented using java.lang.String (and therefore Java's 16-bit unsigned char type). I took a similar position: string.byte returned an integer between 0 and 65535.

string.byte should probably be named string.code to avoid any emotional attachment to byte.

Whilst almost all bytes are 8-bit (octets), "byte" does have other meanings besides an 8-bit number. Google for "14-bit byte", etc.

David Jones