On 26 Apr 2007, at 13:35, David Kastrup wrote:
> It may also be considered somewhat counterintuitive that the call unicode.utf8.byte(unicode.utf8.char(5000)) returns 5000, something which naive people like myself would not exactly choose to call a "byte".
string.char and string.byte are inverses, and it seems sensible to extend this inverse into the unicode.utf8 domain.
When I implemented Lua in Java, strings were implemented using java.lang.String (so using Java's 16-bit unsigned char type). I took a similar position: string.byte returned an integer between 0 and 65535.
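
The idea can be illustrated with a minimal Java sketch. This is not the code from the implementation mentioned above; the class name LuaStringSketch and the helpers byteAt and charOf are hypothetical, chosen only to show how a string.byte/string.char pair backed by java.lang.String would naturally round-trip values up to 65535.

```java
public class LuaStringSketch {
    // Hypothetical analogue of string.byte(s, i). Lua indices are 1-based,
    // so index i maps to Java's charAt(i - 1).
    static int byteAt(String s, int i) {
        return s.charAt(i - 1); // char widens to int, range 0..65535
    }

    // Hypothetical analogue of string.char(...): one Java char per code.
    static String charOf(int... codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            // Assumption: codes outside the 16-bit range are rejected
            // rather than encoded as surrogate pairs.
            if (code < 0 || code > 0xFFFF) {
                throw new IllegalArgumentException("code out of range: " + code);
            }
            sb.append((char) code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(byteAt(charOf(5000), 1));    // prints 5000
        System.out.println(byteAt(charOf(72, 105), 2)); // prints 105
    }
}
```

With this shape, byteAt(charOf(n), 1) == n holds for every n from 0 to 65535, which is the same inverse relationship that string.char and string.byte have for 8-bit values.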
string.byte should probably be named string.code to avoid any emotional attachment to byte.
Whilst almost all bytes are 8-bit (octets), "byte" does have meanings other than "8-bit number". Google for "14-bit byte", etc.
David Jones