lua-l archive




Would it be a good idea to make a distinction between characters and
bytes, or do you guys feel that this is already clear in the manual
(and PiL)?

For C programmers, characters and bytes have always been synonyms...
But for programmers used to Unicode-aware languages, I admit that Lua's terminology is confusing.

I searched for "character" and "byte" in the Lua 5.2 reference manual.
There are many more occurrences of "character" than of "byte". Most of the time, "character" refers to a literal ASCII character, as in 'k'.
I don't think it would help to write, for example, "the byte 'k'" instead of "the character 'k'".

In the string library chapter, "character" generally means a byte. Note, however, that at the start of the chapter there is this sentence:

  "The string library assumes one-byte character encodings."
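As a small sketch of what that assumption means in practice (assuming the text happens to be UTF-8, an encoding Lua itself knows nothing about here): the string functions index bytes, so `string.sub` can cut a multi-byte character in half. The escapes `\195\169` are the two UTF-8 bytes of "é", written out so the example does not depend on the encoding of the source file.

```lua
-- "héllo" in UTF-8, with the two bytes of 'é' written as decimal escapes
local s = "h\195\169llo"

-- string.sub counts bytes, not characters: asking for the first two
-- "characters" yields 'h' followed by only the first byte of 'é'
local prefix = string.sub(s, 1, 2)
print(prefix == "h\195")   --> true

-- string.byte is per byte too: position 2 holds the lead byte 195 (0xC3)
print(string.byte(s, 2))   --> 195
```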

Also, for the # operator, the reference states:

  "The length of a string is its number of bytes (that is, the usual meaning of string length when each character is one byte)."
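A minimal illustration of the gap between the two notions, again assuming UTF-8 text: `#` and `string.len` report bytes. Counting characters requires knowing the encoding; `utf8_count` below is a hypothetical helper (not part of the Lua 5.2 string library) that assumes well-formed UTF-8 and simply skips continuation bytes (128..191).

```lua
local s = "h\195\169llo"   -- "héllo": 5 visible characters, 6 bytes in UTF-8

print(#s)                  --> 6
print(string.len(s))       --> 6 (string.len agrees with #)

-- Hypothetical helper: counts characters in well-formed UTF-8 by
-- counting every byte that is not a continuation byte (128..191).
local function utf8_count(str)
  local n = 0
  for i = 1, #str do
    local b = string.byte(str, i)
    if b < 128 or b > 191 then n = n + 1 end
  end
  return n
end

print(utf8_count(s))       --> 5
```

The helper does no validation; malformed input just yields a count of lead and ASCII bytes.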

It is, however, amusing to note that the function `void luaL_addchar (luaL_Buffer *B, char c)` is documented as "Adds the byte c to the buffer B".
So yes, there is room for confusion. But I don't think that `reference_manual:gsub("character", "byte")` has the correct syntax to fix the situation.