I remember vividly how much code broke when Perl (somewhere in the
5.6-5.8 range, IIRC) switched its defaults from "byte semantics" to
"character semantics". Fortunately, they also provided pragmas so you
could change how a given section of code behaved.

Unicode in general and UTF-8 in particular are quickly becoming
indispensable, and Lua programmers need a standardized way of dealing
with them, either in libraries or in extensions to the language syntax
and semantics. Personally, I favor libraries, since they can be
blazingly fast and don't break existing code. But they do need to be
there, and they need to work.
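To make the byte/character distinction concrete, here is a minimal C sketch of the kind of helper such a library might provide. The name `utf8_count` is hypothetical (it is not part of Lua or its C API), and the function assumes well-formed UTF-8 input without validating it. `strlen` reports bytes, the same count Lua's `#` operator gives, while the loop counts only the bytes that begin a code point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Lua's C API: count the code points in
   a NUL-terminated UTF-8 string. Assumes well-formed UTF-8; no validation
   is performed, so malformed input gives a meaningless count. */
size_t utf8_count(const char *s)
{
    assert(s != NULL);
    size_t bytes = strlen(s); /* byte length: what Lua's # reports */
    size_t n = 0;
    for (size_t i = 0; i < bytes; i++) {
        /* Continuation bytes look like 10xxxxxx; every other byte
           starts a new code point, so count only those. */
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "café" encoded in UTF-8 (`"caf\xC3\xA9"`), `strlen` returns 5 while `utf8_count` returns 4, because the two-byte sequence for "é" contributes only one lead byte.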

On Fri, Nov 2, 2012 at 8:33 AM, Rapin Patrick <rapin.patrick@gmail.com> wrote:
>
>> Would it be a good idea to make a distinction between characters and
>> bytes, or do you guys feel that this is already clear in the manual
>> (and PiL)?
>
>
> For C programmers, characters and bytes have always been synonyms...
> But for programmers used to Unicode-aware languages, I admit that Lua's
> terminology is confusing.
>
> I searched for "character" and "byte" in the Lua 5.2 reference manual.
> There are many more occurrences of "character" than of "byte". Most of
> the time, "character" is used to refer to a literal ASCII character, as in 'k'.
> I don't think it would help to write, for example, "the byte 'k'" instead of
> "the character 'k'".
>
> In the string library chapter, a character generally means a byte. Note
> however that at the start of the chapter there is this sentence:
>
>
>   "The string library assumes one-byte character encodings. "
>
> Also, for the # operator, the reference states:
>
>   "The length of a string is its number of bytes (that is, the usual meaning
> of string length when each character is one byte). "
>
> It is however funny to note that the function `void luaL_addchar
> (luaL_Buffer *B, char c)` is documented as "Adds the byte c to the buffer
> B".
> So yes, there is room for confusion. But I don't think that
> `reference_manual:gsub("character", "byte")` has the correct syntax to fix
> the situation.



-- 
Twitter: http://twitter.com/znmeb; Computational Journalism Publishers
Workbench: http://znmeb.github.com/Computational-Journalism-Publishers-Workbench/

How the Hell can the lion sleep with all those people singing "A weem
oh way!" at the top of their lungs?