2018-07-11 22:58 GMT+02:00 Gregg Reynolds <dev@mobileink.com>:
>
>
> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <dirk.laurie@gmail.com> wrote:
> ...
>>
>> From the point of view of the utf8 library, UTF-8 is a reversible way
>> of mapping a certain subset of strings (which I here call "codons",
>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>> integers.
>
>
> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has
> nothing to do with "a certain subset of 32 bit integers".
My bad. I should have said "Lua integers". The actual size depends
on luaconf.h, and 32 bits is not in fact the default.
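
To see what integer size a given build actually uses, something like the following should do. It is only a sketch, assuming Lua 5.3 or later; `integer_bits` is a name made up for illustration, and the "j" pack option denotes a lua_Integer.

```lua
-- Width of a Lua integer in bits on this build.
-- The "j" pack option stands for lua_Integer, so packsize gives its size in bytes.
local function integer_bits()
  return string.packsize("j") * 8
end

print(integer_bits())                       -- depends on how luaconf.h was configured
print(math.maxinteger, math.mininteger)     -- the extreme values of that integer type
```

On a stock build this is expected to report 64, but the point is that the answer comes from the build and not from the utf8 library.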
> If you're talking about utf8, but you're not talking about Unicode, then
> what are you talking about? I'm not against it, I just don't see what you're
> after.
I am talking about utf8, not about UTF-8, and certainly not about Unicode.
Definitions:
Unicode: a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. [1]
UTF-8: a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. [1]
utf8: a library in Lua 5.3 that provides basic support for UTF-8 encoding, but no support for Unicode other than the handling of the encoding. Any operation that needs the meaning of a character, such as character classification, is outside its scope. [2]
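
The distinction in the last definition can be illustrated with a short session. This is a minimal sketch, assuming Lua 5.3 or later; the sample string and variable names are chosen here purely for illustration. It shows the library mapping between byte sequences and integers in both directions, without attaching any meaning to the characters.

```lua
-- Build a string from integers; 228 is encoded as the two bytes C3 A4.
local s = utf8.char(104, 228, 108, 108, 111)

local nbytes = #s             -- length in bytes
local nchars = utf8.len(s)    -- length in encoded sequences

-- Decode the whole string back into integers (the mapping is reversible).
local back = { utf8.codepoint(s, 1, -1) }

-- A byte string that is not valid UTF-8: utf8.len reports failure
-- together with the position of the first offending byte.
local ok, badpos = utf8.len("a\xFFb")

print(nbytes, nchars, #back, ok, badpos)
```

Nothing here asks what the integer 228 stands for; classification, case mapping and the like are outside the library's scope.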
I started this thread in order to make the point that certain other functions in the string library, in addition to utf8.len and utf8.char, could also be generalized to the very restricted setting in which the utf8 library operates.
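
As one possible illustration of such a generalization, here is a sketch of a string.sub counterpart that counts encoded sequences instead of bytes. It assumes Lua 5.3 or later; `utf8_sub` is a hypothetical helper, not a function of the utf8 library, and it uses only utf8.len and utf8.offset.

```lua
-- Hypothetical helper (not part of the utf8 library): like string.sub,
-- but i and j count UTF-8 sequences rather than bytes.
local function utf8_sub(s, i, j)
  local n = utf8.len(s)
  if not n then return nil end          -- s is not valid UTF-8
  j = j or -1
  if i < 0 then i = n + i + 1 end       -- negative indices count from the end
  if j < 0 then j = n + j + 1 end
  if i < 1 then i = 1 end
  if j > n then j = n end
  if i > j then return "" end
  local first = utf8.offset(s, i)           -- byte where sequence i starts
  local last = utf8.offset(s, j + 1) - 1    -- byte just before sequence j+1
  return s:sub(first, last)
end

local s = "h\xC3\xA4llo"       -- five sequences, six bytes
print(utf8_sub(s, 2, 3))       -- sequences 2 and 3
print(utf8_sub(s, -3))         -- the last three sequences
```

Like utf8.len, the helper needs to know only where sequences begin and end, so it stays inside the restricted setting described above.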