On Tue, Jul 10, 2018, 5:17 PM Sean Conner <sean@conman.org> wrote:
It was thus said that the Great Gregg Reynolds once stated:
> On Tue, Jul 10, 2018, 4:44 PM Dirk Laurie <dirk.laurie@gmail.com> wrote:
> ...
>
> >
> > I. Am. Not. Asking. For. Unicode.
> >
> > I am merely asking for extra functions along the lines of what the
> > utf8 library already does.
> > E.g. Sam's examples:
> >
> > > s1 = "Hélène"
> > > s2 = "Hélène"

  They look similar, but they are constructed differently.

> FYI these look identical on Android.
>
> > > utf8.len(s1)
> > 6
> > > utf8.len(s2)
> > 7
> >
> > If you really do not understand what I mean, I can elaborate.
>
> Please do.
>
> What does "len" mean? Number of Unicode chars or number of bytes?

  The number of Unicode code points.  The second one has a letter 'e'
followed by a combining accent (I'm not sure which accent is the combining
one), thus the different number of Unicode code points.
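
  A minimal sketch of the difference, for Lua 5.3 or later. Which accent is
decomposed in the original strings is not known from the thread, so the
choice below (the first 'e' followed by U+0301 COMBINING ACUTE ACCENT) is an
assumption made only for illustration, and codepoints() is a hypothetical
helper, not part of the utf8 library.

```lua
-- Precomposed form: U+00E9 and U+00E8 are single code points.
local s1 = "H\u{E9}l\u{E8}ne"
-- Decomposed form (assumption): plain 'e' followed by U+0301.
local s2 = "He\u{301}l\u{E8}ne"

-- Hypothetical helper: list the code points of a UTF-8 string.
local function codepoints(s)
  local out = {}
  for _, c in utf8.codes(s) do
    out[#out + 1] = string.format("U+%04X", c)
  end
  return table.concat(out, " ")
end

print(utf8.len(s1), #s1)  -- code points, then bytes: 6  8
print(utf8.len(s2), #s2)  -- code points, then bytes: 7  9
print(codepoints(s2))     -- U+0048 U+0065 U+0301 U+006C U+00E8 U+006E U+0065
```

  utf8.len counts code points and # counts bytes; neither counts what a
reader would call a character, which is why the two spellings differ.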

Ok, we have "codepoints", "chars", bytes, and heaven knows what else. Is a Unicode "codepoint" a byte? No. Is "Unicode codepoint" even meaningful?