lua-l archive

I just read Roberto's slides from the 2012 Lua workshop [0], and I have a
suggestion for the UTF-8 library.

It is efficient, and often practical, to work with byte indices, even
in Unicode strings. This is the approach taken by Julia, and I use it in
LuLPeg. The API is simple:

    char, next_pos = getchar(subject, position)

    S = "∂ƒ"
    getchar(S, 1) --> '∂', 4
    getchar(S, 4) --> 'ƒ', 6
    getchar(S, 6) --> nil, nil
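
For concreteness, a minimal pure-Lua sketch of getchar could look like
the following. It assumes the subject is valid UTF-8 and that `position`
is the byte index of the first byte of a character. It reads only the lead
byte to find the sequence length and does not check continuation bytes.
Since it uses nothing beyond `string.byte` and `string.sub`, it should run
on Lua 5.1 and later. This is an illustration of the proposal, not an
existing library function.

```lua
-- Sketch of the proposed getchar(subject, position).
-- Assumptions: subject is valid UTF-8 and position points at the first
-- byte of a character. Only the lead byte is inspected; continuation
-- bytes are not validated.
local function getchar(subject, position)
  local b = subject:byte(position)
  if b == nil then
    return nil, nil -- position is past the end of the string
  end
  local len
  if b < 0x80 then
    len = 1 -- ASCII
  elseif b >= 0xC2 and b < 0xE0 then
    len = 2
  elseif b >= 0xE0 and b < 0xF0 then
    len = 3
  elseif b >= 0xF0 and b < 0xF5 then
    len = 4
  else
    error("invalid UTF-8 lead byte at position " .. position)
  end
  local next_pos = position + len
  return subject:sub(position, next_pos - 1), next_pos
end

-- Walking a string character by character with byte indices:
local S = "∂ƒ"
local pos = 1
while true do
  local char, next_pos = getchar(S, pos)
  if char == nil then break end
  print(pos, char) -- prints the byte index and the character
  pos = next_pos
end
```

The caller keeps a plain byte index, so slicing with `string.sub` or
resuming a scan (as a parser like LuLPeg does) needs no conversion between
character and byte positions.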

A similar function could return code points instead of strings.
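
A code-point variant could be sketched as follows. The name `getcodepoint`
is hypothetical. Unlike the getchar sketch, it does check that each
continuation byte lies in 0x80..0xBF. It decodes with plain arithmetic
rather than bitwise operators, so it should also run on Lua 5.1 and 5.2,
which lack them.

```lua
-- Hypothetical getcodepoint(subject, position): like getchar, but
-- returns the numeric code point and the byte index of the next character.
-- Assumes position points at the first byte of a UTF-8 sequence.
local function getcodepoint(subject, position)
  local b1 = subject:byte(position)
  if b1 == nil then
    return nil, nil -- past the end of the string
  end
  if b1 < 0x80 then
    return b1, position + 1 -- ASCII
  end
  local len, cp
  if b1 >= 0xC2 and b1 < 0xE0 then
    len, cp = 2, b1 - 0xC0
  elseif b1 >= 0xE0 and b1 < 0xF0 then
    len, cp = 3, b1 - 0xE0
  elseif b1 >= 0xF0 and b1 < 0xF5 then
    len, cp = 4, b1 - 0xF0
  else
    error("invalid UTF-8 lead byte at position " .. position)
  end
  -- Each continuation byte contributes its low 6 bits.
  for i = position + 1, position + len - 1 do
    local b = subject:byte(i)
    if b == nil or b < 0x80 or b > 0xBF then
      error("invalid UTF-8 continuation byte at position " .. i)
    end
    cp = cp * 64 + (b - 0x80)
  end
  return cp, position + len
end

print(getcodepoint("∂ƒ", 1)) --> 8706  4   (U+2202)
print(getcodepoint("∂ƒ", 4)) --> 402   6   (U+0192)
```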

What do you think about this?

-- Pierre-Yves

[0] http://www.lua.org/wshop12/Ierusalimschy.pdf