- Subject: Re: Unicode and UTF-8 the Lua way, mid-discussion (was Re: What do you miss most in Lua)
- From: Roberto Ierusalimschy <roberto@...>
- Date: Thu, 9 Feb 2012 16:37:38 -0200
> Getting lua's core to change its view of strings to being something
> other than a byte-sequence isn't going to happen, its not the lua way,
Sure.
> Getting a new library into the lua core is unlikely, but could happen.
Very basic support for UTF-8, along the lines suggested by Miles Bader,
seems like a good start. Something more or less like this:
  utf8.len(s, [l]) -> number of code points in s up to the 'l'-th byte
    (or nil if s is not properly formed)

  utf8.byteoffset(s, l) -> offset (in bytes) where the 'l'-th code point
    starts

  utf8.frontier(s, l) -> offset (in bytes) where the code point containing
    the 'l'-th byte starts (ends?)

  utf8.codepoint(s, i, j) -> code points in s from *byte* offset i to j
    (default i=1, j=i); i adjusts backward and j adjusts forward until a
    proper frontier. (Another function that returns a table with those
    code points might be useful; {utf8.codepoint(s, 1, -1)} may be too
    heavy.)

  utf8.char(cp1, cp2, ...) -> string formed by code points cp1, cp2, ...
    (If cp1 is a table, the string formed by the code points in it?)
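
To make the proposal above concrete, here is a rough pure-Lua sketch of how the five functions might behave. It is only one reading of the descriptions, not an actual implementation: the table name `u8` and the helpers `iscont`, `decode`, and `encode` are made up for illustration. It also settles the open points by assumption: `frontier` returns the start of the code point, `len` counts a code point that begins at or before byte `l` even if it extends past it, and `char` accepts a table as its first argument.

```lua
-- Hypothetical pure-Lua sketch of the proposed functions. The table is named
-- `u8` (an assumption) so it is not mistaken for a real `utf8` library.
local u8 = {}
local unpack = table.unpack or unpack  -- Lua 5.2+ / Lua 5.1

-- True if b is a continuation byte (10xxxxxx, i.e. 0x80..0xBF).
local function iscont(b)
  return b ~= nil and b >= 0x80 and b <= 0xBF
end

-- Smallest code point allowed per sequence length (rejects overlong forms).
local MIN = { 0, 0x80, 0x800, 0x10000 }

-- Decode the sequence starting at byte i.
-- Returns code point and length in bytes, or nil if it is malformed.
local function decode(s, i)
  local b = s:byte(i)
  if not b then return nil end
  local n, cp
  if b < 0x80 then n, cp = 1, b
  elseif b >= 0xC0 and b <= 0xDF then n, cp = 2, b - 0xC0
  elseif b >= 0xE0 and b <= 0xEF then n, cp = 3, b - 0xE0
  elseif b >= 0xF0 and b <= 0xF7 then n, cp = 4, b - 0xF0
  else return nil end                       -- stray continuation or bad lead
  for k = 1, n - 1 do
    local c = s:byte(i + k)
    if not iscont(c) then return nil end
    cp = cp * 64 + (c - 0x80)
  end
  if cp < MIN[n] or cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
    return nil                              -- overlong, too large, or surrogate
  end
  return cp, n
end

-- Encode one code point (assumed valid, 0..0x10FFFF) as UTF-8 bytes.
local function encode(cp)
  local floor = math.floor
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + floor(cp / 64), 0x80 + cp % 64)
  elseif cp < 0x10000 then
    return string.char(0xE0 + floor(cp / 4096),
                       0x80 + floor(cp / 64) % 64, 0x80 + cp % 64)
  end
  return string.char(0xF0 + floor(cp / 262144), 0x80 + floor(cp / 4096) % 64,
                     0x80 + floor(cp / 64) % 64, 0x80 + cp % 64)
end

-- Number of code points starting at or before byte l; nil if malformed.
function u8.len(s, l)
  l = math.min(l or #s, #s)
  local i, n = 1, 0
  while i <= l do
    local cp, size = decode(s, i)
    if not cp then return nil end
    i, n = i + size, n + 1
  end
  return n
end

-- Byte offset where the l-th code point starts (l >= 1); nil if out of range.
function u8.byteoffset(s, l)
  local i = 1
  for _ = 1, l - 1 do
    local cp, size = decode(s, i)
    if not cp then return nil end
    i = i + size
  end
  if i > #s then return nil end
  return i
end

-- Byte offset where the code point containing byte l starts.
function u8.frontier(s, l)
  while l > 1 and iscont(s:byte(l)) do l = l - 1 end
  return l
end

-- Code points covering bytes i..j, returned as multiple values.
function u8.codepoint(s, i, j)
  i = i or 1
  j = j or i
  if i < 0 then i = #s + i + 1 end
  if j < 0 then j = #s + j + 1 end
  j = math.min(j, #s)
  i = u8.frontier(s, i)                     -- i adjusts backward
  local out = {}
  while i <= j do                           -- a sequence begun at or before j
    local cp, size = decode(s, i)           -- is decoded whole (j adjusts forward)
    if not cp then return nil end
    out[#out + 1] = cp
    i = i + size
  end
  return unpack(out)
end

-- String formed by the given code points (or by a table of them).
function u8.char(...)
  local cps = ...
  if type(cps) ~= "table" then cps = { ... } end
  local parts = {}
  for k = 1, #cps do parts[k] = encode(cps[k]) end
  return table.concat(parts)
end
```

With s = "h\195\169llo" (the é takes bytes 2 and 3), `u8.len(s)` gives 5, `u8.byteoffset(s, 3)` gives 4, `u8.frontier(s, 3)` gives 2, and `u8.codepoint(s, 3)` gives 233, since the byte offset 3 is first moved back to the start of the é.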
-- Roberto