Getting a new library into the Lua core is unlikely, but it could happen.
Very basic support for UTF-8, along the lines suggested by Miles Bader,
seems like a good start. Something more or less like this:

utf8.len(s, [l]) -> number of code points in s up to the 'l'-th byte
    (or nil if s is not properly formed)

utf8.byteoffset(s, l) -> offset (in bytes) where the 'l'-th code point
    starts

utf8.frontier(s, l) -> offset (in bytes) where the code point containing
    the 'l'-th byte starts (ends?)

utf8.codepoint(s, i, j) -> code points in s from *byte* offset i to j
    (default i=1, j=i); i adjusts backward and j adjusts forward until a
    proper frontier. (Another function that returns a table with those
    code points might be useful; {utf8.codepoint(s, 1, -1)} may be too
    heavy.)

utf8.char(cp1, cp2, ...) -> string formed by code points cp1, cp2, ...
    (If cp1 is a table, string formed by the code points in it?)
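
To make the proposal concrete, here is a rough sketch of how these functions could behave, written in plain Lua. It is only one possible reading of the descriptions above, not a reference implementation. Several things in it are assumptions: the table name 'u8' (chosen so nothing called 'utf8' is implied to exist), the helpers 'iscont', 'seqlen' and 'encode', the choice that frontier returns the *start* of the code point, the choice that codepoint raises an error on malformed input, and the omission of the table variant of char. Validation is structural only (lead byte plus continuation bytes); overlong forms and surrogates are not all rejected.

```lua
-- Hypothetical sketch of the proposed API; 'u8' is an illustrative name.
local u8 = {}
local unpack = table.unpack or unpack      -- Lua 5.2+ / Lua 5.1

-- True if b is a continuation byte (10xxxxxx).
local function iscont(b)
  return b ~= nil and b >= 0x80 and b <= 0xBF
end

-- Byte length announced by lead byte b, or nil if b cannot start a sequence.
local function seqlen(b)
  if b < 0x80 then return 1
  elseif b >= 0xC2 and b <= 0xDF then return 2
  elseif b >= 0xE0 and b <= 0xEF then return 3
  elseif b >= 0xF0 and b <= 0xF4 then return 4
  end
  return nil
end

-- Code points that start at or before byte l; nil if s is malformed.
function u8.len(s, l)
  l = math.min(l or #s, #s)
  local i, n = 1, 0
  while i <= l do
    local k = seqlen(s:byte(i))
    if not k then return nil end
    for j = i + 1, i + k - 1 do
      if not iscont(s:byte(j)) then return nil end
    end
    i, n = i + k, n + 1
  end
  return n
end

-- Byte offset where the l-th code point starts (nil if there is none).
function u8.byteoffset(s, l)
  local i, n = 1, 1
  while i <= #s do
    if n == l then return i end
    local k = seqlen(s:byte(i))
    if not k then return nil end
    i, n = i + k, n + 1
  end
  return nil
end

-- Byte offset where the code point containing the l-th byte starts.
function u8.frontier(s, l)
  while l > 1 and iscont(s:byte(l)) do l = l - 1 end
  return l
end

-- Payload modulus of the lead byte, indexed by sequence length.
local leadmod = { 0x80, 0x20, 0x10, 0x08 }

-- Code points in s between byte offsets i and j (negative = from the end).
function u8.codepoint(s, i, j)
  i = i or 1
  j = j or i
  if i < 0 then i = #s + i + 1 end
  if j < 0 then j = #s + j + 1 end
  i = u8.frontier(s, i)                    -- i adjusts backward
  local out = {}
  while i <= j and i <= #s do              -- whole sequences: j adjusts forward
    local b = s:byte(i)
    local k = seqlen(b)
    if not k then error("malformed UTF-8 at byte " .. i) end
    local cp = b % leadmod[k]
    for n = i + 1, i + k - 1 do
      local c = s:byte(n)
      if not iscont(c) then error("malformed UTF-8 at byte " .. n) end
      cp = cp * 64 + c % 64
    end
    out[#out + 1] = cp
    i = i + k
  end
  return unpack(out)
end

-- Encode one code point (0..0x10FFFF) as UTF-8 bytes.
local function encode(cp)
  local f = math.floor
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + f(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + f(cp / 0x1000), 0x80 + f(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
  return string.char(0xF0 + f(cp / 0x40000), 0x80 + f(cp / 0x1000) % 0x40,
                     0x80 + f(cp / 0x40) % 0x40, 0x80 + cp % 0x40)
end

-- String formed by the given code points.
function u8.char(...)
  local t = {}
  for i = 1, select("#", ...) do t[i] = encode((select(i, ...))) end
  return table.concat(t)
end

local s = u8.char(0x61, 0xE9, 0x20AC)      -- 1 + 2 + 3 bytes
print(#s, u8.len(s))                       -- prints 6 and 3
```

With this reading, u8.codepoint(s, 3) on the string above returns 0xE9: byte 3 is the second byte of the second code point, so i moves back to byte 2. Whether frontier should report the start or the end of the code point is still the open question noted in the list.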