On Wed, Feb 11, 2015 at 2:29 PM, Sean Conner <sean@conman.org> wrote:
>         1. convert UTF-8 sequence to a Unicode codepoint
>            (http://en.wikipedia.org/wiki/UTF-8)

If using Lua 5.3, the utf8.codes iterator can be used to iterate over
the Unicode codepoints in the UTF-8 string.
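
A minimal sketch of that iteration, assuming Lua 5.3 or later (the
sample string and the `codepoints` table are only for illustration):

```lua
-- Requires Lua 5.3+; the utf8 library does not exist in 5.1/5.2.
local s = "caf\195\169"   -- "café" spelled out as UTF-8 bytes

local codepoints = {}
-- utf8.codes yields the byte position and the codepoint of each character.
-- It raises an error if it meets an invalid byte sequence.
for pos, cp in utf8.codes(s) do
  codepoints[#codepoints + 1] = cp
end
-- codepoints now holds 99, 97, 102, 233 (233 is U+00E9)
```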

>         2. Convert the Unicode codepoint
>         (http://www.unicode.org/Public/UCD/latest/charts/CodeCharts.pdf WARNING:
>         LARGE PDF) to ISO-8859-1 codepoint
>         (http://en.wikipedia.org/wiki/ISO/IEC_8859-1)

This is probably easiest to do by creating a table that maps Unicode
codepoints (keys) to ISO-8859-1 encoded characters (values, written
with the decimal escapes \032 to \126 and \160 to \255 to ensure proper
encoding). Then, for each codepoint produced by utf8.codes, just look it
up in the table. If the key is not present (i.e. there is no ISO-8859-1
equivalent), then the string cannot be converted.
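
The lookup described above could be sketched roughly as follows. The
names `latin1` and `utf8_to_latin1` are made up for this example, and
the table is filled with string.char in a loop rather than typed out
with escapes. Only the printable ranges 32 to 126 and 160 to 255 are
mapped, so control characters such as newline are rejected unless you
add them to the table yourself.

```lua
-- Requires Lua 5.3+ for the utf8 library.
-- Lookup table: Unicode codepoint (key) -> ISO-8859-1 byte (value).
local latin1 = {}
for cp = 32, 126 do latin1[cp] = string.char(cp) end
for cp = 160, 255 do latin1[cp] = string.char(cp) end

-- Hypothetical helper: returns the converted string, or nil plus a
-- message when a codepoint has no entry in the table.
local function utf8_to_latin1(s)
  local out = {}
  -- utf8.codes raises an error on invalid UTF-8 input.
  for pos, cp in utf8.codes(s) do
    local ch = latin1[cp]
    if not ch then
      return nil, string.format(
        "U+%04X at byte %d has no ISO-8859-1 equivalent", cp, pos)
    end
    out[#out + 1] = ch
  end
  return table.concat(out)
end
```

For example, utf8_to_latin1("caf\195\169") gives the four-byte string
"caf\233", while the euro sign ("\226\130\172", U+20AC) makes it return
nil and a message.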