- Subject: Re: Convert utf8 string to iso8859-1 (Latin1)
- From: Jonathan Goble <jcgoble3@...>
- Date: Wed, 11 Feb 2015 14:51:40 -0500
On Wed, Feb 11, 2015 at 2:29 PM, Sean Conner <sean@conman.org> wrote:
> 1. convert UTF-8 sequence to a Unicode codepoint
> (http://en.wikipedia.org/wiki/UTF-8)
If you are using Lua 5.3, the utf8.codes iterator can be used to iterate
over the Unicode codepoints in the UTF-8 string. Note that it raises an
error if the string is not valid UTF-8.
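As a rough sketch of step 1, assuming Lua 5.3 or later, this collects the codepoints with utf8.codes. The helper name codepoints is made up for illustration.

```lua
-- Hypothetical helper: collect the codepoints of a UTF-8 string.
-- utf8.codes (Lua 5.3+) yields (byte position, codepoint) pairs and
-- raises an error if it hits an invalid UTF-8 sequence.
local function codepoints(s)
  local cps = {}
  for _, cp in utf8.codes(s) do
    cps[#cps + 1] = cp
  end
  return cps
end

-- "café"; the \u{...} escape also requires Lua 5.3+
local cps = codepoints("caf\u{E9}")
for i, cp in ipairs(cps) do
  print(i, cp)
end
```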
> 2. Convert the Unicode codepoint
> (http://www.unicode.org/Public/UCD/latest/charts/CodeCharts.pdf WARNING:
> LARGE PDF) to ISO-8859-1 codepoint
> (http://en.wikipedia.org/wiki/ISO/IEC_8859-1)
This is probably easiest to do by creating a table that maps Unicode
codepoints (keys) to ISO-8859-1 encoded characters (values, written with
the decimal escapes \032 to \126 and \160 to \255 so the resulting bytes
are correct regardless of the source file's encoding). Then look up each
code produced by utf8.codes in the table. If a key is not present (i.e.
there is no ISO-8859-1 equivalent), the string cannot be converted.
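Here is a minimal sketch of the table approach, again assuming Lua 5.3. The names latin1 and utf8_to_latin1 are hypothetical. The table is filled with string.char(cp), which produces the same bytes as writing out the \032...\126 and \160...\255 escapes by hand. This works because ISO-8859-1 characters have the same numeric values as Unicode codepoints U+0000 to U+00FF, so the byte value always equals the codepoint.

```lua
-- Hypothetical lookup table: Unicode codepoint -> ISO-8859-1 byte,
-- covering only the ranges 32..126 and 160..255 as suggested above.
local latin1 = {}
for cp = 32, 126 do latin1[cp] = string.char(cp) end
for cp = 160, 255 do latin1[cp] = string.char(cp) end

-- Convert a UTF-8 string to ISO-8859-1. Returns the converted string,
-- or nil plus a message naming the first codepoint that has no entry
-- in the table. Invalid UTF-8 still raises an error from utf8.codes.
local function utf8_to_latin1(s)
  local out = {}
  for pos, cp in utf8.codes(s) do
    local ch = latin1[cp]
    if ch == nil then
      return nil, string.format(
        "U+%04X at byte %d has no ISO-8859-1 equivalent", cp, pos)
    end
    out[#out + 1] = ch
  end
  return table.concat(out)
end

print(utf8_to_latin1("caf\u{E9}"))   -- 4-byte Latin-1 string
print(utf8_to_latin1("\u{20AC}5"))   -- nil plus a message (euro sign)
```

Because these ranges leave out control characters, input containing "\n" or "\t" will be rejected. Add entries for 9, 10, 13 and so on if you need them. Since the mapping is the identity, the table could also be replaced by a simple range check on cp, but the table makes it easy to add substitutions (for example, mapping U+2019 to an ASCII apostrophe).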