On 2015-02-11 20:29 Sean Conner wrote:
It was thus said that the Great Igor Medeiros once stated:
Dear contributors,

Is there a way to convert a string whose characters are encoded in UTF-8 to
a string with characters encoded in ISO-8859-1, using only the Lua standard
libraries? I cannot use libraries that contain C code.

If there is, could you tell me how to do that, or point me to a site with
this information?
   I don't know of any existing Lua code to do this, but the concept is
straightforward:

	1. Convert the UTF-8 sequence to a Unicode codepoint
	   (http://en.wikipedia.org/wiki/UTF-8)

	2. Convert the Unicode codepoint
	   (http://www.unicode.org/Public/UCD/latest/charts/CodeCharts.pdf  WARNING:
	   LARGE PDF) to an ISO-8859-1 codepoint
	   (http://en.wikipedia.org/wiki/ISO/IEC_8859-1)

	3. Go back to step 1 if there is more data.

   -spc (That should be enough to get you going ... )
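
For Lua versions without the utf8 library (5.1 and 5.2), the three steps above can be sketched in plain Lua. This is only a minimal sketch, not checked against every kind of malformed input. The function name utf8_to_latin1_manual and the replacement parameter are made up for illustration. It relies on the fact that codepoints up to U+00FF take at most two bytes in UTF-8, with a lead byte of 0xC2 or 0xC3, so anything else can be treated as "not representable in ISO-8859-1".

```lua
-- Hypothetical helper (name and "replacement" parameter are assumptions).
-- Pure Lua, no utf8 library needed.
function utf8_to_latin1_manual(s, replacement)
    replacement = replacement or "?"
    local out = {}
    local i, n = 1, #s
    while i <= n do
        local b = s:byte(i)
        if b < 0x80 then
            -- One-byte sequence: ASCII, identical in ISO-8859-1.
            out[#out + 1] = string.char(b)
            i = i + 1
        elseif b == 0xC2 or b == 0xC3 then
            -- Two-byte sequence that encodes U+0080..U+00FF.
            local b2 = s:byte(i + 1)
            if b2 and b2 >= 0x80 and b2 <= 0xBF then
                -- Step 1: decode the codepoint; step 2: it is the Latin-1 byte.
                out[#out + 1] = string.char((b - 0xC0) * 64 + (b2 - 0x80))
                i = i + 2
            else
                -- Truncated or malformed sequence.
                out[#out + 1] = replacement
                i = i + 1
            end
        else
            -- Codepoint above U+00FF, or an invalid byte: emit one
            -- replacement and skip any continuation bytes that follow.
            out[#out + 1] = replacement
            i = i + 1
            while i <= n do
                local c = s:byte(i)
                if c < 0x80 or c > 0xBF then break end
                i = i + 1
            end
        end
    end
    return table.concat(out)
end
```

Called as utf8_to_latin1_manual("caf\195\169"), this should return the four bytes "caf\233". A character such as the euro sign, which ISO-8859-1 does not contain, comes out as a single "?".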



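Actually, the Latin-1 subset of Unicode (U+0000 to U+00FF) has the same codepoints as Latin-1. Only the encoding differs: UTF-8 stores U+0080 to U+00FF as two bytes each. The following suffices to do the conversion with Lua 5.3, provided the input contains no codepoints above U+00FF (string.char raises an error for values over 255):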

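    function utf8_to_latin1(s)
        -- collect the pieces in a table; repeated .. copies the string each time
        local r = {}
        for _, c in utf8.codes(s) do
            -- codepoints 0..255 are exactly the Latin-1 byte values
            r[#r + 1] = string.char(c)
        end
        return table.concat(r)
    end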
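
If the input may contain characters outside Latin-1, a variant of the function above can substitute a placeholder instead of letting string.char fail. This is a sketch under assumptions: the name utf8_to_latin1_lossy and the replacement parameter are invented here, and it needs Lua 5.3 or later for the utf8 library. Note that utf8.codes itself still raises an error on invalid UTF-8 input.

```lua
-- Hypothetical variant (name and "replacement" parameter are assumptions).
-- Requires Lua 5.3+ for utf8.codes.
function utf8_to_latin1_lossy(s, replacement)
    replacement = replacement or "?"
    local parts = {}
    for _, c in utf8.codes(s) do
        if c <= 0xFF then
            -- Representable: the codepoint is the Latin-1 byte value.
            parts[#parts + 1] = string.char(c)
        else
            -- Not representable in ISO-8859-1: substitute the placeholder.
            parts[#parts + 1] = replacement
        end
    end
    return table.concat(parts)
end
```

Whether to substitute, drop, or reject such characters depends on the application. Substituting "?" is just one common choice.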


HTH, Christian