If size is all you care about, maybe this version can give you some hints:

-- Split off 6-bit continuation bytes until the rest fits in the lead byte.
local function utf8_sep(n, a, ...)
    if a < 2^(6-n) then return n+1, a, ... end
    return utf8_sep(n+1, math.floor(a/2^6), 0x80+a%2^6, ...)
end
-- Set the top n bits of the lead byte and emit the whole sequence.
local function utf8_gen(n, a, ...)
    return string.char((2^n-1)*2^(8-n) + a, ...)
end
-- Encode one code point; ASCII passes through unchanged.
local function utf8(code)
    if code < 0x80 then return string.char(code) end
    return utf8_gen(utf8_sep(0, code))
end

It's only 11 lines of code, while yours [1] has 14.

Note that it produces output for every 32-bit value, whereas yours
fails at 2097152 (0x200000).
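
A quick way to check this, as a rough sketch only: the three functions
are repeated so the snippet runs on its own, and hex() is a
hypothetical helper added just for display. The byte values in the
comments were worked out by hand from the recursion.

```lua
local function utf8_sep(n, a, ...)
    if a < 2^(6-n) then return n+1, a, ... end
    return utf8_sep(n+1, math.floor(a/2^6), 0x80+a%2^6, ...)
end
local function utf8_gen(n, a, ...)
    return string.char((2^n-1)*2^(8-n) + a, ...)
end
local function utf8(code)
    if code < 0x80 then return string.char(code) end
    return utf8_gen(utf8_sep(0, code))
end

-- hex() is a hypothetical helper: formats a string as space-separated hex bytes.
local function hex(s)
    local out = {}
    for i = 1, #s do out[i] = string.format("%02X", s:byte(i)) end
    return table.concat(out, " ")
end

print(hex(utf8(0x24)))      -- 24              (1 byte, ASCII)
print(hex(utf8(0xE9)))      -- C3 A9           (2 bytes)
print(hex(utf8(0x20AC)))    -- E2 82 AC        (3 bytes)
print(hex(utf8(0x1F600)))   -- F0 9F 98 80     (4 bytes)
print(hex(utf8(0x200000)))  -- F8 88 80 80 80  (5 bytes, i.e. 2097152)
```

The last line is the 2097152 case: the recursion simply keeps going and
emits a 5-byte sequence. Those longer forms come from the original
UTF-8 design; RFC 3629 stops at U+10FFFF and 4 bytes, so a strict
decoder will reject them.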

2012/6/19 Patrick Rapin <toupie300@gmail.com>:
> Essentially as an exercise, I tried to write the smallest possible
> UTF-8 encoder in Lua [1].
> Compared to a naive implementation like in [2], it is around 2.6 times shorter.
> Still, I am wondering if the code could be further shortened (not
> counting space removal).
>
> [1] https://gist.github.com/b0ae016da7b8f0b221ff
> [2] http://lwn.net/Articles/493167/  (and that implementation doesn't
> handle 4-byte codes)
>