- Subject: Re: Unicode stream library for Lua
- From: Alexandre Erwin Ittner <aittner@...>
- Date: Sun, 8 Jun 2008 13:25:33 -0300
David Given <dg@cowlark.com> wrote:
> It'll convert from any Unicode encoding to any other Unicode encoding.
> In your case, tell it to convert to UCS-32 and then you can read each
> Unicode code point as a number type.
You can also wrap the standard I/O functions in a Unicode compatibility
layer for more natural usage. Something like this module (written
minutes ago and not extensively tested):
-------------------
module("uniopen", package.seeall)
local iconv = require "iconv"

local mt = { __index = _M }

-- Open fname for reading; text is converted from fromcharset to
-- tocharset (default "utf8") as it is read.
function open(fname, mode, fromcharset, tocharset)
    assert(mode == "r" or mode == "rb", "Only read modes are supported so far")
    tocharset = tocharset or "utf8"
    -- lua-iconv takes the target charset first: iconv.new(to, from)
    local cd = assert(iconv.new(tocharset, fromcharset), "Bad charset")
    local fp, err = io.open(fname, mode)
    if not fp then
        return nil, err
    end
    return setmetatable({ fp = fp, cd = cd }, mt)
end

-- Read using the usual io formats (e.g. "*l", "*a") and convert the result.
function read(fp, mod)
    assert(fp and fp.fp and fp.cd, "Bad file descriptor")
    local ret = fp.fp:read(mod)
    if ret then
        return fp.cd:iconv(ret)  -- returns: string, error code
    end
    return nil
end

function close(fp)
    assert(fp and fp.fp, "Bad file descriptor")
    fp.fp:close()
end
-------------------
As noted above, splitting Unicode text into characters is a fairly
complex subject. You cannot simply use fp:read(some_number_of_bytes):
the read may end in the middle of a multibyte character, and the
conversion then yields iconv.ERROR_INCOMPLETE. So there is no easy way
to limit your input to a safe length. I have not used slnunicode yet,
but I think it has some functions for these operations.
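
If the input is known to be UTF-8, one possible workaround for the
fixed-size read problem is to hold back a character that was cut off
at the end of a chunk and prepend it to the next read. The sketch
below is only an illustration and is not part of the module above:
split_utf8_tail and read_complete are hypothetical helper names, it
assumes UTF-8 input only (UTF-16 and others need different boundary
rules), and it does not validate the rest of the data.

```lua
-- Split buf into (complete, tail). tail holds the bytes of a trailing
-- UTF-8 sequence that was cut short, or "" if buf ends cleanly.
local function split_utf8_tail(buf)
    local len = #buf
    -- A UTF-8 sequence is at most 4 bytes long, so look back at most 4.
    for i = len, math.max(1, len - 3), -1 do
        local b = buf:byte(i)
        if b < 0x80 then
            -- ASCII byte: nothing truncated that we can repair
            return buf, ""
        elseif b >= 0xC0 then
            -- Lead byte: work out how long its sequence should be
            local need = (b >= 0xF0 and 4) or (b >= 0xE0 and 3) or 2
            if len - i + 1 < need then
                return buf:sub(1, i - 1), buf:sub(i)
            end
            return buf, ""
        end
        -- otherwise a continuation byte (0x80-0xBF): keep looking back
    end
    return buf, ""
end

-- Read up to nbytes from fp, prepend the carry left over from the
-- previous call, and return (complete_text, new_carry).
-- Returns nil at end of file.
local function read_complete(fp, nbytes, carry)
    local chunk = fp:read(nbytes)
    if not chunk then
        -- File ended in the middle of a character: hand the leftover
        -- bytes to the caller unchanged, so the error is still visible.
        if carry ~= "" then return carry, "" end
        return nil
    end
    return split_utf8_tail(carry .. chunk)
end
```

The caller keeps the carry between calls (starting with "") and passes
only the complete part to the converter, so a fixed byte limit per
read no longer splits a character in the middle.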
--
Alexandre Erwin Ittner - aittner@netuno.com.br
OpenPGP pubkey 0x0041A1FB @ http://pgp.mit.edu