On Wed, Oct 08, 2008 at 12:37:39PM +0100, Matthew Wild wrote:
> It's a hack, and there is probably a nice(r) way of doing it, but try:
> 
> logstring = logstring:sub(3):gsub("%z", "")
> 
> It will at least remove the zeros that stop it from printing, but if
> you have non-latin characters then they might get messed up.

a bit cleaner (and more expensive) would be something like

s = s:gsub('(.)(.)', function (lo,hi) return hi=='\0' and lo or '?' end)

this maps every character in the Latin-1 subset to its Latin-1 byte
and everything else, including the nasty BOM, to a '?'
(the input is assumed to be little-endian, low byte first)
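
To make the behaviour concrete, here is a small self-contained sketch of the same idea applied to a sample string. The function name `ucs2le_to_latin1` and the sample data are made up for illustration; the input is assumed to be little-endian UCS-2.

```lua
-- Sketch: reduce little-endian UCS-2 to Latin-1, replacing everything else
-- with '?'. `ucs2le_to_latin1` is a hypothetical name, not from the post.
function ucs2le_to_latin1(s)
  -- Captures are one-character strings, so the high byte is compared
  -- against '\0' rather than the number 0.
  -- The outer parentheses drop gsub's second result (the match count).
  return (s:gsub('(.)(.)', function (lo, hi)
    return hi == '\0' and lo or '?'
  end))
end

-- BOM (FF FE), "H", "i", then U+20AC (euro sign, bytes AC 20)
local sample = "\255\254H\0i\0\172\32"
print(ucs2le_to_latin1(sample))  -- prints ?Hi?
```

The BOM and the euro sign both have a non-zero high byte, so each becomes a '?'.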


conversion of UCS-2 to UTF-8 can also easily be done in Lua
(although using iconv is probably considerably faster, if you have it):

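local char, floor = string.char, math.floor
-- local, so it does not replace the utf8 library of Lua 5.3 and later
local function utf8 (i) -- BMP only
  if i<128 then return char(i) end
  -- floor() keeps the byte values integral on every Lua version
  if i<2048 then return char(192+floor(i/64), 128+i%64) end
  local j=floor(i/4096)
  i = i-j*4096
  return char(224+j, 128+floor(i/64), 128+i%64)
end
-- the captures are strings, so take their byte values before doing arithmetic
s = s:gsub('(.)(.)', function (lo,hi) return utf8(hi:byte()*256+lo:byte()) end)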
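
As a sanity check, the encoding can be compared against a few well-known code points. The following is only a sketch: `encode_utf8` and `hex` are hypothetical names chosen for this example, and the input is again assumed to be little-endian UCS-2.

```lua
-- Sketch: encode one BMP code point (0..0xFFFF) as UTF-8.
-- `encode_utf8` and `hex` are hypothetical names used only in this example.
function encode_utf8(cp)
  local char, floor = string.char, math.floor
  if cp < 0x80 then                       -- 0xxxxxxx
    return char(cp)
  elseif cp < 0x800 then                  -- 110xxxxx 10xxxxxx
    return char(0xC0 + floor(cp / 64), 0x80 + cp % 64)
  end                                     -- 1110xxxx 10xxxxxx 10xxxxxx
  return char(0xE0 + floor(cp / 4096),
              0x80 + floor(cp / 64) % 64,
              0x80 + cp % 64)
end

-- Render a string as hex bytes, each followed by a space, for inspection.
function hex(s)
  return (s:gsub('.', function (c) return string.format('%02X ', c:byte()) end))
end

-- Little-endian UCS-2 input: "A", U+00E9 (e acute), U+20AC (euro sign).
local ucs2 = "A\0\233\0\172\32"
local out = ucs2:gsub('(.)(.)', function (lo, hi)
  return encode_utf8(hi:byte() * 256 + lo:byte())
end)
print(hex(out))  -- 41 C3 A9 E2 82 AC
```

A BOM in the input (U+FEFF) is not dropped by this approach; it comes out as the UTF-8 byte sequence EF BB BF.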


make sure to open your file in binary mode and to read it in chunks of
even size, so that no 2-byte character gets split across two chunks.
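
One possible way to do the chunked reading is sketched below. The helper `convert_file`, its parameters, and the default chunk size of 4096 are assumptions made for this example; `convert` stands for whichever conversion function you use.

```lua
-- Sketch: read `path` in binary mode, in chunks of even size, and pass
-- each chunk through `convert`. `convert_file` is a hypothetical helper.
function convert_file(path, convert, chunk_size)
  chunk_size = chunk_size or 4096
  assert(chunk_size % 2 == 0, "chunk size must be even")
  -- "rb" avoids newline translation on platforms that distinguish
  -- text mode from binary mode.
  local f = assert(io.open(path, "rb"))
  local out = {}
  while true do
    local chunk = f:read(chunk_size)   -- nil at end of file
    if not chunk then break end
    -- Only the first result of convert() is stored, so a converter may
    -- return gsub's two results directly.
    out[#out + 1] = convert(chunk)
  end
  f:close()
  return table.concat(out)
end
```

Either gsub-based conversion from this message can be wrapped in a function and passed as `convert`. If the file has an odd number of bytes, the last chunk ends with an unpaired byte, which the '(.)(.)' pattern leaves unchanged.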


cheers
Klaus