> I agree with Lisa, but on Windows, UTF-16 is almost unavoidable (even if 
> MS provide functions to convert to UTF-8, which could be useful for 
> processing in Lua).

It is not just Windows that uses UTF-16.  The Wikipedia article on UTF-16 lists some common operating systems and applications that use it:

* Everything Microsoft - Windows (including Pocket PC) and applications 
* MacOS X and applications 
* Symbian (phone/mobile OS) [Symbian]
* Qualcomm BREW (phone/mobile OS)
* SAP [SAP] 
* Sybase [Sybase] 
* International Components for Unicode [ICU] 
* Rosette Core Library for Unicode [Rosette] 
* Modern, widespread browsers: IE, Mozilla, Opera 
* XML DOM 2.0 API and popular parsers (e.g. Apache Xerces) 
* KDE/Qt and applications 
* OpenOffice 
* Modern programming languages 
	* Java 
	* ECMAScript (JavaScript/JScript) 
	* All .Net languages (C#, J#, VB.Net, etc.) 
	* Python 1.6 (see Unicode in Python [Python]) 
	* Ada 95 [Ada95] 
	* Enterprise Cobol [Cobol]

If you really don't want to use UTF-16 on Windows then you can use MultiByteToWideChar and WideCharToMultiByte to convert between UTF-16 and other encodings such as UTF-8 (code page CP_UTF8); see MSDN: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_9i79.asp.
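
For anyone who wants to see what such a conversion amounts to outside the Win32 API, here is a minimal sketch in Python. It does not call MultiByteToWideChar or WideCharToMultiByte; it uses Python's built-in codecs instead. The function names (utf16_to_utf8, utf8_to_utf16le) and the little-endian default for input without a BOM are my own assumptions for illustration, not part of any library.

```python
import codecs


def utf16_to_utf8(raw: bytes, default_order: str = "little") -> bytes:
    """Convert UTF-16 bytes (with or without a BOM) to UTF-8 bytes.

    Hypothetical helper: the name and the default byte order are
    assumptions made for this sketch.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        # BOM FF FE: little-endian; skip the 2-byte BOM before decoding.
        text = raw[2:].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        # BOM FE FF: big-endian.
        text = raw[2:].decode("utf-16-be")
    else:
        # No BOM: fall back to the caller's assumption.
        codec = "utf-16-le" if default_order == "little" else "utf-16-be"
        text = raw.decode(codec)
    return text.encode("utf-8")


def utf8_to_utf16le(raw: bytes, with_bom: bool = True) -> bytes:
    """Convert UTF-8 bytes to little-endian UTF-16, optionally with a BOM."""
    encoded = raw.decode("utf-8").encode("utf-16-le")
    return codecs.BOM_UTF16_LE + encoded if with_bom else encoded


if __name__ == "__main__":
    # Simulate one line of a UTF-16 log file and convert it for processing.
    line = codecs.BOM_UTF16_LE + "2006-09-14 09:49 service started".encode("utf-16-le")
    print(utf16_to_utf8(line).decode("utf-8"))
```

The output is plain UTF-8, so the log text can then be handled by tools that treat strings as byte sequences, which is what Lua's string library does.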

Phil

-----Original Message-----
From: lua-bounces@bazar2.conectiva.com.br [mailto:lua-bounces@bazar2.conectiva.com.br] On Behalf Of Philippe Lhoste
Sent: Thursday, September 14, 2006 9:49 AM
To: lua@bazar2.conectiva.com.br
Subject: Re: newbie - Lua and unicode

Klaus Ripke wrote:
> On Wed, Sep 13, 2006 at 06:24:17PM +0300, Theodor-Iulian Ciobanu wrote:
>>   What modules do I need to be able to use unicode with Lua? (especially parsing of logs).
> http://lua-users.org/wiki/LuaUnicode
>> And is there a way to use both ANSI and Unicode?
> Yes, the snlunicode package provides two single-byte modules (ascii and latin1)
> as well as two multi-byte modules (utf8 and grapheme) with full support
> for all Unicode character classes, upper/lower etc in UTF-8.
> Conversion between other Unicode encodings like UTF-16 native/BE/LE/BOM
> and UTF-8 is trivial.
> 
> As Lisa pointed out, you should avoid UTF-16 like the plague.

Well, he is on Windows (XP or 2k, I suppose) and he is parsing log files 
which might be generated by some Windows tools, and so use the native 
encoding: UTF-16.
I agree with Lisa, but on Windows, UTF-16 is almost unavoidable (even if 
MS provide functions to convert to UTF-8, which could be useful for 
processing in Lua).

The resource you gave is interesting, thanks.

-- 
Philippe Lhoste
--  (near) Paris -- France
--  http://Phi.Lho.free.fr
--  --  --  --  --  --  --  --  --  --  --  --  --  --