Comments below...

> > Lua could use UTF-8, since it does not have "str[i]" type indexing and is
> > 8-bit clean.  Evidently, the str* functions are affected - but these are
> > outside the core, and therefore neatly replaceable.  The basic idea would
> > be to cover all the in and outcoming cases where strings are involved,
> > and to keep the Lua core mostly as is.
>
> Yes, this seems the best option. I think we would need only a new strlib
> (that lib would change a lot); but everything else should work without
> any changes.
>
> But, then, my other question: what is the relationship between Windows CE
> and Unicode? Why did everybody that tried to port Lua to Windows CE come up
> with this subject? Why can't they just use this approach (UTF-8)?
> (this is pure ignorance on my part; I know nothing about Windows CE...)

The problem, as I understand it, is that Windows CE simply requires all
strings handed to the system to be UNICODE in wide-character form (the type
must be wchar_t), which means that even though UTF-8 encoded strings would
be nice, they simply won't work for Windows CE... :-(

The way Microsoft solves this (this is the "Microsoft scheme" I mentioned in
an earlier posting), is that they have a series of typedefs and defines
along the following lines (this is *very* simplified):

#ifdef _UNICODE
typedef wchar_t TCHAR;
#define _T(x) L##x
#define _tcslen wcslen
...
#else
typedef char TCHAR;
#define _T(x) x
#define _tcslen strlen
...
#endif

This basically means that if you define the preprocessor symbol '_UNICODE'
and use the macros, everything will be set for wchar_t-type strings
(presumably in UNICODE format); otherwise you will get regular or multi-byte
strings. This also means that all library functions must exist in two
versions, one for standard char and one for wchar_t (I made a stupid mistake
here, where I for a moment thought that the wide-character versions were
also part of the standard C library, but I believe they are not... or?).

You can then write code like:

int i = _tcslen( _T("hello world") );

and this will expand to either

int i = wcslen( L"hello world" );

or

int i = strlen( "hello world" );

depending on the definition of _UNICODE.
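
For anyone who wants to try the idea without the Microsoft headers, here is a minimal sketch of the same scheme. The names MY_UNICODE, MY_TCHAR, MY_T and my_tcslen are made up for this example (they are not part of any real header), and it assumes a compiler that ships <wchar.h> with wcslen:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _T and _tcslen, renamed so that
   nothing clashes with a real <tchar.h>. Compile with -DMY_UNICODE
   to select the wide-character variants. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(x) L##x          /* pastes the L prefix onto the literal */
#define my_tcslen wcslen
#else
typedef char MY_TCHAR;
#define MY_T(x) x
#define my_tcslen strlen
#endif

/* Length in characters (MY_TCHAR units), not in bytes. */
size_t greeting_length(void)
{
    const MY_TCHAR *s = MY_T("hello world");
    return my_tcslen(s);
}

/* Storage size of the literal in bytes, including the terminator. */
size_t greeting_bytes(void)
{
    return sizeof(MY_T("hello world"));
}

/* The character count is the same in both builds; only the byte
   size changes with the width of MY_TCHAR. */
void self_check(void)
{
    assert(greeting_length() == 11);
    assert(greeting_bytes() == 12 * sizeof(MY_TCHAR));
}
```

Calling self_check() from main should pass both with and without -DMY_UNICODE, since the calling code never names char or wchar_t directly.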

/johan