lua-l archive

On Thu, Dec 29, 2005 at 11:38:57AM +0000, Lisa Parratt wrote:
> On 29 Dec 2005, at 11:06, Klaus Ripke wrote:
> >
> A few observations, reading this page:
> Lack of "\U+1234" style Unicode character escapes - it strikes me
> that the code to isolate such an escape, and then convert it to an
> 8-bit string, would only take a few lines of code. Is there a good
> theological reason why this isn't supported?
It would require the parser to settle on a given encoding,
be it UTF-8, UCS-2, UTF-16, or something else.
OTOH a preprocessing step, either at build time or
as a load hook, could do this and much more.
Personally I prefer to have my editor produce UTF-8.
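
That said, the conversion itself really is only a few lines. A minimal
sketch (hypothetical names, not from any Lua source) of what a lexer
would do after parsing the hex digits of such an escape, assuming it
settles on UTF-8 as the output encoding:

```c
#include <stddef.h>

/* Sketch: encode one Unicode code point as UTF-8, roughly what handling
 * a "\U+1234"-style escape would need once the hex digits are parsed.
 * Returns the number of bytes written (1-4), or 0 for an invalid code
 * point (surrogates and values above U+10FFFF are rejected). */
static size_t utf8_encode(unsigned long cp, unsigned char out[4]) {
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {              /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {            /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF) /* UTF-16 surrogate range */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {          /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For the example in the question, U+1234 comes out as the three bytes
E1 88 B4.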

> Inability to use UTF-8 identifiers due to use of isalpha and isalnum
> - surely it would be better to use hardcoded functions for
> determining whether the characters in an identifier are valid?
> Otherwise there will be potential locale issues anyway. Locales should
> apply to human languages, not computer languages!
Agreed (although this is not Unicode-related).
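
Such hardcoded checks are trivial. A sketch (hypothetical helper names)
of locale-independent replacements for isalpha/isalnum in a lexer, so
identifier rules do not shift with setlocale(); to allow UTF-8
identifiers one could additionally accept bytes >= 0x80 here:

```c
/* Locale-independent identifier checks: plain ASCII letters, digits,
 * and underscore, regardless of the current C locale. */
static int ident_start(int c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

static int ident_cont(int c) {
    return ident_start(c) || (c >= '0' && c <= '9');
}
```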

> Unicode string comparison and normalisation issues - I might be being
> forgetful, but I was under the impression C99 added Unicode-compliant
> wide character comparison functions - perhaps these should be used if
> present?
You might not want to use wide chars at all
(there are pros and cons compared to using UTF-8 internally).
For UTF-8, good old strcoll/strxfrm (and hence Lua) does the job,
given appropriate locale settings.
Anyway, many consider the "locale" mechanism broken,
and a full implementation of the Unicode Collation Algorithm
is bound to be quite expensive.