- Subject: Re: announce: UTF-8 lib [Re: setlocal categories?]
- From: Mike Pall <mikelu-0502@...>
- Date: Thu, 17 Feb 2005 13:25:41 +0100
Hi,
Klaus Ripke wrote:
> Frankly speaking I heavily object to the strcoll in lvm.c and in fact
> compile this with strcoll defined to strcmp. There might be occasions
> where you want simple fast reliable bytewise comparison,
> especially where your string does not contain character data,
Yes, this is one of my pet peeves, too. I replaced the whole mess
in l_strcmp with memcmp. Not because it's a lot faster, but because
it's a lot more predictable.
Lua strings are just byte arrays with a length tacked on. And I want
to compare them as such. If I want to use proper collation, I will
certainly use an extra library (with a different approach, as Klaus
explained).
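To illustrate what comparing Lua strings "as byte arrays with a length" can look like, here is a minimal sketch in C. It is not the actual change made to l_strcmp; the name bytes_compare and its signature are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the real l_strcmp patch): order two byte
   arrays of known length without consulting the locale at all. */
static int bytes_compare(const char *a, size_t alen,
                         const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;   /* length of common prefix */
    int r;

    assert(a != NULL && b != NULL);

    /* memcmp compares bytes as unsigned char, so embedded '\0' bytes
       and bytes above 0x7f are handled the same way everywhere. */
    r = (n > 0) ? memcmp(a, b, n) : 0;
    if (r != 0)
        return r;

    /* Common prefix is identical: the shorter string sorts first. */
    return (alen > blen) - (alen < blen);
}
```

Because the lengths are passed explicitly, a string such as "a\0b" is compared over all three bytes, which strcoll and strcmp cannot do in a single call.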
The so-so NLS support in Lua is really problematic. E.g. some libraries
take the LC_* environment variables and do setlocale() on their own
(readline and GTK+ are known to do this). As soon as you link with one
of these libraries you may get into big trouble.
As much as I want to forbid every user in the world from setting LC_NUMERIC
or LC_ALL, I really can't stop them. Once they do so, the parser may
screw up:
$ lua -e 'print(assert(loadstring("return 1.2"))())'
1.2
$ lua -e 'os.setlocale("de_DE"); print(assert(loadstring("return 1.2"))())'
lua: (command line):1: [string "return 1.2"]:1: malformed number near `1.2'
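The failure above is consistent with number conversion going through a locale-aware libc function. The following C sketch assumes that function is strtod(), which is an assumption made for this example and not something stated in this post. The helper names parses_fully and parses_fully_in are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if strtod() consumes the whole string under the current
   LC_NUMERIC setting, 0 otherwise. */
static int parses_fully(const char *s)
{
    char *end;

    assert(s != NULL);
    (void)strtod(s, &end);
    return end != s && *end == '\0';
}

/* Hypothetical demo helper: run parses_fully() under the given
   LC_NUMERIC locale, then restore the previous one.
   Returns -1 if the locale cannot be queried or selected. */
static int parses_fully_in(const char *locale, const char *s)
{
    char saved[256];
    const char *cur = setlocale(LC_NUMERIC, NULL);
    int ok;

    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    strcpy(saved, cur);          /* later setlocale calls may overwrite cur */

    if (setlocale(LC_NUMERIC, locale) == NULL)
        return -1;               /* locale not available on this system */
    ok = parses_fully(s);
    setlocale(LC_NUMERIC, saved);
    return ok;
}
```

In the default "C" locale, parses_fully("1.2") returns 1. On a system where a de_DE locale is installed, parses_fully_in("de_DE", "1.2") would be expected to return 0, because the decimal separator there is a comma and strtod() stops at the dot. The exact locale name varies between systems.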
There are other cases like the regexp stuff using ctype 'macros'
(which in reality are expensive function calls with an NLS-aware libc):
Say a network server needs to make sure you throw only alphanumeric
characters at it. But string.find(s, "^%w*$") will behave unpredictably,
depending on the LC_CTYPE/LC_ALL setting. This means the regexp
functions are a big no-no to use for any security-conscious application.
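For a check like the one described above, a locale-independent test is easy to write by hand. This is a minimal sketch in C, assuming an ASCII-compatible execution character set; the names ascii_isalnum and all_ascii_alnum are hypothetical and are not part of Lua or libc.

```c
#include <assert.h>
#include <stddef.h>

/* Locale-independent replacement for isalnum(): accepts only the
   ASCII digits and letters, whatever locale is active.
   Assumes an ASCII-compatible character set. */
static int ascii_isalnum(unsigned char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

/* Returns 1 if every byte of the length-counted string is an ASCII
   alphanumeric character, 0 otherwise. An empty string passes,
   matching the behaviour of the pattern "^%w*$". */
static int all_ascii_alnum(const char *s, size_t len)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        if (!ascii_isalnum((unsigned char)s[i]))
            return 0;
    }
    return 1;
}
```

The length is passed explicitly, so an embedded '\0' byte is rejected like any other non-alphanumeric byte. A byte such as 0xE4, which a Latin-1 locale may classify as a letter, is rejected as well.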
I know there are some difficult-to-solve problems hidden in there.
Just look at the giant mess that Perl and Python have to go through
to make sure everything works with and without certain NLS settings.
Lua has to find its own sweet spot between no NLS and full NLS. I'd opt
more in favour of 'no NLS in the core'. Maybe it needs to gain more
independence from libc (e.g. doing ctype yourself is trivial). Maybe it
should not use every ISO/ANSI C library function, just because it's there
and it's standard (but badly designed).
Bye,
Mike