- Subject: Re: announce: UTF-8 lib [Re: setlocal categories?]
- From: Klaus Ripke <paul-lua@...>
- Date: Thu, 17 Feb 2005 13:54:08 +0100
On Thursday 17 February 2005 13:25, Mike Pall wrote:
> There are other cases like the regexp stuff using ctype 'macros'
> (which in reality are expensive function calls with an NLS-aware libc):
In the UTF-8 lib these are real macros that index into the udata table,
so they are fast and predictable.
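
To illustrate the idea of table-driven classification macros, here is a minimal sketch. It is not the library's actual code: the names demo_cat, DEMO_CAT, DEMO_ISALPHA and the category bits are made up for illustration, and the table covers only ASCII, whereas the real udata table presumably covers far more.

```c
#include <stddef.h>

/* Hypothetical category bits; the real library's layout may differ. */
enum { CAT_ALPHA = 1, CAT_DIGIT = 2, CAT_SPACE = 4 };

/* Hypothetical stand-in for the udata table, covering ASCII only. */
static unsigned char demo_cat[128];

/* Fill the table once. No locale is consulted, so the result is the
   same on every system (assumes an ASCII-compatible execution charset). */
static void demo_cat_init(void) {
    int c;
    for (c = 'a'; c <= 'z'; c++) demo_cat[c] = CAT_ALPHA;
    for (c = 'A'; c <= 'Z'; c++) demo_cat[c] = CAT_ALPHA;
    for (c = '0'; c <= '9'; c++) demo_cat[c] = CAT_DIGIT;
    demo_cat[' ']  = demo_cat['\t'] = demo_cat['\n'] = CAT_SPACE;
    demo_cat['\v'] = demo_cat['\f'] = demo_cat['\r'] = CAT_SPACE;
}

/* Real macros: a bounds check plus one array load, no function call.
   Note that the argument is evaluated twice, so avoid side effects. */
#define DEMO_CAT(c) \
    ((unsigned)(c) < 128u ? demo_cat[(unsigned)(c)] : 0)
#define DEMO_ISALPHA(c) (DEMO_CAT(c) & CAT_ALPHA)
#define DEMO_ISALNUM(c) (DEMO_CAT(c) & (CAT_ALPHA | CAT_DIGIT))
```

After calling demo_cat_init() once, DEMO_ISALNUM('7') is nonzero and DEMO_ISALNUM('-') is zero, whatever setlocale() has been told.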
> Say a network server needs to make sure you throw only alphanumeric
> characters at it. But string.find(s, "^%w*$") will behave unpredictably,
I reckon we should also have a version of string called ascii, with
all of that hardcoded for reliability.
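
As a rough sketch of what "hardcoded" could mean for the network server case: the check below accepts only ASCII letters and digits, no matter which locale is active. The function name ascii_all_alnum is hypothetical and not part of any existing library.

```c
#include <stddef.h>

/* Return 1 if every byte of s[0..len-1] is an ASCII letter or digit,
   else 0. The ranges are hardcoded, so the current locale has no
   influence. An empty string passes, like the pattern "^%w*$". */
static int ascii_all_alnum(const char *s, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        int ok = (c >= '0' && c <= '9')
              || (c >= 'A' && c <= 'Z')
              || (c >= 'a' && c <= 'z');
        if (!ok) return 0;
    }
    return 1;
}
```

With this, a byte such as 0xE9 (e-acute in Latin-1) is always rejected, whereas isalnum() may accept it under some locales.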
To make UTF-8 a usable string replacement without bloating the image
too much, I'm considering offering the same string-like interface,
backed by identical C code, as separate closures: ascii, utf8,
grapheme and maybe also latin1. Everything would be hardcoded, with
the closure determining the mode of operation.
Still, ascii and latin1 could use the same character category table
(sorry, Latin-n users, this is straightforward only for Latin-1 ... not my fault).
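
A minimal sketch of the "one body of C code, mode decided by the closure" idea, using a length function as the example. The names str_mode and mode_len are hypothetical. In a real Lua binding the mode would presumably be stored as an upvalue when the function is registered with lua_pushcclosure and read back via lua_upvalueindex; here it is passed as a plain argument so the sketch compiles without the Lua headers. Grapheme mode is left out because it needs Unicode property data.

```c
#include <stddef.h>

/* Hypothetical mode selector; stands in for the closure's upvalue. */
typedef enum { MODE_ASCII, MODE_LATIN1, MODE_UTF8 } str_mode;

/* Count the characters in s[0..len-1] under the given mode.
   ASCII and Latin-1 use one byte per character. In UTF-8 mode every
   byte that is not a continuation byte (bit pattern 10xxxxxx) starts
   a new character. The input is not validated. */
static size_t mode_len(str_mode mode, const char *s, size_t len) {
    size_t i, n = 0;
    if (mode != MODE_UTF8) return len;
    for (i = 0; i < len; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80) n++;
    return n;
}
```

For the six bytes "h\xC3\xA9llo", the utf8 mode counts 5 characters while the latin1 mode counts 6, although both calls run the same function.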