On Fri, Nov 02, 2012 at 07:55:41PM +0100, spir wrote:
<snip>
> There is, I guess, no hope to get back the ideal simplicity of 1 char <--> 
> 1 repr (and even less representations of equal lengths) we lived with in 
> ascii & iso-latin times. There is affordable way to get strings as a 
> sequences of chars, with s[i] = ith char, exactly, and complete.

Perl6 does this with its homegrown "NFG" normalization form. Graphemes
which in Unicode are not assigned a single codepoint are assigned one
dynamically.
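
To make the idea concrete, here is a rough Python sketch of dynamic
assignment. It is not how Perl6 or Parrot actually implement NFG. The
class name NFGString, the use of negative integers for synthetic IDs, and
the clustering rule (a base character followed by any marks with a nonzero
combining class) are all assumptions made for illustration. Real grapheme
segmentation follows the Unicode text segmentation rules and is more
involved.

```python
import unicodedata


class NFGString:
    """Toy fixed-width string: exactly one integer per grapheme.

    Hypothetical illustration only, not the Perl6/Parrot implementation.
    """

    # Shared tables mapping multi-code-point clusters to synthetic IDs.
    _table = {}    # tuple of code points -> synthetic (negative) ID
    _reverse = {}  # synthetic ID -> tuple of code points

    def __init__(self, text):
        # Compose first, so graphemes that already have a single
        # code point (e.g. U+00E9) need no synthetic ID.
        text = unicodedata.normalize("NFC", text)
        self.units = []
        cluster = []
        for ch in text:
            # Simplified rule: a character with combining class 0
            # starts a new cluster; anything else attaches to the
            # current one.
            if cluster and unicodedata.combining(ch) == 0:
                self.units.append(self._intern(cluster))
                cluster = []
            cluster.append(ord(ch))
        if cluster:
            self.units.append(self._intern(cluster))

    @classmethod
    def _intern(cls, cluster):
        if len(cluster) == 1:
            return cluster[0]          # ordinary code point
        key = tuple(cluster)
        if key not in cls._table:
            # Assumption: negative numbers cannot collide with
            # real code points, so use them as synthetic IDs.
            new_id = -(len(cls._table) + 1)
            cls._table[key] = new_id
            cls._reverse[new_id] = key
        return cls._table[key]

    def __len__(self):
        return len(self.units)

    def __getitem__(self, i):
        unit = self.units[i]
        if unit >= 0:
            return chr(unit)
        return "".join(chr(cp) for cp in self._reverse[unit])


raw = "e\u0301q\u0301z"   # e + acute, q + acute, z
s = NFGString(raw)
print(len(raw), len(s))   # code points vs. graphemes
print([s[i] for i in range(len(s))])
```

With this layout, s[i] is a constant-time lookup of the ith grapheme. The
cost is the side table of synthetic IDs, which has to travel with the
string (or be global) and can in principle grow without bound.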

There's surprisingly little information about this available online. You
basically need to refer to the Parrot and Perl6 documentation--and sometimes
source code--to decipher the details.

See, e.g.
http://docs.parrot.org/parrot/devel/html/docs/pdds/pdd28_strings.pod.html