- Subject: Re: Clearing up misconceptions about characters vs bytes in the manual
- From: William Ahern <william@...>
- Date: Sun, 4 Nov 2012 15:10:47 -0800
On Fri, Nov 02, 2012 at 07:55:41PM +0100, spir wrote:
<snip>
> There is, I guess, no hope to get back the ideal simplicity of 1 char <-->
> 1 repr (and even less representations of equal lengths) we lived with in
> ascii & iso-latin times. There is affordable way to get strings as a
> sequences of chars, with s[i] = ith char, exactly, and complete.
Perl6 does this with its homegrown "NFG" normalization form. Graphemes
which in Unicode are not assigned a single codepoint are assigned one
dynamically.
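
To make the idea concrete, here is a rough sketch in Python of what "assigned one dynamically" could look like. It is not how Parrot or Perl6 actually implement NFG. The GraphemeTable class, the to_graphemes function, and the choice of negative integers for synthetic IDs are all hypothetical names and conventions made up for illustration. Grapheme segmentation is also simplified to "base character plus following combining marks" rather than the full Unicode segmentation rules.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical table handing out synthetic IDs for multi-codepoint graphemes."""

    def __init__(self):
        self._ids = {}       # cluster string -> synthetic (negative) ID
        self._clusters = []  # position k holds the cluster for ID -(k + 1)

    def intern(self, cluster):
        # A grapheme that is already a single codepoint keeps that codepoint.
        if len(cluster) == 1:
            return ord(cluster)
        # Otherwise assign a fresh ID the first time the cluster is seen.
        if cluster not in self._ids:
            self._clusters.append(cluster)
            self._ids[cluster] = -len(self._clusters)
        return self._ids[cluster]

    def expand(self, gid):
        # Map an ID back to the text it stands for.
        return chr(gid) if gid >= 0 else self._clusters[-gid - 1]


def to_graphemes(text, table):
    """Return one integer per grapheme (simplified segmentation)."""
    # Compose first so graphemes with a precomposed form use it.
    text = unicodedata.normalize("NFC", text)
    clusters = []
    for ch in text:
        # Assumption: a combining mark attaches to the preceding cluster.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return [table.intern(c) for c in clusters]


table = GraphemeTable()
# "e" + combining acute composes to U+00E9; "q" + combining tilde has no
# precomposed form, so it gets a synthetic ID.
ids = to_graphemes("e\u0301q\u0303q\u0303", table)
```

With a representation like this, indexing the list of IDs gives the i-th grapheme directly, which is the s[i] behaviour asked about above, at the cost of keeping a per-process table.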
There's surprisingly little information about this available online. You
basically need to refer to the Parrot and Perl6 documentation--and sometimes
source code--to decipher the details.
See, e.g.
http://docs.parrot.org/parrot/devel/html/docs/pdds/pdd28_strings.pod.html