Re: Clearing up misconceptions about characters vs bytes in the manual

Subject: Re: Clearing up misconceptions about characters vs bytes in the manual
From: William Ahern <william@...>
Date: Sun, 4 Nov 2012 15:10:47 -0800

On Fri, Nov 02, 2012 at 07:55:41PM +0100, spir wrote:
<snip>
> There is, I guess, no hope to get back the ideal simplicity of 1 char <--> 
> 1 repr (and even less representations of equal lengths) we lived with in 
> ascii & iso-latin times. There is affordable way to get strings as a 
> sequences of chars, with s[i] = ith char, exactly, and complete.

Perl6 does this with its homegrown "NFG" normalization form. Graphemes
which in Unicode are not assigned a single codepoint are assigned one
dynamically.
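
To make the idea concrete, here is a rough Python sketch of dynamic
assignment. It is not how Perl6 or Parrot implement NFG. The GraphemeTable
class, the to_graphemes function, and the use of negative integers for the
dynamically assigned ids are all assumptions made for illustration, and the
clustering rule (a base character plus any following combining marks) is a
simplification of real grapheme segmentation.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical table that hands out ids for multi-codepoint graphemes."""

    def __init__(self):
        self._ids = {}       # cluster string -> synthetic id
        self._clusters = []  # position (-id - 1) -> cluster string

    def intern(self, cluster):
        # A single codepoint is its own id.
        if len(cluster) == 1:
            return ord(cluster)
        # Otherwise assign a new negative id on first sight (an assumption
        # of this sketch; negative values cannot clash with real codepoints).
        if cluster not in self._ids:
            self._clusters.append(cluster)
            self._ids[cluster] = -len(self._clusters)
        return self._ids[cluster]

    def lookup(self, gid):
        # Map an id back to the text it stands for.
        return chr(gid) if gid >= 0 else self._clusters[-gid - 1]


def to_graphemes(text, table):
    """Return one integer per grapheme, so result[i] is the ith character."""
    # Compose first, so graphemes that do have a single codepoint use it.
    text = unicodedata.normalize("NFC", text)
    clusters = []
    for ch in text:
        # Simplified rule: attach combining marks to the preceding base.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return [table.intern(c) for c in clusters]
```

With this, "e" followed by U+0301 composes to the existing codepoint U+00E9,
while "q" followed by U+0301 has no precomposed form and gets a synthetic
id. Either way the result is a fixed-width array with one entry per
character, which is the s[i] property asked for above.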

There's surprisingly little information about this available online. You
basically need to refer to the Parrot and Perl6 documentation--and sometimes
source code--to decipher the details.

See, e.g.
http://docs.parrot.org/parrot/devel/html/docs/pdds/pdd28_strings.pod.html
