Re: byteoffset() in lutf8lib.c from 5.3, work2

lua-l archive

Subject: Re: byteoffset() in lutf8lib.c from 5.3, work2
From: Sean Conner <sean@...>
Date: Wed, 14 May 2014 02:14:47 -0400

It was thus said that the Great Coroutines once stated:
> On Tue, May 13, 2014 at 10:46 PM, KHMan <keinhong@gmail.com> wrote:
> 
> > Done by committees of cultures who are sort of competing with each other.
> > And then there are the pressure groups... What did you expect? ;-) We had a
> > good laugh at some of the new Unicode glyphs here on the list some time
> > ago...
> 
> I get it but I don't get it.  You'd think they would consult the
> programmers when trying to engineer something like this.  I was just
> thinking how it'd be difficult to arrange character sets so they can
> be easily transformed from lowercase to uppercase and back -- for
> something like a-z to A-Z this is easy, but because certain characters
> are used in many languages there would have to be repeats within the
> standard to make this 'efficient'.  Things really should have been
> organized in codepoint ranges going by character class, not character
> ~category~.  The encoding form makes sense, the way it is organized
> does not :(  Mapping tables blow and so does the rest of the world
> speaking languages that aren't common anymore ~

  That's because alphabets [1] aren't logical.  I've already mentioned the
Turkish I, İ, ı and i [2], but there's also the German ß, which capitalizes
as SS [4].  And then there are languages (like Cherokee) that don't have
the concept of "upper and lower case" letters.  Then there's Korean, which
is a syllabary and not an alphabet.  Then there's Chinese, which uses a
symbol (or symbols) to represent a word (or concept), and thus, too, does
not have the concept of "upper and lower case".
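
  A minimal sketch of the point above, assuming Python 3 and its built-in,
locale-independent str.upper() and str.lower().  The helper name case_report
is made up for illustration; it is not from lutf8lib.c or any Lua API.

```python
# Illustration only: Python's default (non-locale-aware) case mappings.
# case_report is a hypothetical helper, not part of any library.

def case_report(ch):
    """Return (uppercased, lowercased) forms of ch."""
    return ch.upper(), ch.lower()

# ASCII letter, German sharp s, Turkish dotted capital I,
# Turkish dotless small i, and a Chinese character (no case at all).
examples = ["a", "\u00df", "\u0130", "\u0131", "\u6f22"]

for ch in examples:
    up, lo = case_report(ch)
    # The lengths show that a case mapping is not always one-to-one.
    print(repr(ch), "upper:", repr(up), len(up), "lower:", repr(lo), len(lo))
```

  The sharp s grows into two letters when uppercased, the dotted capital I
lowercases to an i plus a combining dot, and the default mapping knows
nothing about Turkish rules, so a plain per-codepoint offset table cannot
cover these cases.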

  Then you have languages like Arabic, which has different letter forms for
a given letter depending on where in the word it appears (and may or may not
have vowels [5]).  Oh, and the annoying habit of being written right to
left [6].
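
  As a rough sketch of the positional forms and the direction issue, assuming
Python 3 and its standard unicodedata module.  The names BEH, FORMS,
base_letter and bidi_class are made up for this example.  Unicode keeps
legacy "presentation form" codepoints for the four shapes of a letter such
as BEH, and compatibility normalization folds them back to the one base
letter:

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the codepoint normally stored in text

# Legacy presentation-form codepoints for the four positional shapes of BEH.
FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}

def base_letter(ch):
    """Fold a presentation form back to its base letter (NFKC)."""
    return unicodedata.normalize("NFKC", ch)

def bidi_class(ch):
    """Return the Unicode bidirectional class, e.g. 'L' or 'AL'."""
    return unicodedata.bidirectional(ch)

for name, form in FORMS.items():
    print(name, unicodedata.name(form), "->", unicodedata.name(base_letter(form)))

print("bidi class of 'a':", bidi_class("a"))
print("bidi class of BEH:", bidi_class(BEH))
```

  In normal text only the base letter is stored and the renderer picks the
shape; the bidirectional class ('L' for Latin, 'AL' for Arabic letters) is
what a layout engine uses to decide the direction of each run.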

> ISO 8859-1 is nice <3 "Extended ASCII" -- for when I don't give a flip
> about unicode :-)

  -spc (What?  No iso-8859-13?)

[1]	For various values of "alphabet"

[2]	http://en.wikipedia.org/wiki/Dotted_and_dotless_I

[3]	http://en.wikipedia.org/wiki/%C3%9F

[4]	Mostly---check the Wikipedia page [3] for details.

[5]	Oh, and in ASCII, vowels aren't segregated into their own range.
	I'm just saying ... 

[6]	Okay, so how do you quote an Arabic saying (right to left) in an
	English document (left to right)?
