- Subject: Re: byteoffset() in lutf8lib.c from 5.3, work2
- From: Sean Conner <sean@...>
- Date: Wed, 14 May 2014 02:14:47 -0400
It was thus said that the Great Coroutines once stated:
> On Tue, May 13, 2014 at 10:46 PM, KHMan <keinhong@gmail.com> wrote:
>
> > Done by committees of cultures who are sort of competing with each other.
> > And then there are the pressure groups... What did you expect? ;-) We had a
> > good laugh at some of the new Unicode glyphs here on the list some time
> > ago...
>
> I get it but I don't get it. You'd think they would consult the
> programmers when trying to engineer something like this. I was just
> thinking how it'd be difficult to arrange character sets so they can
> be easily transformed from lowercase to uppercase and back -- for
> something like a-z to A-Z this is easy, but because certain characters
> are used in many languages there would have to be repeats within the
> standard to make this 'efficient'. Things really should have been
> organized in codepoint ranges going by character class, not character
> ~category~. The encoding form makes sense, the way it is organized
> does not :( Mapping tables blow and so does the rest of the world
> speaking languages that aren't common anymore ~
That's because alphabets [1] aren't logical. I've already mentioned the
Turkish I, İ, ı and i, [2] but there's also the German ß, which capitalizes
as SS [4]. And then there are languages (like Cherokee) that don't have
the concept of "upper and lower case" letters. Then there's Korean, which
is a syllabary and not an alphabet. Then there's Chinese, which uses a
symbol (or symbols) to represent a word (or concept), and thus, too, does
not have the concept of "upper and lower case".
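To make the ß and dotless-i cases concrete, here is a small sketch in Python (not Lua, and nothing to do with lutf8lib.c itself). It assumes Python 3's built-in str.upper() and str.lower(), which apply the Unicode default, locale-independent case mappings; the helper name round_trips is made up for illustration.

```python
# Sketch: case mapping is neither one-to-one nor reversible.
# Uses only Python 3 built-ins, which follow Unicode's default
# (locale-independent) case mappings.

def round_trips(s: str) -> bool:
    """True if upper-casing and then lower-casing returns the original."""
    return s.upper().lower() == s

sharp_s = "\u00df"       # ß LATIN SMALL LETTER SHARP S
dotted_cap_i = "\u0130"  # İ LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless_i = "\u0131"     # ı LATIN SMALL LETTER DOTLESS I

print(sharp_s.upper())            # one character turns into two
print(len(dotted_cap_i.lower()))  # "i" plus U+0307 COMBINING DOT ABOVE
print(dotless_i.upper())          # plain ASCII "I", so the dotless-ness is lost
print(round_trips("abc"), round_trips(sharp_s), round_trips(dotless_i))
```

Because these mappings ignore locale, "I".lower() always gives "i", which is the wrong answer for Turkish text. No arrangement of codepoint ranges fixes that, since the right answer depends on the language and not on the character.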
Then you have languages like Arabic, which has different letter forms for
a given letter depending on where in the word it appears (and may or may not
have vowels [5]). Oh, and the annoying habit of being written right to
left [6].
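As a rough illustration of the letter-form point, the sketch below uses Python's standard unicodedata module. It relies on the Arabic Presentation Forms-B block, where Unicode carries separate compatibility codepoints for the isolated, final, initial and medial shapes of BEH; normal text stores only the base letter and leaves the shaping to the renderer. The helper names base_letter and is_rtl are made up for this example.

```python
import unicodedata

beh = "\u0628"  # ARABIC LETTER BEH, the codepoint normal text stores

# Compatibility codepoints for the four contextual shapes of BEH.
presentation_forms = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}

def base_letter(ch: str) -> str:
    """Fold a presentation form back to its base letter via NFKC."""
    return unicodedata.normalize("NFKC", ch)

def is_rtl(ch: str) -> bool:
    """True for characters with a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

for name, form in presentation_forms.items():
    print(name, unicodedata.name(form), base_letter(form) == beh)

print(is_rtl(beh), is_rtl("a"))
```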
> ISO 8859-1 is nice <3 "Extended ASCII" -- for when I don't give a flip
> about unicode :-)
-spc (What? No iso-8859-13?)
[1] For various values of "alphabet"
[2] http://en.wikipedia.org/wiki/Dotted_and_dotless_I
[3] http://en.wikipedia.org/wiki/%C3%9F
[4] Mostly---check the Wikipedia page [3] for details.
[5] Oh, and in ASCII, vowels aren't segregated into their own range.
I'm just saying ...
[6] Okay, so how do you quote an Arabic saying (right to left) in an
English document (left to right)?
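One answer to [6], sketched in Python: keep the string in logical (typing) order and wrap the quoted run in the Unicode directional isolate controls U+2067 (RIGHT-TO-LEFT ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE). This assumes the display layer implements the Unicode bidirectional algorithm with isolate support; the helper quote_rtl and the sample sentence are made up for illustration.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def quote_rtl(text: str) -> str:
    """Wrap a right-to-left run so it is laid out as one isolated unit."""
    return RLI + text + PDI

# "salaam", stored in logical order (first letter typed comes first)
arabic = "\u0633\u0644\u0627\u0645"

sentence = 'He said "' + quote_rtl(arabic) + '" and left.'

# The controls are invisible format characters; they add two codepoints
# and leave the stored order of the Arabic letters untouched.
print(len(sentence) - len(sentence.replace(RLI, "").replace(PDI, "")))
```

Storage stays left to right in memory either way; only the renderer reverses the run on screen, which is why byte offsets and display positions stop lining up.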