On Tue, May 13, 2014 at 11:14 PM, Sean Conner <sean@conman.org> wrote:

>   That's because alphabets [1] aren't logical.  I've already mentioned the
> Turkish I, İ, ı and i, [2] but there's also the German ß, which capitalizes
> as SS [4].  And then there are languages (like Cherokee) that don't have
> the concept of "upper and lower case" letters.  Then there's Korean, which
> is a syllabary and not an alphabet.  Then there's Chinese, which uses
> a symbol (or symbols) to represent a word (or concept), and thus, too, does
> not have the concept of "upper and lower case".
>
>   Then you have languages like Arabic, which has different letter forms for
> a given letter depending on where in the word it appears (and may or may not
> have vowels [5]).  Oh, and the annoying habit of being written right to
> left [6].

I recognize this. My point is that, for alphabets that do have a concept
of upper and lower case, the code points could be arranged so that the two
cases differ by a fixed offset, as they do in ASCII.  What I'm suggesting
would result in repeats and 'holes' in Unicode, but it would be far better
than the long maps that every character class needs today.  The holes
could be filled with miscellaneous characters, or simply left empty and
covered by one map for invalid code points.
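
To make the ASCII comparison concrete, here is a rough Python sketch (the
names ASCII_CASE_OFFSET, ascii_upper and fixed_offset_matches are made up for
this illustration, not taken from any library).  It shows the fixed-offset
trick for ASCII and a small check of whether a given character's uppercase
form, as reported by Python's built-in str.upper(), happens to sit at that
same offset.

```python
# In ASCII, each lowercase letter is exactly 0x20 above its uppercase form.
ASCII_CASE_OFFSET = ord('a') - ord('A')  # 0x20


def ascii_upper(ch):
    """Uppercase one ASCII letter by subtracting the fixed offset.

    Any character outside 'a'..'z' is returned unchanged.
    """
    if 'a' <= ch <= 'z':
        return chr(ord(ch) - ASCII_CASE_OFFSET)
    return ch


def fixed_offset_matches(ch):
    """Return True if ch uppercases to a single code point that lies
    exactly ASCII_CASE_OFFSET below ch, according to str.upper()."""
    up = ch.upper()
    return len(up) == 1 and ord(ch) - ord(up) == ASCII_CASE_OFFSET


if __name__ == "__main__":
    for ch in ["q", "\u00e9", "\u0101", "\u00df", "\u0131"]:
        print(repr(ch), repr(ch.upper()), fixed_offset_matches(ch))
```

The German ß and the Turkish dotless ı from the quoted message both fail the
check: the first uppercases to two characters, and the second maps to a code
point at a different distance.  Those are the cases that need an explicit
map today.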