On Tue, May 13, 2014 at 11:14 PM, Sean Conner <sean@conman.org> wrote:
> Then there's Korean, which is a syllabary and not an alphabet.

Actually, Korean IS an alphabet. It's TYPESET into syllabic blocks. In
NFD normalization, Korean text is encoded alphabetically, with
consonants, vowels, and codas acting as combining characters. In
NFC-normalized or unnormalized Korean text, yes, it's encoded
syllabically, but that's no different from using precombined Latin
characters.
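To illustrate (a sketch in Python, since its unicodedata module
exposes the normalization forms directly): the syllable block 한
decomposes under NFD into its three alphabetic jamo, and NFC
reassembles them.

```python
import unicodedata

# The precomposed syllable block U+D55C, "han"
syllable = "\ud55c"  # 한

# NFD splits the block into its alphabetic jamo:
# U+1112 (hieut), U+1161 (a), U+11AB (nieun coda)
jamo = unicodedata.normalize("NFD", syllable)
print([hex(ord(c)) for c in jamo])  # ['0x1112', '0x1161', '0x11ab']

# NFC recomposes the jamo back into the single syllable block
recomposed = unicodedata.normalize("NFC", jamo)
print(recomposed == syllable)  # True
```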

Japanese kana, on the other hand, ARE a syllabary. They don't have
uppercase and lowercase variants, but they do HAVE variants, used much
the way italics are used in English text (foreign words, emphasis, or
just style). But while every hiragana character has a matching
katakana (and vice versa), you can't just throw a hypothetical
"tokatakana()" (to match "toupper()") at a block of hiragana text and
expect the orthography to come out correctly, because the two scripts
have different conventions for writing long vowels. The operation and
its reverse are even /ambiguous/: katakana エー (e-) could be えい (ei)
or ええ (ee) in hiragana, and in the absence of context you can't tell
whether a given pair of characters is really a long vowel or two
separate words.
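A naive "tokatakana()" is easy to write, because the two kana blocks
sit a fixed 0x60 apart in Unicode, but as described above it
transliterates codepoints, not orthography. A sketch in Python (the
function name is the hypothetical one from the text; the offset and
ranges come from the Unicode kana blocks):

```python
def tokatakana(s: str) -> str:
    """Naively map hiragana codepoints to katakana.

    Hiragana U+3041..U+3096 and katakana U+30A1..U+30F6 are
    parallel blocks offset by 0x60, so this is a pure codepoint
    shift, not an orthographic conversion.
    """
    return "".join(
        chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c
        for c in s
    )

print(tokatakana("ひらがな"))  # ヒラガナ

# But: えい becomes エイ codepoint-for-codepoint, while the
# conventional katakana spelling of that long vowel is エー,
# and without context you can't tell which was meant.
print(tokatakana("えい"))  # エイ
```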



There IS an argument to be made that Unicode could have dispensed with
precombined characters. I'm not sure how I feel about that.
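For the Latin parallel (again a Python sketch): precombined U+00E9 and
the sequence e + U+0301 are canonically equivalent, and the
normalization forms convert freely between them, which is what makes
that argument even possible.

```python
import unicodedata

precombined = "\u00e9"  # é as a single codepoint
combining = "e\u0301"   # e + combining acute accent

# Canonically equivalent: NFD and NFC convert between the two forms
print(unicodedata.normalize("NFD", precombined) == combining)   # True
print(unicodedata.normalize("NFC", combining) == precombined)   # True
```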

/s/ Adam