- Subject: Re: OT: (of Lua) Re: Unicode?
- From: Alexey Desyatnik <tls@...>
- Date: Fri, 13 Jun 2003 12:51:14 +0400
On Thu, 12 Jun 2003 18:18:43 -0500, <RLake@oxfam.org.pe> wrote:
> Suppose I have three strings: "Ångstrom", "Ångstrom", and "Ångstrom".
> They should look identical, but they don't, at least on this machine, with
> this mail client and this font (Windows NT / Lotus Notes / Lucida Sans
> Unicode 10 pt, as it happens), where they look slightly different.
Windows XP Pro / Opera M2 7.11 RU / Courier New 10 pt: the same here. There
are no ideal Unicode fonts yet... or font rendering engines?
> Well, OK, that is a bit of a cheat because I think they actually turn
> into the same string if you apply any Unicode Normalisation
> transformation. But what about Cyrillic? (Or Greek, for that matter.) Do
> the identifiers "A", "А", and "Α" refer to the same object or not? (That
> was U+0041, U+0410 and U+0391, respectively.) What is the general case in
> which this is not a Bad Thing? If you are referring to display of text, I
> would say that was a pretty specific case.
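A small Python sketch of the normalization point, using the standard `unicodedata` module. The escaped code points for the three "Ångstrom" spellings are an assumed decoding of the characters in the quoted message, and `distinct_count` is a hypothetical helper written only for this illustration. It shows the three spellings collapsing to one string under every normalization form, while Latin, Cyrillic and Greek capital A stay distinct.

```python
import unicodedata

# Assumed decoding of the three spellings from the quoted message:
ANGSTROM_VARIANTS = [
    "\u00C5ngstrom",   # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
    "\u212Bngstrom",   # U+212B ANGSTROM SIGN
    "A\u030Angstrom",  # U+0041 followed by U+030A COMBINING RING ABOVE
]

# Latin, Cyrillic and Greek capital A (U+0041, U+0410, U+0391).
CAPITAL_AS = ["A", "\u0410", "\u0391"]


def distinct_count(strings, form=None):
    """Hypothetical helper: count distinct strings, optionally after
    applying the given Unicode normalization form."""
    if form is not None:
        strings = [unicodedata.normalize(form, s) for s in strings]
    return len(set(strings))


if __name__ == "__main__":
    # Compared code point by code point, the three spellings differ.
    print("raw:", distinct_count(ANGSTROM_VARIANTS))
    # Every normalization form maps them to a single sequence.
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, distinct_count(ANGSTROM_VARIANTS, form))
    # Lookalike letters from different scripts are different characters...
    for ch in CAPITAL_AS:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # ...and normalization does not merge them.
    print("capital A after NFKC:", distinct_count(CAPITAL_AS, "NFKC"))
```

Normalization only unifies sequences that Unicode declares equivalent (canonically or by compatibility); it deliberately does not merge letters that merely look alike across scripts.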
Not so specific, really :) Let's take "B", "C", "E" (Latin) and
"В", "С", "Е" (Russian). They look identical, but... their alphabetic
positions are different (2, 3, 5 and 3, 19, 6 respectively). So these letters
_must_ be different for correct sorting etc.
It could have been otherwise with a simple rule: 1 glyph == 1 code.
Simple but wrong...
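A minimal Python sketch of the sorting argument. The alphabet strings and the helpers `positions` and `sort_by_alphabet` are hypothetical, written for this illustration; the Russian alphabet is assumed to be the modern 33-letter one with Ё after Е, built from code points so that no Latin lookalike can slip in. The same three glyph shapes come out in a different order depending on which alphabet they belong to.

```python
import string

# Latin alphabet A..Z.
LATIN = string.ascii_uppercase

# Modern Russian alphabet: U+0410..U+042F (А..Я) with Ё (U+0401) after Е.
RUSSIAN = (
    "".join(chr(c) for c in range(0x0410, 0x0416))    # А Б В Г Д Е
    + "\u0401"                                         # Ё
    + "".join(chr(c) for c in range(0x0416, 0x0430))  # Ж .. Я
)

LATIN_LETTERS = ["B", "C", "E"]
CYRILLIC_LETTERS = ["\u0412", "\u0421", "\u0415"]  # В, С, Е: same shapes


def positions(letters, alphabet):
    """1-based position of each letter within the given alphabet."""
    return [alphabet.index(ch) + 1 for ch in letters]


def sort_by_alphabet(letters, alphabet):
    """Sort letters by their position in the given alphabet."""
    return sorted(letters, key=alphabet.index)


if __name__ == "__main__":
    print(positions(LATIN_LETTERS, LATIN))
    print(positions(CYRILLIC_LETTERS, RUSSIAN))
    # Latin order is B, C, E; Russian order of the lookalikes is В, Е, С.
    print(sort_by_alphabet(LATIN_LETTERS, LATIN))
    print(sort_by_alphabet(CYRILLIC_LETTERS, RUSSIAN))
```

A "1 glyph == 1 code" rule would have to give "B" and "В" one code, and then no single sort order could be correct for both languages.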
P.S. Sorry for my bad English ;)