- Subject: RE: Unicode?
- From: "Ivan-Assen Ivanov" <assen@...>
- Date: Thu, 12 Jun 2003 11:11:07 +0300
> > But two identical utf-8 characters can have different
> > encoding, right?
> No. I mean, if they have the same unicode number, they must
> have the same utf-8 encoding.
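To make the quoted point concrete, here is a small sketch in Python (my choice for illustration, nothing in the thread uses it; the helper name utf8_bytes is made up). It assumes well-formed UTF-8 from a conforming encoder, where each code point has exactly one encoding, so comparing the bytes gives the same answer as comparing the code points, for equality and for ordering alike.

```python
def utf8_bytes(text: str) -> bytes:
    """Hypothetical helper: encode a str (a sequence of code points) as UTF-8."""
    return text.encode("utf-8")

a = "\u00e9"     # U+00E9, LATIN SMALL LETTER E WITH ACUTE
b = chr(0xE9)    # the same code point, built another way

# Same code point, therefore the same UTF-8 byte sequence.
same_encoding = utf8_bytes(a) == utf8_bytes(b)

# Sorting by raw UTF-8 bytes gives the same order as sorting by code point.
words = ["z", "\u00e9", "a", "\u20ac", "\U0001F600"]
by_code_point = sorted(words)              # Python compares str by code point
by_bytes = sorted(words, key=utf8_bytes)   # compares the encoded bytes
order_agrees = by_code_point == by_bytes
```

Overlong or otherwise malformed byte sequences are not valid UTF-8, so they are outside what this sketch covers.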
Well, it's worse than that.
In languages such as Hindi and Arabic you have ligatures:
sequences of characters collapsed into a single glyph, much as some
Latin-script books print "fi" or "ff" as one joined character.
So, to _really_ support text-processing applications in these
languages, you need to know the ligature composition rules.
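A rough illustration of the Latin case, in Python with the standard unicodedata module (again my choice of language, not something from the thread). Unicode happens to have a separate code point for the "fi" ligature, U+FB01, so the ligature and the two plain letters are different strings at the byte level, and only compare equal after compatibility normalization (NFKC). This covers only ligatures that have their own code point; it is a sketch of the comparison problem, not of full ligature composition rules.

```python
import unicodedata

ligature = "\ufb01"   # U+FB01, LATIN SMALL LIGATURE FI (one code point)
plain = "fi"          # U+0066 U+0069 (two code points)

# Different code points, so the UTF-8 encodings differ too.
raw_equal = ligature.encode("utf-8") == plain.encode("utf-8")

# NFKC replaces the ligature with its compatibility decomposition, "f" + "i".
folded = unicodedata.normalize("NFKC", ligature)
folded_equal = folded == plain
```

Here raw_equal is False and folded_equal is True. Whether to fold at all is exactly the kind of decision an application would make for itself.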
But IMHO this is something best left to the application,
and not attempted at the language level. So string comparisons
of UTF-8 strings _are_ valid string comparisons of the Unicode