- Subject: Re: question about Unicode
- From: Glenn Maynard <glenn@...>
- Date: Tue, 5 Dec 2006 21:27:04 -0500
On Tue, Dec 05, 2006 at 04:33:48PM +0000, David Given wrote:
> In fact, when dealing with UTF-8 strings, all text should be normalised so you
> *don't* get the issue you mention above. Multiple-character graphemes should
> be collapsed down into a single character wherever possible (I believe that
> it is possible for all Romance languages, but I could be wrong).
But with arbitrary combining sequences, there will always be combinations
that have no precomposed form. If you're writing a good UTF-8 editor, it
also seems like good manners not to normalize the file the user is editing
without being asked (even if newly entered text is created in, e.g., NFC).
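To illustrate the point about combinations without a precomposed form, here is a small sketch using Python's standard unicodedata module (Python is my choice for the example, not something from the thread). "e" plus COMBINING ACUTE ACCENT collapses to a single code point under NFC, but "q" plus the same accent has no precomposed character, so it stays two code points however the text is normalized. The helper name describe() is made up for the example.

```python
import unicodedata

def describe(s: str) -> str:
    """Return the code points of s as space-separated U+XXXX strings."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# "e" + COMBINING ACUTE ACCENT (U+0301): a precomposed form exists (U+00E9).
e_acute = "e\u0301"
# "q" + COMBINING ACUTE ACCENT: Unicode has no precomposed "q with acute".
q_acute = "q\u0301"

for s in (e_acute, q_acute):
    nfc = unicodedata.normalize("NFC", s)
    print(describe(s), "->", describe(nfc))
```

So NFC shortens some graphemes, but code that assumes "one grapheme == one code point" after normalization will still break on sequences like the second one.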
> I think that's all I need. I should be able to do the rest with just those
> three, and conventional string munging tools. Hmm...
I'd recommend something along those lines: keep the core string handling
in bytes, and put the rendering-based stuff that deals in "columns" at a
higher level.
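A rough sketch of that layering, again in Python. The function names (next_boundary, char_columns, display_columns) are hypothetical, and the width rules are only an approximation of what a terminal's wcwidth() does: combining marks count as 0 columns, East Asian wide/fullwidth characters as 2, everything else as 1. Control characters, zero-width joiners and emoji sequences are not handled. The bottom layer only looks at UTF-8 bytes (continuation bytes match 10xxxxxx); only the top layer decodes and thinks about columns.

```python
import unicodedata

# --- Core layer: works on raw UTF-8 bytes, knows nothing about display ---

def next_boundary(buf: bytes, i: int) -> int:
    """Return the byte offset of the next code point after offset i.

    UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF),
    so skip them until the next lead byte or the end of the buffer.
    """
    i += 1
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i

# --- Rendering layer: deals in "columns" (approximation of wcwidth) ---

def char_columns(ch: str) -> int:
    """Approximate terminal width of a single code point."""
    if unicodedata.combining(ch):
        return 0  # combining marks draw on top of the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and fullwidth characters occupy two cells
    return 1

def display_columns(buf: bytes) -> int:
    """Decode a UTF-8 buffer and sum the approximate column widths."""
    return sum(char_columns(ch) for ch in buf.decode("utf-8"))
```

For example, "a\u65e5b".encode("utf-8") is 5 bytes and 3 code points, but display_columns() reports 4 columns. Cursor movement and storage can use next_boundary() on the bytes, while only the drawing code needs display_columns().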