- Subject: RE: question about Unicode
- From: "Jerome Vuarand" <jerome.vuarand@...>
- Date: Tue, 5 Dec 2006 10:23:24 -0500
David Given wrote:
> I want to write a text editor, and so there'll be lots of
> nasty fetch-the-character-from-column-Z issues. Assuming each
> grapheme cluster renders into a single character cell ---
> which I know is not strictly valid, as some clusters will
> occupy multiple cells --- then dealing with character offsets
> instead of byte offsets will make life much easier.
Also keep in mind that many Unicode characters are meant to be combined with others (` + E gives È, for example), so a single grapheme (and a single character cell) can be made of multiple Unicode code points. Character offsets in a Unicode string don't reflect grapheme offsets in the string's graphical representation, even with fixed-width fonts.
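
To make the difference concrete, here is a rough sketch in Python (chosen only for illustration; the thread itself names no language). The helper `approx_clusters` is hypothetical, and it assumes a cluster is one base code point followed by any combining marks. Real grapheme segmentation as defined in Unicode Standard Annex #29 has more rules (Hangul jamo, ZWJ sequences and so on), so treat this as an approximation rather than a complete implementation.

```python
import unicodedata

def approx_clusters(text):
    """Group code points into approximate grapheme clusters.

    Assumption: a code point with a nonzero canonical combining class
    attaches to the cluster before it. This is a simplification of the
    full UAX #29 rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # combining mark: extend the previous cluster
        else:
            clusters.append(ch)  # base character: start a new cluster
    return clusters

# 'E' followed by U+0300 COMBINING GRAVE ACCENT, then "cole"
s = "E\u0300cole"
clusters = approx_clusters(s)

print(len(s))         # number of code points
print(len(clusters))  # number of approximate clusters
print(repr(s[1]))     # code point offset 1 is the combining accent
print(clusters[1])    # cluster offset 1 is the letter after the accented E
```

Here the string holds 6 code points but only 5 clusters, so "column 1" means the combining accent if you index by code point and the letter c if you index by cluster. An editor would have to index by cluster, and still account for clusters that occupy more than one cell.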