- Subject: Re: Plea for the support of unicode escape sequences
- From: David Given <dg@...>
- Date: Thu, 30 Jun 2011 13:20:07 +0100
Jim Whitehead II wrote:
[...]
> The official unicode roadmap includes a code map for Tengwar:
> http://en.wikipedia.org/wiki/Tengwar#Unicode
However, as I discovered the other day when I needed them for a
particularly esoteric program I was writing, there are no Malachim,
Celestial, Theban, or Transitus Fluvii scripts. There isn't even any
Enochian. Unicode 6 *has* added a bunch of alchemical symbols, but
there's only so much you can do with those...
Wrenching the discussion at least back in the direction of being on
topic, I think that at some point they're going to have to lift the
0x110000 limit on the Unicode space size. I'm pretty sure that limit was
only imposed to keep Java and Windows happy; they standardised on UCS-2
way too soon, back when they thought 0x10000 was more than anyone would
need, and as a result shot themselves in the foot really badly. If you
don't believe me, just go look at surrogates, and then check out the
Java String API and the hideous mess that is charAt() vs
codePointAt()... and then try using astral plane code points in online
services and seeing how many of them actually work.
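To make the surrogate business concrete, here is a minimal Python sketch
of how UTF-16 splits a code point above 0xFFFF into two 16-bit code
units. The helper name to_surrogates is made up for illustration. An
accessor that indexes by code unit (as charAt() does) sees two
"characters" where one that indexes by code point sees one, and the
arithmetic also shows where the 0x110000 ceiling comes from.

```python
def to_surrogates(cp):
    """Split a supplementary code point into a UTF-16 (high, low) pair.

    Hypothetical helper, for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000             # leaves a 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits go in the high surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits go in the low surrogate
    return high, low


clef = 0x1D11E  # MUSICAL SYMBOL G CLEF, an astral plane code point
high, low = to_surrogates(clef)

# Cross-check against Python's own UTF-16 encoder.
raw = chr(clef).encode("utf-16-be")
units = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

print(hex(high), hex(low))   # the two code units for one code point
print(len(units))            # number of UTF-16 code units

# Two 10-bit halves give 2**20 supplementary code points on top of the
# 0x10000 in the BMP, which is exactly the 0x110000 space size.
print(hex(0x10000 + (1 << 20)))
```

Anything that counts or indexes by 16-bit unit will report a length of 2
for that single clef, which is the root of the charAt() vs codePointAt()
confusion.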
Which is why, of course, people should not be using UCS-2 or UTF-16 for
anything. In fact, I'd suggest not using UCS-4 either --- it encourages
shortcuts in handling Unicode that aren't actually valid, like assuming
you can split strings anywhere. UTF-8 FTW.
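As a rough illustration of why "one array element per code point" still
doesn't let you split anywhere, here is a small Python sketch (Python
strings index by code point, much like UCS-4). The helper safe_cut is
hypothetical and deliberately simplistic: it only backs up over
combining marks and does not implement full grapheme cluster rules.

```python
import unicodedata

s = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Indexing by code point still lets you cut between a base letter and
# its combining mark, silently dropping the accent from the left half.
naive_left = s[:4]


def safe_cut(text, i):
    """Move cut index i left so it never lands just before a combining mark.

    Hypothetical helper; real code should use proper grapheme segmentation.
    """
    while 0 < i < len(text) and unicodedata.combining(text[i]):
        i -= 1
    return text[:i], text[i:]


left, right = safe_cut(s, 4)

# In UTF-8 a bad cut is at least detectable: continuation bytes always
# look like 0b10xxxxxx, so you can tell you are mid-sequence.
data = s.encode("utf-8")
mid_sequence = (data[5] & 0xC0) == 0x80

print(repr(naive_left), repr(left), repr(right), mid_sequence)
```

The point is that with UTF-8 you are forced to think about boundaries
from the start, whereas a fixed-width encoding makes the broken split
look legitimate.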
--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│ "I have always wished for my computer to be as easy to use as my
│ telephone; my wish has come true because I can no longer figure out
│ how to use my telephone." --- Bjarne Stroustrup