Mandatory disclosure: I am not a Unicode lawyer. I do not know the standard that well. Hans Hagen's message is far more useful in explaining how the typography is supposed to work.

BTW, figuring out what my argument actually was made me realize my design for dynamic-codepoints-for-grapheme-clusters had a nasty problem in *still* not being closed over NFC for concatenation. Nothing like the sinking feeling of a leaking design to start your day.
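To make the closure problem concrete (this is only the general NFC property, not the dynamic-codepoint design itself), here is a minimal Python sketch using the standard `unicodedata` module. The helper name `is_nfc` is mine, purely for illustration. It shows two strings that are each in NFC whose concatenation is not:

```python
import unicodedata

def is_nfc(s: str) -> bool:
    # A string is in NFC exactly when normalizing it changes nothing.
    return unicodedata.normalize("NFC", s) == s

left = "a"        # U+0061, trivially in NFC
right = "\u0302"  # a lone COMBINING CIRCUMFLEX ACCENT is also in NFC
joined = left + right

# Each piece is NFC on its own, but the join composes to U+00E2 under NFC.
print(is_nfc(left), is_nfc(right), is_nfc(joined))  # True True False
```

So any representation that wants to stay normalized has to renormalize, at least around the seam, every time it concatenates.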

On Dec 6, 2012, at 12:12 AM, Dirk Laurie wrote:

> 2012/12/5 Jay Carlson <nop@nop.com>:
> 
>> Here's a nickel. Get yourself a real operating system
>> (or perhaps just a real MUA).
> 
> You're the second poster to make snide remarks at my OS.

Yes. This should tell you something.

It's always risky to depend on cultural references for subtlety on any mailing list. To be clear, I'm self-deprecatingly placing myself in the role of "one of those condescending Unix users!" in the Dilbert strip.[1] In a bit of irony, Mac- and Windows-native software tends to have better Unicode handling than Unix-y software.

> Adam called it "crappy".

I'll call it likely non-conformant and a bad example to draw lessons from. As a rule of thumb, display of NFC (generally, precomposed when possible) and NFD (decomposed) should be indistinguishable, especially in the case of single combining marks; the fact that they are distinguishable on your system makes me suspect there are other bugs lurking around. The display itself is a bug if you consider crappy typography to be a bug (and I suspect you do, based on your complaint about the aesthetics of the decomposed case).

> Actually unnecessary decomposed characters cannot arise
> on my system without great inconvenience, so I can't blame
> the authors for failing to provide an output mechanism that
> uncraps crappy input.

I'm confused; like mine, your mail reader doesn't see any difference between â and â ? Text composed on your system is not the only place Unicode comes from. Conforming Unicode applications may exchange canonically equivalent forms at any point.[2]

> My system composes at keyboard entry level.   I hit Compose,
> `a`, and `^`, and a genuine `â` appears, no matter which
> program is asking for input.

In Unicode there is no normative preference for precomposed vs decomposed sequences, and applications cannot assume canonically equivalent sequences will be treated differently by recipients. Calling one "genuine" makes no sense.
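For anyone who wants to poke at this, a small Python sketch with the standard `unicodedata` module follows. The helper `canonically_equivalent` is a name I made up for illustration; it just compares NFD forms, which is one reasonable way to test canonical equivalence:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical NFD forms.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e2"  # LATIN SMALL LETTER A WITH CIRCUMFLEX
decomposed = "a\u0302"  # "a" followed by COMBINING CIRCUMFLEX ACCENT

# Codepoint-wise the strings differ, but they are canonically equivalent.
print(precomposed == decomposed)                        # False
print(canonically_equivalent(precomposed, decomposed))  # True
```

A naive codepoint comparison sees two different strings; anything claiming to handle Unicode text should not let that difference leak out to the user.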

If you're an XML geek, it's like saying you get a genuine &lt;&amp;&gt;&lt; instead of <![CDATA[<&><]]>. The two sequences have identical semantics and any processor is allowed to substitute one for the other at any level of handling of PCDATA.
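The XML half of the analogy is easy to check with Python's standard `xml.etree.ElementTree`. The wrapper element `<r>` is only there so the parser has a document to parse; it is my assumption, not part of the example above:

```python
import xml.etree.ElementTree as ET

# The same character data, once with entity escapes and once as CDATA.
escaped = ET.fromstring("<r>&lt;&amp;&gt;&lt;</r>").text
cdata = ET.fromstring("<r><![CDATA[<&><]]></r>").text

# After parsing, the application cannot tell which spelling was used.
print(escaped == cdata)  # True
print(escaped)           # <&><
```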

Unicode is an interchange format. Internal behavior of applications is a separate matter. I don't know if it's strictly a conformance issue to display the two-codepoint version of â differently, but it is a serious violation of intent.

Jay

[1]: Actually Neal Stephenson's misquote of the one on cman's door, but that's not important.

[2]: There's some kind of mess with W3C normalization, but I don't really understand the implications.