- Subject: Re: unicode char ranges
- From: Hans Hagen <pragma@...>
- Date: Thu, 06 Dec 2012 10:48:50 +0100
On 12/6/2012 6:12 AM, Dirk Laurie wrote:
> 2012/12/5 Jay Carlson <nop@nop.com>:
>> Here's a nickel. Get yourself a real operating system
>> (or perhaps just a real MUA).
> You're the second poster to make snide remarks at my OS.
> Adam called it "crappy".
> Actually unnecessary decomposed characters cannot arise
> on my system without great inconvenience, so I can't blame
> the authors for failing to provide an output mechanism that
> uncraps crappy input.
Typographic issues are a bit beyond this list, but here is how it works
(a bit simplified, as more is involved):
- input can consist of either a sequence of characters that are turned
into one (u + diaeresis = udiaeresis) or of direct code points
(udiaeresis); from the linguistic point of view the two dots can
represent something different per language, e.g. an umlaut in german
- a font can provide composed characters as precomposed or as
decomposed and most modern (truetype/opentype) fonts provide for this;
some fonts have composed glyphs but at the same time carry the
information of how to compose them from other glyphs
- the way composition happens depends on the font logic: it can be done
via substitution (resulting in a precomposed glyph) or relative
positioning; fonts may also require a decomposition step and start from
the individual characters
- in most cases collapsing already takes place at the input stage, i.e.
decomposed sequences get turned into composed ones (but a font might
demand decomposition later on)
- characters get represented by glyphs and there is a one to many
relationship, think of smallcap, oldstyle and other renderings; a font
can have rulesets that are to be applied in sequence
- in a precomposed glyph the (for instance) accent is part of the
package and the graphic definition might provide clues for rendering
(hinting)
- in the decomposed case the base character and the accent (officially
called mark) get positioned relative to each other using so called
anchors; in that case you can run into rounding errors and hinting can
be less optimal
- if none of this works, which is the case if no entry for the composed
glyph is provided i.e. no information is available on how to deal with
the situation, the characters get overlaid due to the fact that an
accent has either width zero or some fixed width (fonts are not
consistent in this)
- of course a font renderer can apply some heuristics, e.g. centering the
accent over the base character
- in addition, operating systems often use technologies where, if a font
has no entry, a glyph from another font is taken
- in situations where ligaturing is involved (nb: an accented character is
not a ligature) things can be more complex as each component of the
ligature can get its own marks (for instance in arabic scripts)
- some languages have stacked marks, for example vietnamese, so there we
run into base to mark and mark to mark situations (given that no
precomposed glyph is present)
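
To make the input-stage part of the above a bit more concrete, here is a minimal sketch in Python, assuming the standard `unicodedata` module and Unicode normalization (NFC/NFD) as a stand-in for the collapsing step. The helper names (`codepoints`, `compose`, `decompose`, `marks`) are made up for this illustration, and it only deals with code points: glyph substitution and anchor positioning happen in the font machinery and are not modelled here.

```python
import unicodedata

def codepoints(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

def compose(text):
    """Collapse decomposed sequences where a precomposed code point exists (NFC)."""
    return unicodedata.normalize("NFC", text)

def decompose(text):
    """Split precomposed characters into base character plus marks (NFD)."""
    return unicodedata.normalize("NFD", text)

def marks(text):
    """Return the characters that have a nonzero canonical combining class."""
    return [ch for ch in text if unicodedata.combining(ch) != 0]

# u + combining diaeresis collapses into the single code point udiaeresis
print(codepoints(compose("u\u0308")))        # ['U+00FC']

# a vietnamese e with circumflex and acute carries two stacked marks
stacked = decompose("\u1ebf")
print(codepoints(stacked))                   # ['U+0065', 'U+0302', 'U+0301']
print(len(marks(stacked)))                   # 2

# q + combining diaeresis has no precomposed form, so it stays decomposed
print(codepoints(compose("q\u0308")))        # ['U+0071', 'U+0308']
```

The last case is the one where a renderer has to fall back on anchors, overlaying or heuristics, because there is nothing to collapse into.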
Now to operating systems (just some personal observations):
- windows: the font rendering technology (volt, cleartype, etc) is quite
good given that a decent font is used; in xp one had to turn on
cleartype explicitly
- osx: no issues (apart from occasional issues in the built in pdf
renderer); there is some apple font technology but I think it's being
phased out in favor of generic opentype
- linux: the technology is okay, but not always applied / configured
right; one of the things i like about (x)ubuntu is that right from the
start they got this right i.e. enabled anti-aliasing and other features
as well as chose fonts that render okay (so, in case of doubt about the
quality, just check the settings)
microsoft and apple have some advantage here as they are behind the
current font technologies (truetype and opentype)
so: rendering is not so much os related, but more a matter of using the
right fonts and setting up the machinery right; of course a high res
screen helps too
> My system composes at keyboard entry level. I hit Compose,
> `a`, and `^`, and a genuine `â` appears, no matter which
> program is asking for input.
it might be less optimal for chinese, korean or arabic (more font
dependent as well as renderer dependent); for arabic one can often see
the font machinery in action in real time when one keys in characters,
because sequences of characters are turned into combined shapes that
need relative (vertical and horizontal) positioning as well as mark
anchoring
> To produce the second, decomposed, one in my post I had
> to remind myself of the Unicode for combining circumflex
> by consulting a document I wrote in August 2011 (revised
> thanks to the present discussion and appended, helpful
> comments welcome).
such documents are actually good tests for checking support of
characters in an editor
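
As a rough way to see which of the two forms an editor actually stored, one could list the code points and their Unicode names. This is only a sketch, assuming Python and its standard `unicodedata` module; the `describe` helper is a hypothetical name used here for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

composed = "\u00e2"      # a with circumflex as a single code point
decomposed = "a\u0302"   # a followed by the combining circumflex

for label, sample in (("composed", composed), ("decomposed", decomposed)):
    print(label, describe(sample))

# the two strings look the same when rendered well, but compare as different
print(composed == decomposed)                                  # False
# after normalization to NFC they are identical
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
```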
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------