- Subject: Re: unicode char ranges
- From: Dirk Laurie <dirk.laurie@...>
- Date: Thu, 6 Dec 2012 07:12:09 +0200
2012/12/5 Jay Carlson <nop@nop.com>:
> Here's a nickel. Get yourself a real operating system
> (or perhaps just a real MUA).
You're the second poster to make snide remarks about my OS.
Adam called it "crappy".
Actually, unnecessarily decomposed characters cannot arise
on my system without great inconvenience, so I can't blame
the authors for failing to provide an output mechanism that
uncraps crappy input.
My system composes at keyboard entry level. I hit Compose,
`a`, and `^`, and a genuine `â` appears, no matter which
program is asking for input.
To produce the second, decomposed, one in my post I had
to remind myself of the Unicode for combining circumflex
by consulting a document I wrote in August 2011 (revised
thanks to the present discussion and appended, helpful
comments welcome).
Dirk
Getting Unicode characters into your document
======
0. Don't assume everybody's system has all the fonts yours has.
Or mine, for that matter: my system can't even display all the symbols
I have included here, e.g. u214F "SYMBOL FOR SAMARITAN SOURCE".
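1. Every character in this document has a code point that fits in
2 bytes (u0000–uFFFF), though Unicode as a whole goes up to u10FFFF.
In your document it will probably be stored as UTF-8, which encodes
each code point as a sequence of 1 to 4 bytes (the original UTF-8
design allowed up to 6, but RFC 3629 limits it to 4). This document
does not go into the details of that encoding.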
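Still, the rule is short enough to sketch. Below is a minimal encoder in plain Lua arithmetic, so it also runs on Lua 5.1 and 5.2, which lack the `utf8` library that arrived in Lua 5.3 (there, `utf8.char` does the same job). The name `utf8_encode` is hypothetical; it assumes a valid code point in 0..0x10FFFF and does not reject surrogates (uD800–uDFFF).

```lua
-- Hypothetical helper: encode one code point (0..0x10FFFF) as UTF-8.
-- Pure arithmetic, so it works on Lua 5.1 and later.
-- Surrogates (uD800-uDFFF) are NOT rejected.
function utf8_encode(cp)
  assert(cp >= 0 and cp <= 0x10FFFF, "code point out of range")
  -- continuation byte 10xxxxxx carrying the low 6 bits of n
  local function cont(n) return 0x80 + n % 0x40 end
  if cp < 0x80 then                       -- 0xxxxxxx
    return string.char(cp)
  elseif cp < 0x800 then                  -- 110xxxxx 10xxxxxx
    return string.char(0xC0 + math.floor(cp / 0x40), cont(cp))
  elseif cp < 0x10000 then                -- 1110xxxx + 2 continuation bytes
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       cont(math.floor(cp / 0x40)), cont(cp))
  else                                    -- 11110xxx + 3 continuation bytes
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       cont(math.floor(cp / 0x1000)),
                       cont(math.floor(cp / 0x40)), cont(cp))
  end
end

-- u2190 LEFTWARDS ARROW takes three bytes: E2 86 90
assert(utf8_encode(0x2190) == "\226\134\144")
```

A plain ASCII letter stays one byte, `â` (u00E2) becomes two, and everything in the tables below from u0800 upward becomes three.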
2. Many websites list the characters and their codes. I get mine from
<http://www.fileformat.info/info/unicode/block>
3. If your system has a compose key, there probably is a file somewhere
with `Compose` in its name, listing those characters that you can
make and how to make them. On some systems the result is a single
precomposed Unicode character; on others it is a base letter followed
by a combining mark (see Combining diacritics, below), which is
supposed to mean the same thing.
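You can check which kind your system produced. A precomposed `â` (u00E2) is 2 bytes in UTF-8, while `a` followed by the combining circumflex u0302 is 3. The sketch below (hypothetical name `has_combining`, works on Lua 5.1 and later) only looks for marks in the Combining diacritics block u0300–u036F; marks from other blocks, such as u20D0–u20FF, are not detected.

```lua
-- Hypothetical helper: does s contain a combining mark from u0300-u036F?
-- In UTF-8 those marks are the two-byte sequences CC 80..CC BF
-- (u0300-u033F) and CD 80..CD AF (u0340-u036F). CC and CD are lead
-- bytes, never continuation bytes, so a match cannot be misaligned.
function has_combining(s)
  return s:find("\204[\128-\191]") ~= nil
      or s:find("\205[\128-\175]") ~= nil
end

print(has_combining("\195\162"))   -- precomposed â (u00E2): false
print(has_combining("a\204\130"))  -- a + combining circumflex u0302: true
```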
4. Your system may also have a helper app which, if you've read this
far, you probably don't like all that much.
5. If all else fails, all up-to-date systems have a way of entering
the actual four-hexdigit number. The FileFormat site says how to
do it on Windows and in HTML, C, Python etc. On mine (Gnome, i.e.
standard Ubuntu), it is Ctrl-Shift-U (it shows an underlined `u`)
and then either hold Ctrl-Shift down and type the four digits, or
let go and type them followed by Return. The first method is
nicer if and only if you don't mistype any digits.
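In Lua source code the same four hex digits can go straight into a string literal. A small sketch of the three escape forms, using u2190 LEFTWARDS ARROW as the example; the variable names are just for illustration. The `\u{...}` escape needs Lua 5.3 or later (so the snippet as a whole does), `\x` escapes need 5.2 or later, and decimal `\ddd` escapes work in every version.

```lua
-- Three spellings of u2190 LEFTWARDS ARROW in a Lua string literal.
by_codepoint = "\u{2190}"      -- Lua 5.3+: the code point itself
by_hex_bytes = "\xE2\x86\x90"  -- Lua 5.2+: its UTF-8 bytes in hex
by_dec_bytes = "\226\134\144"  -- any Lua: its UTF-8 bytes in decimal

assert(by_codepoint == by_hex_bytes and by_hex_bytes == by_dec_bytes)
io.write(by_codepoint, "\n")   -- shows the arrow on a UTF-8 terminal
```

Only the first form lets you type the number you looked up directly; the other two require the UTF-8 bytes.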
6. Here are my favourites. Copy and paste them. There's lots more
on the FileFormat site.
Latin-1 Supplement u0080–u00FF
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
A0 ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯
B0 ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
C0 À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
D0 Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
E0 à á â ã ä å æ ç è é ê ë ì í î ï
F0 ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
NB: u00A0 is a non-breaking space. The first 32 characters of this
block (u0080–u009F) are not shown. They are non-printing control characters with names like
"START OF GUARDED AREA", which to my mind suggests that ordinary folk
shouldn't be using them in e-mails.
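All the tables here share one layout: a hex row label followed by sixteen cells. A minimal sketch that regenerates such a table for any range, assuming Lua 5.3+ (for `utf8.char`) and a UTF-8 terminal; `block_table` is a hypothetical name. It does not filter out unassigned or non-printing code points, so choose the range yourself.

```lua
-- Hypothetical helper: lay out code points first..last in rows of 16,
-- each row labelled with its starting code point in hex.
-- Cells outside first..last are left blank. Needs Lua 5.3+.
function block_table(first, last)
  local rows = {}
  for row = first - first % 16, last, 16 do
    local cells = { string.format("%X", row) }
    for cp = row, row + 15 do
      if cp < first or cp > last then
        cells[#cells + 1] = " "            -- outside the requested range
      else
        cells[#cells + 1] = utf8.char(cp)  -- the character itself, UTF-8
      end
    end
    rows[#rows + 1] = table.concat(cells, " ")
  end
  return table.concat(rows, "\n")
end

-- Latin-1 Supplement from u00A0 on, skipping u0080-u009F as above
print(block_table(0xA0, 0xFF))
```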
Latin Extended-A u0100–u017F
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
100 Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď
110 Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
120 Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į
130 İ ı IJ ij Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
140 ŀ Ł ł Ń ń Ņ ņ Ň ň ʼn Ŋ ŋ Ō ō Ŏ ŏ
150 Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
160 Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů
170 Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
See also: Latin Extended-B, u0180–u024F; IPA Extensions, u0250–u02AF;
Latin Extended Additional, u1E00–u1EFF, and more.
Combining diacritics u0300–u036F
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
300 c̀ ć ĉ c̃ c̄ c̅ c̆ ċ c̈ c̉ c̊ c̋ č c̍ c̎ c̏
310 c̐ c̑ c̒ c̓ c̔ c̕ c̖ c̗ c̘ c̙ c̚ c̛ c̜ c̝ c̞ c̟
320 c̠ c̡ c̢ c̣ c̤ c̥ c̦ ç c̨ c̩ c̪ c̫ c̬ c̭ c̮ c̯
330 c̰ c̱ c̲ c̳ c̴ c̵ c̶ c̷ c̸ c̹ c̺ c̻ c̼ c̽ c̾ c̿
340 c̀ ć c͂ c̓ c̈́ cͅ c͆ c͇ c͈ c͉ c͊ c͋ c͌ c͍ c͎ c͏
350 c͐ c͑ c͒ c͓ c͔ c͕ c͖ c͗ c͘ c͙ c͚ c͛ c͜ c͝ c͞ c͟
360 c͠ c͡ c͢ cͣ cͤ cͥ cͦ cͧ cͨ cͩ cͪ cͫ cͬ cͭ cͮ cͯ
These symbols are shown applied to the letter `c`. A combining mark on
its own cannot be cut and pasted from this table, sorry: type the letter
first and then enter the mark's code as in item 5. Moreover, remark #0
applies even more strongly here.
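The same "letter first, then the mark" order is easy to do in code. A sketch that rebuilds one row of the table above, assuming Lua 5.3+ for `utf8.char`; `combining_row` is a hypothetical name. Some software normalizes a pair such as `c` + u0302 into the single precomposed `ĉ` (u0109), so what you see after copying may not be the decomposed pair.

```lua
-- Hypothetical helper: one row of the combining-diacritics table,
-- i.e. the base letter followed by each of the 16 marks starting at
-- row_start. Needs Lua 5.3+ for utf8.char.
function combining_row(base, row_start)
  local cells = { string.format("%X", row_start) }
  for cp = row_start, row_start + 15 do
    cells[#cells + 1] = base .. utf8.char(cp)  -- letter first, then mark
  end
  return table.concat(cells, " ")
end

print(combining_row("c", 0x300))
```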
Letterlike Symbols u2100–u214F
------
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
2100 ℀ ℁ ℂ ℃ ℄ ℅ ℆ ℇ ℈ ℉ ℊ ℋ ℌ ℍ ℎ ℏ
2110 ℐ ℑ ℒ ℓ ℔ ℕ № ℗ ℘ ℙ ℚ ℛ ℜ ℝ ℞ ℟
2120 ℠ ℡ ™ ℣ ℤ ℥ Ω ℧ ℨ ℩ K Å ℬ ℭ ℮ ℯ
2130 ℰ ℱ Ⅎ ℳ ℴ ℵ ℶ ℷ ℸ ℹ ℺ ℻ ℼ ℽ ℾ ℿ
2140 ⅀ ⅁ ⅂ ⅃ ⅄ ⅅ ⅆ ⅇ ⅈ ⅉ ⅊ ⅋ ⅌ ⅍ ⅎ ⅏
Number Forms u2150–u218F
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
2150 ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞ ⅟
2160 Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ Ⅼ Ⅽ Ⅾ Ⅿ
2170 ⅰ ⅱ ⅲ ⅳ ⅴ ⅵ ⅶ ⅷ ⅸ ⅹ ⅺ ⅻ ⅼ ⅽ ⅾ ⅿ
2180 ↀ ↁ ↂ Ↄ ↄ ↅ ↆ ↇ ↈ ↉ ↊ ↋
Arrows u2190–u21FF
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
2190 ← ↑ → ↓ ↔ ↕ ↖ ↗ ↘ ↙ ↚ ↛ ↜ ↝ ↞ ↟
21a0 ↠ ↡ ↢ ↣ ↤ ↥ ↦ ↧ ↨ ↩ ↪ ↫ ↬ ↭ ↮ ↯
21b0 ↰ ↱ ↲ ↳ ↴ ↵ ↶ ↷ ↸ ↹ ↺ ↻ ↼ ↽ ↾ ↿
21c0 ⇀ ⇁ ⇂ ⇃ ⇄ ⇅ ⇆ ⇇ ⇈ ⇉ ⇊ ⇋ ⇌ ⇍ ⇎ ⇏
21d0 ⇐ ⇑ ⇒ ⇓ ⇔ ⇕ ⇖ ⇗ ⇘ ⇙ ⇚ ⇛ ⇜ ⇝ ⇞ ⇟
21e0 ⇠ ⇡ ⇢ ⇣ ⇤ ⇥ ⇦ ⇧ ⇨ ⇩ ⇪ ⇫ ⇬ ⇭ ⇮ ⇯
21f0 ⇰ ⇱ ⇲ ⇳ ⇴ ⇵ ⇶ ⇷ ⇸ ⇹ ⇺ ⇻ ⇼ ⇽ ⇾ ⇿
See also: Supplemental Arrows-A, u27F0–u27FF; Supplemental Arrows-B,
u2900–u297F.
Mathematical Operators u2200–u22FF
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
2200 ∀ ∁ ∂ ∃ ∄ ∅ ∆ ∇ ∈ ∉ ∊ ∋ ∌ ∍ ∎ ∏
2210 ∐ ∑ − ∓ ∔ ∕ ∖ ∗ ∘ ∙ √ ∛ ∜ ∝ ∞ ∟
2220 ∠ ∡ ∢ ∣ ∤ ∥ ∦ ∧ ∨ ∩ ∪ ∫ ∬ ∭ ∮ ∯
2230 ∰ ∱ ∲ ∳ ∴ ∵ ∶ ∷ ∸ ∹ ∺ ∻ ∼ ∽ ∾ ∿
2240 ≀ ≁ ≂ ≃ ≄ ≅ ≆ ≇ ≈ ≉ ≊ ≋ ≌ ≍ ≎ ≏
2250 ≐ ≑ ≒ ≓ ≔ ≕ ≖ ≗ ≘ ≙ ≚ ≛ ≜ ≝ ≞ ≟
2260 ≠ ≡ ≢ ≣ ≤ ≥ ≦ ≧ ≨ ≩ ≪ ≫ ≬ ≭ ≮ ≯
2270 ≰ ≱ ≲ ≳ ≴ ≵ ≶ ≷ ≸ ≹ ≺ ≻ ≼ ≽ ≾ ≿
2280 ⊀ ⊁ ⊂ ⊃ ⊄ ⊅ ⊆ ⊇ ⊈ ⊉ ⊊ ⊋ ⊌ ⊍ ⊎ ⊏
2290 ⊐ ⊑ ⊒ ⊓ ⊔ ⊕ ⊖ ⊗ ⊘ ⊙ ⊚ ⊛ ⊜ ⊝ ⊞ ⊟
22a0 ⊠ ⊡ ⊢ ⊣ ⊤ ⊥ ⊦ ⊧ ⊨ ⊩ ⊪ ⊫ ⊬ ⊭ ⊮ ⊯
22b0 ⊰ ⊱ ⊲ ⊳ ⊴ ⊵ ⊶ ⊷ ⊸ ⊹ ⊺ ⊻ ⊼ ⊽ ⊾ ⊿
22c0 ⋀ ⋁ ⋂ ⋃ ⋄ ⋅ ⋆ ⋇ ⋈ ⋉ ⋊ ⋋ ⋌ ⋍ ⋎ ⋏
22d0 ⋐ ⋑ ⋒ ⋓ ⋔ ⋕ ⋖ ⋗ ⋘ ⋙ ⋚ ⋛ ⋜ ⋝ ⋞ ⋟
22e0 ⋠ ⋡ ⋢ ⋣ ⋤ ⋥ ⋦ ⋧ ⋨ ⋩ ⋪ ⋫ ⋬ ⋭ ⋮ ⋯
22f0 ⋰ ⋱ ⋲ ⋳ ⋴ ⋵ ⋶ ⋷ ⋸ ⋹ ⋺ ⋻ ⋼ ⋽ ⋾ ⋿
See also: Miscellaneous Technical, u2300–u23FF; Geometric Shapes, u25A0–u25FF;
Miscellaneous Mathematical Symbols-A, u27C0–u27EF, etc.
Enclosed Alphanumerics u2460–u24FF
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
2460 ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯
2470 ⑰ ⑱ ⑲ ⑳ ⑴ ⑵ ⑶ ⑷ ⑸ ⑹ ⑺ ⑻ ⑼ ⑽ ⑾ ⑿
2480 ⒀ ⒁ ⒂ ⒃ ⒄ ⒅ ⒆ ⒇ ⒈ ⒉ ⒊ ⒋ ⒌ ⒍ ⒎ ⒏
2490 ⒐ ⒑ ⒒ ⒓ ⒔ ⒕ ⒖ ⒗ ⒘ ⒙ ⒚ ⒛ ⒜ ⒝ ⒞ ⒟
24A0 ⒠ ⒡ ⒢ ⒣ ⒤ ⒥ ⒦ ⒧ ⒨ ⒩ ⒪ ⒫ ⒬ ⒭ ⒮ ⒯
24B0 ⒰ ⒱ ⒲ ⒳ ⒴ ⒵ Ⓐ Ⓑ Ⓒ Ⓓ Ⓔ Ⓕ Ⓖ Ⓗ Ⓘ Ⓙ
24C0 Ⓚ Ⓛ Ⓜ Ⓝ Ⓞ Ⓟ Ⓠ Ⓡ Ⓢ Ⓣ Ⓤ Ⓥ Ⓦ Ⓧ Ⓨ Ⓩ
24D0 ⓐ ⓑ ⓒ ⓓ ⓔ ⓕ ⓖ ⓗ ⓘ ⓙ ⓚ ⓛ ⓜ ⓝ ⓞ ⓟ
24E0 ⓠ ⓡ ⓢ ⓣ ⓤ ⓥ ⓦ ⓧ ⓨ ⓩ ⓪ ⓫ ⓬ ⓭ ⓮ ⓯
24F0 ⓰ ⓱ ⓲ ⓳ ⓴ ⓵ ⓶ ⓷ ⓸ ⓹ ⓺ ⓻ ⓼ ⓽ ⓾ ⓿
Miscellaneous Symbols u2600–u26FF
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
2600 ☀ ☁ ☂ ☃ ☄ ★ ☆ ☇ ☈ ☉ ☊ ☋ ☌ ☍ ☎ ☏
2610 ☐ ☑ ☒ ☓ ☔ ☕ ☖ ☗ ☘ ☙ ☚ ☛ ☜ ☝ ☞ ☟
2620 ☠ ☡ ☢ ☣ ☤ ☥ ☦ ☧ ☨ ☩ ☪ ☫ ☬ ☭ ☮ ☯
2630 ☰ ☱ ☲ ☳ ☴ ☵ ☶ ☷ ☸ ☹ ☺ ☻ ☼ ☽ ☾ ☿
2640 ♀ ♁ ♂ ♃ ♄ ♅ ♆ ♇ ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏
2650 ♐ ♑ ♒ ♓ ♔ ♕ ♖ ♗ ♘ ♙ ♚ ♛ ♜ ♝ ♞ ♟
2660 ♠ ♡ ♢ ♣ ♤ ♥ ♦ ♧ ♨ ♩ ♪ ♫ ♬ ♭ ♮ ♯
2670 ♰ ♱ ♲ ♳ ♴ ♵ ♶ ♷ ♸ ♹ ♺ ♻ ♼ ♽ ♾ ♿
2680 ⚀ ⚁ ⚂ ⚃ ⚄ ⚅ ⚆ ⚇ ⚈ ⚉ ⚊ ⚋ ⚌ ⚍ ⚎ ⚏
2690 ⚐ ⚑ ⚒ ⚓ ⚔ ⚕ ⚖ ⚗ ⚘ ⚙ ⚚ ⚛ ⚜ ⚝ ⚞ ⚟
26A0 ⚠ ⚡ ⚢ ⚣ ⚤ ⚥ ⚦ ⚧ ⚨ ⚩ ⚪ ⚫ ⚬ ⚭ ⚮ ⚯
26B0 ⚰ ⚱ ⚲ ⚳ ⚴ ⚵ ⚶ ⚷ ⚸ ⚹ ⚺ ⚻ ⚼ ⚽ ⚾ ⚿
26C0 ⛀ ⛁ ⛂ ⛃ ⛄ ⛅ ⛆ ⛇ ⛈ ⛉ ⛊ ⛋ ⛌ ⛍ ⛎ ⛏
You should view these in a large font. Many of the symbols can be made
into costume jewellery that will tell the world something about you.
Halfwidth and Fullwidth Forms uFF00–uFFEF
-----
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
ff00 ! " # $ % & ' ( ) * + , - . /
ff10 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
ff20 @ A B C D E F G H I J K L M N O
ff30 P Q R S T U V W X Y Z [ \ ] ^ _
ff40 ` a b c d e f g h i j k l m n o
ff50 p q r s t u v w x y z { | } ~
Sorry, I'm only showing the fullwidth ASCII ones. Until the spammers
catch on, writing your email address in these characters is a great
way to disguise it. Here is a Lua function that converts printable
ASCII to the corresponding fullwidth forms, encoded in UTF-8:
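-- Convert printable ASCII in s to Halfwidth and Fullwidth Forms, UTF-8 encoded.
function wide(s)
  local function widechar(c)
    local b = c:byte()
    -- Assumed intent: a space becomes IDEOGRAPHIC SPACE u3000 (bytes E3 80 80)
    if b == 32 then return "\227\128\128" end
    -- Leave controls and non-ASCII bytes (e.g. parts of UTF-8 sequences) alone
    if b < 33 or b > 126 then return c end
    -- '!'..'~' map to uFF01..uFF5E: EF BC 81..BF, then EF BD 80..9E
    local x = b + 160
    if x < 256 then return string.char(239, 188, x - 64) end
    return string.char(239, 189, x - 128)
  end
  return (s:gsub(".", widechar))
end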
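The mapping is just as easy to undo, which is roughly how the spammers will catch on. A minimal sketch of the reverse direction under the hypothetical name `narrow`: it turns fullwidth uFF01–uFF5E, and IDEOGRAPHIC SPACE u3000 if present, back into ASCII and leaves all other bytes untouched. It works on Lua 5.1 and later.

```lua
-- Hypothetical inverse of wide(): fullwidth forms back to plain ASCII.
function narrow(s)
  -- IDEOGRAPHIC SPACE u3000 (E3 80 80) -> ' '
  s = s:gsub("\227\128\128", " ")
  -- uFF01..uFF3F are EF BC 81..BF; the ASCII byte is the last byte - 96
  s = s:gsub("\239\188([\129-\191])", function(t)
    return string.char(t:byte() - 96)
  end)
  -- uFF40..uFF5E are EF BD 80..9E; the ASCII byte is the last byte - 32
  s = s:gsub("\239\189([\128-\158])", function(t)
    return string.char(t:byte() - 32)
  end)
  return s
end
```

Since UTF-8 lead bytes such as EF never occur as continuation bytes, each pattern can only match at the start of a character.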
Visit the FileFormat site for more.
It's a fascinating way to spend a
rainy afternoon. Find out about
"FEMALE OF CHINESE UNICORN" u9E9F!