2012/12/5 Jay Carlson <nop@nop.com>:

> Here's a nickel. Get yourself a real operating system
> (or perhaps just a real MUA).

You're the second poster to make snide remarks about my OS.
Adam called it "crappy".

Actually unnecessary decomposed characters cannot arise
on my system without great inconvenience, so I can't blame
the authors for failing to provide an output mechanism that
uncraps crappy input.

My system composes at keyboard entry level.   I hit Compose,
`a`, and `^`, and a genuine `â` appears, no matter which
program is asking for input.

To produce the second, decomposed, one in my post I had
to remind myself of the Unicode for combining circumflex
by consulting a document I wrote in August 2011 (revised
thanks to the present discussion and appended, helpful
comments welcome).

Dirk

Getting Unicode characters into your document
======

0. Don't assume everybody's system has all the fonts yours has.
    Or mine, for that matter.  It can't even do all the symbols 
    I have included here, e.g. u214F "SYMBOL FOR SAMARITAN SOURCE".

1. A Unicode character is a number (a code point) between u0000 and
    u10FFFF; most of the ones shown below fit in four hex digits.  It
    will probably be entered into your document as UTF-8, an encoding
    into sequences of anywhere from 1 to 4 bytes.  This document does
    not go into the details of that encoding.

2. Many websites have charts of them.  I get my stuff from 
    <http://www.fileformat.info/info/unicode/block>

3. If your system has a compose key, there probably is a file somewhere
    with `Compose` in its name, listing those characters that you can 
    make and how to make them. On some systems a composed Unicode 
    character is entered into the document, on others a combination
    of characters (see Combining Diacritics, below) that is supposed
    to mean the same.

4. Your system may also have a helper app which, if you've read this
    far, you probably don't like all that much.

5. If all else fails, all up-to-date systems have a way of entering
    the actual four-hexdigit number.  The FileFormat site says how to
    do it on Windows and in HTML, C, Python etc.  On mine (Gnome, i.e.
    standard Ubuntu), it is Ctrl-Shift-U (it shows an underlined `u`) 
    and then either hold Ctrl-Shift in and type the four digits, or 
    let go and type them in followed by Return.  The first method is 
    nicer if and only if you don't mistype any digits.

6. Here are my favourites.  Copy-and-paste them.  There's lots more
   on the FileFormat site.
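
Item 5 has you type the hex number and lets the system do the encoding.
If you would rather do that step in a program, here is a minimal sketch
in Lua.  It assumes you want UTF-8 output and that your Lua has no
Unicode library of its own.  The names `utf8_encode` and `from_u` are
made up for this example; they are not part of Lua.

```lua
-- Encode one Unicode code point (a number) as a UTF-8 string.
-- Hypothetical helper, not part of the standard library.
function utf8_encode(cp)
   local floor, char = math.floor, string.char
   assert(cp >= 0 and cp <= 0x10FFFF, "code point out of range")
   assert(cp < 0xD800 or cp > 0xDFFF, "surrogates are not characters")
   if cp < 0x80 then                       -- 1 byte: plain ASCII
      return char(cp)
   elseif cp < 0x800 then                  -- 2 bytes
      return char(0xC0 + floor(cp / 0x40),
                  0x80 + cp % 0x40)
   elseif cp < 0x10000 then                -- 3 bytes
      return char(0xE0 + floor(cp / 0x1000),
                  0x80 + floor(cp / 0x40) % 0x40,
                  0x80 + cp % 0x40)
   else                                    -- 4 bytes
      return char(0xF0 + floor(cp / 0x40000),
                  0x80 + floor(cp / 0x1000) % 0x40,
                  0x80 + floor(cp / 0x40) % 0x40,
                  0x80 + cp % 0x40)
   end
end

-- Accept the notation used in this document, e.g. "u214F".
function from_u(name)
   local hex = name:match("^[uU]%+?(%x+)$")
   assert(hex, "expected something like u214F")
   return utf8_encode(tonumber(hex, 16))
end

io.write(from_u("u214F"), "\n")   -- whether it displays depends on your fonts
```

Remark #0 still applies: the bytes will be right even when your
terminal has no glyph to show for them.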

Latin-1 Supplement u0080–u00FF
-----
        00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    A0       ¡   ¢   £   ¤   ¥   ¦   §   ¨   ©   ª   «   ¬   ­   ®   ¯
    B0   °   ±   ²   ³   ´   µ   ¶   ·   ¸   ¹   º   »   ¼   ½   ¾   ¿
    C0   À   Á   Â   Ã   Ä   Å   Æ   Ç   È   É   Ê   Ë   Ì   Í   Î   Ï
    D0   Ð   Ñ   Ò   Ó   Ô   Õ   Ö   ×   Ø   Ù   Ú   Û   Ü   Ý   Þ   ß
    E0   à   á   â   ã   ä   å   æ   ç   è   é   ê   ë   ì   í   î   ï
    F0   ð   ñ   ò   ó   ô   õ   ö   ÷   ø   ù   ú   û   ü   ý   þ   ÿ

NB: u00A0 is a non-breaking space.  The first 32 characters of this 
block are not shown.  They are non-printing characters with names like
"START OF GUARDED AREA", which to my mind suggests that ordinary folk
shouldn't be using them in e-mails.

Latin Extended-A  u0100–u017F
-----
         00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    100   Ā   ā   Ă   ă   Ą   ą   Ć   ć   Ĉ   ĉ   Ċ   ċ   Č   č   Ď   ď
    110   Đ   đ   Ē   ē   Ĕ   ĕ   Ė   ė   Ę   ę   Ě   ě   Ĝ   ĝ   Ğ   ğ
    120   Ġ   ġ   Ģ   ģ   Ĥ   ĥ   Ħ   ħ   Ĩ   ĩ   Ī   ī   Ĭ   ĭ   Į   į
    130   İ   ı   Ĳ   ĳ   Ĵ   ĵ   Ķ   ķ   ĸ   Ĺ   ĺ   Ļ   ļ   Ľ   ľ   Ŀ
    140   ŀ   Ł   ł   Ń   ń   Ņ   ņ   Ň   ň   ŉ   Ŋ   ŋ   Ō   ō   Ŏ   ŏ
    150   Ő   ő   Œ   œ   Ŕ   ŕ   Ŗ   ŗ   Ř   ř   Ś   ś   Ŝ   ŝ   Ş   ş
    160   Š   š   Ţ   ţ   Ť   ť   Ŧ   ŧ   Ũ   ũ   Ū   ū   Ŭ   ŭ   Ů   ů
    170   Ű   ű   Ų   ų   Ŵ   ŵ   Ŷ   ŷ   Ÿ   Ź   ź   Ż   ż   Ž   ž   ſ

See also: Latin Extended-B, u0180–u024F; IPA Extensions, u0250–u02AF;
   Latin Extended Additional, u1E00–u1EFF, and more. 

Combining diacritics  u0300–u036F
-----
         00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    300   c̀   ć   ĉ   c̃   c̄   c̅   c̆   ċ   c̈   c̉   c̊   c̋   č   c̍   c̎   c̏
    310   c̐   c̑   c̒   c̓   c̔   c̕   c̖   c̗   c̘   c̙   c̚   c̛   c̜   c̝   c̞   c̟
    320   c̠   c̡   c̢   c̣   c̤   c̥   c̦   ç   c̨   c̩   c̪   c̫   c̬   c̭   c̮   c̯
    330   c̰   c̱   c̲   c̳   c̴   c̵   c̶   c̷   c̸   c̹   c̺   c̻   c̼   c̽   c̾   c̿
    340   c̀   ć   c͂   c̓   c̈́   cͅ   c͆   c͇   c͈   c͉   c͊   c͋   c͌   c͍   c͎   c͏
    350   c͐   c͑   c͒   c͓   c͔   c͕   c͖   c͗   c͘   c͙   c͚   c͛   c͜   c͝   c͞   c͟
    360   c͠   c͡   c͢   cͣ   cͤ   cͥ   cͦ   cͧ   cͨ   cͩ   cͪ   cͫ   cͬ   cͭ   cͮ   cͯ

These symbols are shown applied to the letter `c`.  A combining character
cannot be cut-and-pasted on its own, sorry.  You type a letter and then enter
the Unicode number of the diacritic.  Moreover, remark #0 applies even more
strongly here.
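
In a program there is nothing to type: the letter and the diacritic are
simply concatenated.  The following is a sketch, assuming UTF-8 output.
The helper name `with_diacritic` is invented for this example.  It only
handles this block, where every code point encodes to exactly two bytes.

```lua
-- Append a combining diacritic (0x0300..0x036F) to a letter.
-- Hypothetical helper; all code points in this block take two UTF-8 bytes.
function with_diacritic(letter, cp)
   assert(cp >= 0x0300 and cp <= 0x036F, "not a combining diacritic")
   return letter .. string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

local decomposed = with_diacritic("a", 0x0302)  -- 'a' + combining circumflex
local composed   = string.char(0xC3, 0xA2)      -- u00E2, the single character

print(#decomposed, #composed)    --> 3       2
print(decomposed == composed)    --> false
```

The two strings should look the same on screen, but as far as Lua is
concerned they are different byte sequences and do not compare equal.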

Letterlike Symbols  u2100–u214F
------
          00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    2100   ℀   ℁   ℂ   ℃   ℄   ℅   ℆   ℇ   ℈   ℉   ℊ   ℋ   ℌ   ℍ   ℎ   ℏ
    2110   ℐ   ℑ   ℒ   ℓ   ℔   ℕ   №   ℗   ℘   ℙ   ℚ   ℛ   ℜ   ℝ   ℞   ℟
    2120   ℠   ℡   ™   ℣   ℤ   ℥   Ω   ℧   ℨ   ℩   K   Å   ℬ   ℭ   ℮   ℯ
    2130   ℰ   ℱ   Ⅎ   ℳ   ℴ   ℵ   ℶ   ℷ   ℸ   ℹ   ℺   ℻   ℼ   ℽ   ℾ   ℿ
    2140   ⅀   ⅁   ⅂   ⅃   ⅄   ⅅ   ⅆ   ⅇ   ⅈ   ⅉ   ⅊   ⅋   ⅌   ⅍   ⅎ   ⅏

Number Forms  u2150–u218F
-----
           00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    2150   ⅐   ⅑   ⅒   ⅓   ⅔   ⅕   ⅖   ⅗   ⅘   ⅙   ⅚   ⅛   ⅜   ⅝   ⅞   ⅟
    2160   Ⅰ   Ⅱ   Ⅲ   Ⅳ   Ⅴ   Ⅵ   Ⅶ   Ⅷ   Ⅸ   Ⅹ   Ⅺ   Ⅻ   Ⅼ   Ⅽ   Ⅾ   Ⅿ
    2170   ⅰ   ⅱ   ⅲ   ⅳ   ⅴ   ⅵ   ⅶ   ⅷ   ⅸ   ⅹ   ⅺ   ⅻ   ⅼ   ⅽ   ⅾ   ⅿ
    2180   ↀ   ↁ   ↂ   Ↄ   ↄ   ↅ   ↆ   ↇ   ↈ   ↉   ↊   ↋   ↌   ↍   ↎   ↏

Arrows  u2190–u21FF
-----
          00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    2190   ←   ↑   →   ↓   ↔   ↕   ↖   ↗   ↘   ↙   ↚   ↛   ↜   ↝   ↞   ↟
    21a0   ↠   ↡   ↢   ↣   ↤   ↥   ↦   ↧   ↨   ↩   ↪   ↫   ↬   ↭   ↮   ↯
    21b0   ↰   ↱   ↲   ↳   ↴   ↵   ↶   ↷   ↸   ↹   ↺   ↻   ↼   ↽   ↾   ↿
    21c0   ⇀   ⇁   ⇂   ⇃   ⇄   ⇅   ⇆   ⇇   ⇈   ⇉   ⇊   ⇋   ⇌   ⇍   ⇎   ⇏
    21d0   ⇐   ⇑   ⇒   ⇓   ⇔   ⇕   ⇖   ⇗   ⇘   ⇙   ⇚   ⇛   ⇜   ⇝   ⇞   ⇟
    21e0   ⇠   ⇡   ⇢   ⇣   ⇤   ⇥   ⇦   ⇧   ⇨   ⇩   ⇪   ⇫   ⇬   ⇭   ⇮   ⇯
    21f0   ⇰   ⇱   ⇲   ⇳   ⇴   ⇵   ⇶   ⇷   ⇸   ⇹   ⇺   ⇻   ⇼   ⇽   ⇾   ⇿

See also: Supplemental Arrows-A, u27F0–u27FF; Supplemental Arrows-B, 
u2900–u297F.

Mathematical Operators  u2200–u22FF
-----
          00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    2200   ∀   ∁   ∂   ∃   ∄   ∅   ∆   ∇   ∈   ∉   ∊   ∋   ∌   ∍   ∎   ∏
    2210   ∐   ∑   −   ∓   ∔   ∕   ∖   ∗   ∘   ∙   √   ∛   ∜   ∝   ∞   ∟
    2220   ∠   ∡   ∢   ∣   ∤   ∥   ∦   ∧   ∨   ∩   ∪   ∫   ∬   ∭   ∮   ∯
    2230   ∰   ∱   ∲   ∳   ∴   ∵   ∶   ∷   ∸   ∹   ∺   ∻   ∼   ∽   ∾   ∿
    2240   ≀   ≁   ≂   ≃   ≄   ≅   ≆   ≇   ≈   ≉   ≊   ≋   ≌   ≍   ≎   ≏
    2250   ≐   ≑   ≒   ≓   ≔   ≕   ≖   ≗   ≘   ≙   ≚   ≛   ≜   ≝   ≞   ≟
    2260   ≠   ≡   ≢   ≣   ≤   ≥   ≦   ≧   ≨   ≩   ≪   ≫   ≬   ≭   ≮   ≯
    2270   ≰   ≱   ≲   ≳   ≴   ≵   ≶   ≷   ≸   ≹   ≺   ≻   ≼   ≽   ≾   ≿
    2280   ⊀   ⊁   ⊂   ⊃   ⊄   ⊅   ⊆   ⊇   ⊈   ⊉   ⊊   ⊋   ⊌   ⊍   ⊎   ⊏
    2290   ⊐   ⊑   ⊒   ⊓   ⊔   ⊕   ⊖   ⊗   ⊘   ⊙   ⊚   ⊛   ⊜   ⊝   ⊞   ⊟
    22a0   ⊠   ⊡   ⊢   ⊣   ⊤   ⊥   ⊦   ⊧   ⊨   ⊩   ⊪   ⊫   ⊬   ⊭   ⊮   ⊯
    22b0   ⊰   ⊱   ⊲   ⊳   ⊴   ⊵   ⊶   ⊷   ⊸   ⊹   ⊺   ⊻   ⊼   ⊽   ⊾   ⊿
    22c0   ⋀   ⋁   ⋂   ⋃   ⋄   ⋅   ⋆   ⋇   ⋈   ⋉   ⋊   ⋋   ⋌   ⋍   ⋎   ⋏
    22d0   ⋐   ⋑   ⋒   ⋓   ⋔   ⋕   ⋖   ⋗   ⋘   ⋙   ⋚   ⋛   ⋜   ⋝   ⋞   ⋟
    22e0   ⋠   ⋡   ⋢   ⋣   ⋤   ⋥   ⋦   ⋧   ⋨   ⋩   ⋪   ⋫   ⋬   ⋭   ⋮   ⋯
    22f0   ⋰   ⋱   ⋲   ⋳   ⋴   ⋵   ⋶   ⋷   ⋸   ⋹   ⋺   ⋻   ⋼   ⋽   ⋾   ⋿

See also: Miscellaneous Technical, u2300–u23FF; Geometric Shapes, u25A0–u25FF;
Miscellaneous Mathematical Symbols-A, u27C0–u27EF, etc. 

Enclosed Alphanumerics  u2460–u24FF
-----
           00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    2460   ①   ②   ③   ④   ⑤   ⑥   ⑦   ⑧   ⑨   ⑩   ⑪   ⑫   ⑬   ⑭   ⑮   ⑯
    2470   ⑰   ⑱   ⑲   ⑳   ⑴   ⑵   ⑶   ⑷   ⑸   ⑹   ⑺   ⑻   ⑼   ⑽   ⑾   ⑿
    2480   ⒀   ⒁   ⒂   ⒃   ⒄   ⒅   ⒆   ⒇   ⒈   ⒉   ⒊   ⒋   ⒌   ⒍   ⒎   ⒏
    2490   ⒐   ⒑   ⒒   ⒓   ⒔   ⒕   ⒖   ⒗   ⒘   ⒙   ⒚   ⒛   ⒜   ⒝   ⒞   ⒟
    24A0   ⒠   ⒡   ⒢   ⒣   ⒤   ⒥   ⒦   ⒧   ⒨   ⒩   ⒪   ⒫   ⒬   ⒭   ⒮   ⒯
    24B0   ⒰   ⒱   ⒲   ⒳   ⒴   ⒵   Ⓐ   Ⓑ   Ⓒ   Ⓓ   Ⓔ   Ⓕ   Ⓖ   Ⓗ   Ⓘ   Ⓙ
    24C0   Ⓚ   Ⓛ   Ⓜ   Ⓝ   Ⓞ   Ⓟ   Ⓠ   Ⓡ   Ⓢ   Ⓣ   Ⓤ   Ⓥ   Ⓦ   Ⓧ   Ⓨ   Ⓩ
    24D0   ⓐ   ⓑ   ⓒ   ⓓ   ⓔ   ⓕ   ⓖ   ⓗ   ⓘ   ⓙ   ⓚ   ⓛ   ⓜ   ⓝ   ⓞ   ⓟ
    24E0   ⓠ   ⓡ   ⓢ   ⓣ   ⓤ   ⓥ   ⓦ   ⓧ   ⓨ   ⓩ   ⓪   ⓫   ⓬   ⓭   ⓮   ⓯
    24F0   ⓰   ⓱   ⓲   ⓳   ⓴   ⓵   ⓶   ⓷   ⓸   ⓹   ⓺   ⓻   ⓼   ⓽   ⓾   ⓿

Miscellaneous Symbols u2600–u26FF 
-----
          00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    2600   ☀   ☁   ☂   ☃   ☄   ★   ☆   ☇   ☈   ☉   ☊   ☋   ☌   ☍   ☎   ☏
    2610   ☐   ☑   ☒   ☓   ☔   ☕   ☖   ☗   ☘   ☙   ☚   ☛   ☜   ☝   ☞   ☟
    2620   ☠   ☡   ☢   ☣   ☤   ☥   ☦   ☧   ☨   ☩   ☪   ☫   ☬   ☭   ☮   ☯
    2630   ☰   ☱   ☲   ☳   ☴   ☵   ☶   ☷   ☸   ☹   ☺   ☻   ☼   ☽   ☾   ☿
    2640   ♀   ♁   ♂   ♃   ♄   ♅   ♆   ♇   ♈   ♉   ♊   ♋   ♌   ♍   ♎   ♏
    2650   ♐   ♑   ♒   ♓   ♔   ♕   ♖   ♗   ♘   ♙   ♚   ♛   ♜   ♝   ♞   ♟
    2660   ♠   ♡   ♢   ♣   ♤   ♥   ♦   ♧   ♨   ♩   ♪   ♫   ♬   ♭   ♮   ♯
    2670   ♰   ♱   ♲   ♳   ♴   ♵   ♶   ♷   ♸   ♹   ♺   ♻   ♼   ♽   ♾   ♿
    2680   ⚀   ⚁   ⚂   ⚃   ⚄   ⚅   ⚆   ⚇   ⚈   ⚉   ⚊   ⚋   ⚌   ⚍   ⚎   ⚏
    2690   ⚐   ⚑   ⚒   ⚓   ⚔   ⚕   ⚖   ⚗   ⚘   ⚙   ⚚   ⚛   ⚜   ⚝   ⚞   ⚟
    26A0   ⚠   ⚡   ⚢   ⚣   ⚤   ⚥   ⚦   ⚧   ⚨   ⚩   ⚪   ⚫   ⚬   ⚭   ⚮   ⚯
    26B0   ⚰   ⚱   ⚲   ⚳   ⚴   ⚵   ⚶   ⚷   ⚸   ⚹   ⚺   ⚻   ⚼   ⚽   ⚾   ⚿
    26C0   ⛀   ⛁   ⛂   ⛃   ⛄   ⛅   ⛆   ⛇   ⛈   ⛉   ⛊   ⛋   ⛌   ⛍   ⛎   ⛏

You should view these in a large font.  Many of the symbols can be made 
into costume jewellery that will tell the world something about you.

Halfwidth and Fullwidth Forms  uFF00–uFFEF
-----
          00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
    ff00      ！  ＂  ＃  ＄  ％  ＆  ＇  （  ）  ＊  ＋  ，  －  ．  ／
    ff10  ０  １  ２  ３  ４  ５  ６  ７  ８  ９  ：  ；  ＜  ＝  ＞  ？
    ff20  ＠  Ａ  Ｂ  Ｃ  Ｄ  Ｅ  Ｆ  Ｇ  Ｈ  Ｉ  Ｊ  Ｋ  Ｌ  Ｍ  Ｎ  Ｏ
    ff30  Ｐ  Ｑ  Ｒ  Ｓ  Ｔ  Ｕ  Ｖ  Ｗ  Ｘ  Ｙ  Ｚ  ［  ＼  ］  ＾  ＿
    ff40  ｀  ａ  ｂ  ｃ  ｄ  ｅ  ｆ  ｇ  ｈ  ｉ  ｊ  ｋ  ｌ  ｍ  ｎ  ｏ
    ff50  ｐ  ｑ  ｒ  ｓ  ｔ  ｕ  ｖ  ｗ  ｘ  ｙ  ｚ  ｛  ｜  ｝  ～

Sorry, I'm only showing the fullwidth ASCII ones. Until the spammers 
catch on, doing this in UTF-8 is a great way to encode your email 
address.  Here is a Lua translator from standard to wide glyphs,
the latter encoded in UTF-8:

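    function wide(s)
       -- translate one byte; printable ASCII maps to uFF01–uFF5E
       local function wide1(a)
           if a==' ' then return '  ' end   -- must come before the range test
           if a<'!' or a>'~' then return a end
           a = a:byte()+160
           if a<256 then return string.char(239,188,a-64) end   -- uFF01–uFF3F
           return string.char(239,189,a-128)                    -- uFF40–uFF5E
           end
       return(s:gsub(".",wide1))
       end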
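
Going the other way is just as short.  This is a sketch of the inverse
mapping, under the same assumption that the input is UTF-8.  The name
`narrow` is my own choice.  It turns the fullwidth ASCII range back into
standard glyphs and leaves every other byte alone; it makes no attempt
to undo any change to the spacing.

```lua
-- Map fullwidth forms uFF01..uFF5E (UTF-8 encoded) back to plain ASCII.
-- Hypothetical counterpart to wide(); anything else passes through unchanged.
function narrow(s)
   -- lead is the second byte of the sequence, trail the third
   local function narrow1(lead, trail)
      local t = trail:byte()
      if lead == "\188" and t >= 129 then       -- uFF01..uFF3F
         return string.char(t - 96)
      elseif lead == "\189" and t <= 158 then   -- uFF40..uFF5E
         return string.char(t - 32)
      end
      -- returning nothing makes gsub keep the original three bytes
   end
   return (s:gsub("\239([\188\189])([\128-\191])", narrow1))
end

print(narrow(string.char(239,188,161, 239,189,129)))   --> Aa
```

The pattern works on bytes, not characters, which is all that is needed
here because every fullwidth ASCII form starts with byte 239 followed by
188 or 189.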

Visit  the  FileFormat  site  for  more.
It's  a  fascinating  way  to  spend  a 
rainy  afternoon.  Find  out  about 
"FEMALE OF CHINESE UNICORN" u9E9F!