2012/12/4 spir <denis.spir@gmail.com>:

> But this appears impossible with char ranges, unless I miss a point. In
> lexicographical order according to unicodes (Unicode code points) [1], a
> decomposed "ã" would find place between "a" and "b" since its first code is
> a base 'a' --which for people unfamiliar with Unicode is the same code as
> the one for a full, simple character "a".

I'll confess to having only skimmed the rest of your post, so my
response may be irrelevant, but this is what I do:

-- In UTF-8, U+00C0..U+00FF are encoded as byte 195 followed by a byte in 128..191.
local latin1 = "\195[\128-\191]"
local unlatin = {
["À"]="A",["Á"]="A",["Â"]="A",["Ã"]="A",["Ä"]="A",["Å"]="A",["Æ"]="AE",["Ç"]="C",
["È"]="E",["É"]="E",["Ê"]="E",["Ë"]="E",["Ì"]="I",["Í"]="I",["Î"]="I",["Ï"]="I",
["Ð"]="D",["Ñ"]="N",["Ò"]="O",["Ó"]="O",["Ô"]="O",["Õ"]="O",["Ö"]="O",["×"]="*",
["Ø"]="O",["Ù"]="U",["Ú"]="U",["Û"]="U",["Ü"]="U",["Ý"]="Y",["Þ"]="TH",["ß"]="ss",
["à"]="a",["á"]="a",["â"]="a",["ã"]="a",["ä"]="a",["å"]="a",["æ"]="ae",["ç"]="c",
["è"]="e",["é"]="e",["ê"]="e",["ë"]="e",["ì"]="i",["í"]="i",["î"]="i",["ï"]="i",
["ð"]="d",["ñ"]="n",["ò"]="o",["ó"]="o",["ô"]="o",["õ"]="o",["ö"]="o",["÷"]="/",
["ø"]="o",["ù"]="u",["ú"]="u",["û"]="u",["ü"]="u",["ý"]="y",["þ"]="th",["ÿ"]="ij"
}
-- Parentheses discard gsub's second return value (the substitution count).
local function asciize(text) return (text:gsub(latin1,unlatin)) end

Then all key generation, indexing, alphabetic sorting, etc. is done
on `asciize(str)`.
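
For instance, sorting on the asciized key might look like the sketch
below. It uses an abbreviated copy of the table so that it runs on its
own, and it assumes the source file is saved as UTF-8.
`sort_by_ascii_key` is a made-up helper name for illustration, not
something from my actual code.

```lua
-- Abbreviated mapping, just enough for this demo (assumes UTF-8 source).
local latin1 = "\195[\128-\191]"
local unlatin = { ["ã"]="a", ["ç"]="c", ["é"]="e", ["É"]="E" }

local function asciize(text)
  -- Parentheses keep only the string and drop gsub's substitution count.
  return (text:gsub(latin1, unlatin))
end

-- Hypothetical helper: sorts `words` in place by their asciized form.
local function sort_by_ascii_key(words)
  -- Compute each key once instead of on every comparison.
  local key = {}
  for _, w in ipairs(words) do key[w] = asciize(w) end
  table.sort(words, function(a, b)
    if key[a] ~= key[b] then return key[a] < key[b] end
    return a < b  -- equal keys: fall back to raw bytes for a stable result
  end)
  return words
end

local sorted = sort_by_ascii_key({ "maçã", "mace", "zebra", "école", "eclair" })
print(table.concat(sorted, " "))  --> eclair école maçã mace zebra
```

Note that this only handles precomposed characters in the U+00C0..U+00FF
range; a decomposed "ã" (base 'a' plus a combining tilde) is not
matched by the pattern and passes through unchanged.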