Re: unicode char ranges

lua-l archive


Subject: Re: unicode char ranges
From: Marc Balmer <marc@...>
Date: Tue, 4 Dec 2012 22:51:04 +0100

On 04.12.2012 at 22:17, Dirk Laurie <dirk.laurie@gmail.com> wrote:

> 2012/12/4 spir <denis.spir@gmail.com>:
> 
>> But this appears impossible with char ranges, unless I'm missing something.
>> In lexicographic order by Unicode code point [1], a decomposed "ã" would
>> fall between "a" and "b", since its first code point is a base 'a', which
>> (for people unfamiliar with Unicode) is the same code point as the one for
>> a plain, simple character "a".
> 
> I'll confess to having only skimmed the rest of your post, so my
> response may be irrelevant, but this is what I do:
> 
> -- One UTF-8 character in U+00C0..U+00FF: lead byte 0xC3 (195)
> -- followed by a continuation byte 0x80..0xBF (128..191).
> local latin1 = "\195[\128-\191]"
> -- ASCII approximation for each of those 64 characters.
> local unlatin = {
> ["À"]="A",["Á"]="A",["Â"]="A",["Ã"]="A",["Ä"]="A",["Å"]="A",["Æ"]="AE",["Ç"]="C",
> ["È"]="E",["É"]="E",["Ê"]="E",["Ë"]="E",["Ì"]="I",["Í"]="I",["Î"]="I",["Ï"]="I",
> ["Ð"]="D",["Ñ"]="N",["Ò"]="O",["Ó"]="O",["Ô"]="O",["Õ"]="O",["Ö"]="O",["×"]="*",
> ["Ø"]="O",["Ù"]="U",["Ú"]="U",["Û"]="U",["Ü"]="U",["Ý"]="Y",["Þ"]="TH",["ß"]="ss",
> ["à"]="a",["á"]="a",["â"]="a",["ã"]="a",["ä"]="a",["å"]="a",["æ"]="ae",["ç"]="c",
> ["è"]="e",["é"]="e",["ê"]="e",["ë"]="e",["ì"]="i",["í"]="i",["î"]="i",["ï"]="i",
> ["ð"]="d",["ñ"]="n",["ò"]="o",["ó"]="o",["ô"]="o",["õ"]="o",["ö"]="o",["÷"]="/",
> ["ø"]="o",["ù"]="u",["ú"]="u",["û"]="u",["ü"]="u",["ý"]="y",["þ"]="th",["ÿ"]="ij"
> }
> -- gsub returns (string, count); the extra parentheses keep only the string.
> local function asciize(text) return (text:gsub(latin1,unlatin)) end
> 
> Then all key generation, indexing, alphabetic sorting, etc. are done
> on `asciize(str)`.
> 
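A minimal sketch of the two orderings discussed above, assuming Lua 5.1 or later, a UTF-8 source file, and the default "C" locale (Lua compares strings with `strcoll`, which in that locale is plain byte order). The three-entry `unlatin` table is a hypothetical excerpt of Dirk's, not the full mapping. It shows the decomposed "ã" landing between "a" and "b" in raw byte order, the precomposed "ã" landing after "z", and an `asciize`-based sort key pulling the precomposed form back next to "a".

```lua
-- Assumes: Lua 5.1+, UTF-8 source, default "C" locale (byte-wise string order).
-- Hypothetical excerpt of the full 64-entry unlatin table.
local latin1 = "\195[\128-\191]"
local unlatin = { ["ã"] = "a", ["é"] = "e", ["ñ"] = "n" }

local function asciize(text)
  -- Keep only the first result of gsub (the rewritten string).
  return (text:gsub(latin1, unlatin))
end

local precomposed = "\195\163"   -- U+00E3 LATIN SMALL LETTER A WITH TILDE
local decomposed  = "a\204\131"  -- U+0061 followed by U+0303 COMBINING TILDE

-- Raw byte order: the decomposed form falls between "a" and "b",
-- while the precomposed form sorts after every ASCII letter.
assert("a" < decomposed and decomposed < "b")
assert(precomposed > "z")

-- Sort on the ASCII key, tie-breaking on the raw bytes so the order is total.
local words = { "b", precomposed, "a", "z" }
table.sort(words, function(x, y)
  local kx, ky = asciize(x), asciize(y)
  if kx ~= ky then return kx < ky end
  return x < y
end)
print(table.concat(words, " "))
```

With the tie-break, `words` ends up ordered a, ã, b, z. Note that the decomposed form is not rewritten at all: its combining tilde has lead byte 0xCC, which the `latin1` pattern never matches.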


This is bogus and wrong on so many levels. There are 109,242 characters (code points) defined, and what you offer here covers only a poor, limited subset of them. Totally useless.
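To make the objection concrete, here is a sketch, assuming Lua 5.3 or later for the standard `utf8` library, that lists the non-ASCII code points an `asciize`-style function leaves behind. The two-entry table and the `leftovers` helper are hypothetical illustrations. Any character whose UTF-8 lead byte is not 0xC3, such as "ő" (U+0151), passes through unchanged, and so does a decomposed "é" written as "e" plus U+0301 COMBINING ACUTE ACCENT.

```lua
-- Assumes Lua 5.3+ (utf8 library). Hypothetical stand-in for the full table.
local latin1 = "\195[\128-\191]"
local unlatin = { ["é"] = "e", ["ö"] = "o" }

local function asciize(text)
  return (text:gsub(latin1, unlatin))
end

-- Hypothetical helper: collect the code points above 127 as "U+XXXX" strings.
local function leftovers(text)
  local out = {}
  for _, cp in utf8.codes(text) do
    if cp > 127 then out[#out + 1] = string.format("U+%04X", cp) end
  end
  return out
end

local samples = {
  "K\195\182ln",    -- "Köln": ö is U+00F6, inside the handled range
  "Erd\197\145s",   -- "Erdős": ő is U+0151, lead byte 0xC5, not handled
  "e\204\129",      -- "e" + U+0301 COMBINING ACUTE ACCENT, not handled
}
for _, s in ipairs(samples) do
  local a = asciize(s)
  print(a, table.concat(leftovers(a), ","))
end
```

Only "Köln" comes out as pure ASCII ("Koln"); the other two still carry U+0151 and U+0301 respectively, whatever the size of the lookup table for the 0xC3 block.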

Follow-Ups:
- Re: unicode char ranges, Marc Balmer

References:
- unicode char ranges, spir
- Re: unicode char ranges, Dirk Laurie
