- Subject: Re: What do you miss most in Lua
- From: Tim Mensch <tim-lua-l@...>
- Date: Tue, 07 Feb 2012 16:01:52 -0700
On 2/7/2012 3:11 PM, Egil Hjelmeland wrote:
> What about sorting/collating? That would be useful. But is that a
> big-table-thing in Unicode?
What language are you sorting for?
In traditional Spanish collation, "LL" comes after "LZ" in sort order,
and "CH" comes after "CZ", which is not the case in most other languages.
("LL" and "CH" are considered "letters", and therefore sort differently.)
There are many other rules that apply to specific languages and
contexts; a dictionary sort in English is different from an alphanumeric
sort, for instance (a1, a10, a9 compared to a1, a9, a10, respectively).
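The two orderings in that example can be reproduced with a short Python sketch: a plain character-by-character sort versus a key that compares digit runs as numbers. The helper name `number_aware_key` is hypothetical, and the key only understands ASCII digits; it is not locale-aware in any way.

```python
import re

# Split a string into runs of ASCII digits and runs of everything else.
RUNS = re.compile(r"([0-9]+)|([^0-9]+)")

def number_aware_key(s):
    """Sort key that compares digit runs by numeric value (sketch only)."""
    key = []
    for digits, text in RUNS.findall(s):
        if digits:
            key.append((0, int(digits), ""))   # numeric run
        else:
            key.append((1, 0, text))           # text run, compared as-is
    return key

items = ["a10", "a9", "a1"]
char_by_char = sorted(items)                       # a1, a10, a9
number_aware = sorted(items, key=number_aware_key) # a1, a9, a10
```

Tagging each run with 0 or 1 keeps the tuples comparable when a number and a piece of text meet at the same position; which of the two should come first is itself a policy choice that differs between collation schemes.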
But the short answer is: Yes, it's one of the things you need a huge
table and supporting source code to completely handle in Unicode. Or an
even bigger table if you REALLY need to do it right in multiple locales.
If you're curious, there's a table generator online where you can create
those Big Tables based on what you need (mapping from other charsets, a
break iterator, collators, rule-based number format handling, and more)
[1]. Which is good, because ALL of it comes to about 18 MB. Keep in mind
that's JUST the data table size, and not the code needed to parse the
data table, which in one case built to a 900 KB DLL on Windows (with 300 KB
of embedded tables) just for generic collation handling (trying to make
collation sane, though not actually correct, for all languages).
More examples of strange collation exceptions and general detail about
collation can be found on unicode.org [2].
Tim
[1] http://apps.icu-project.org/datacustom/
[2] http://www.unicode.org/reports/tr10/