- Subject: Re: UTF-8 patterns in Lua 5.3
- From: Tim Hill <drtimhill@...>
- Date: Thu, 17 Apr 2014 10:10:58 -0700
On Apr 17, 2014, at 3:32 AM, Ross Bencina <rossb-lists@audiomulch.com> wrote:
> On 17/04/2014 5:29 PM, steve donovan wrote:
>> On Thu, Apr 17, 2014 at 9:12 AM, Coda Highland <chighland@gmail.com> wrote:
>>> I consider this to be evidence towards "either implement all of
>>> Unicode or stay out of the way."
>>
>> Well, that can't be done in the core, since Lua would double in size ;)
>
>
> Is there a known lower bound on the complexity of implementing "all of Unicode"?
>
> Lua does well at keeping things small, maybe "all of Unicode" is not as big as is assumed? (or maybe it is?)
>
> Ross.
>
(shudders) ... it’s huge. First you have all the different encodings, then collating sequences, then the various normalization forms (how CAN a form be “normalized” when there are four different ones???), then you have all the complexities of glyphs and graphemes, and curious definitions of “word” and “whitespace” and on and on and on. Linking even the minimal part of ICU bloats Lua by a factor of four or more (yes, four times the size once Unicode is in).
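To make the normalization point concrete: the four forms (NFC, NFD, NFKC, NFKD) can all disagree on the same input. This is a quick sketch in Python rather than Lua, since stock Lua 5.3's utf8 library offers no normalization at all; `unicodedata.normalize` is Python's standard-library call, while the helper names `normalized_forms` and `describe` are just illustrative.

```python
import unicodedata

# The four normalization forms defined by Unicode.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalized_forms(s):
    """Return a dict mapping each normalization form name to s normalized that way."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}

def describe(s):
    """Print each form's code points and its length in bytes when encoded as UTF-8."""
    for form, n in normalized_forms(s).items():
        cps = " ".join(f"U+{ord(c):04X}" for c in n)
        print(f"{form:4}: {cps}  ({len(n.encode('utf-8'))} bytes in UTF-8)")

# LATIN SMALL LETTER LONG S WITH DOT ABOVE followed by COMBINING DOT BELOW:
# one visible character, spelled differently by every normalization form.
describe("\u1e9b\u0323")
```

The four results differ in both code points and UTF-8 byte length (3 to 6 bytes here), so a byte-oriented pattern written against one form will not match the same text stored in another.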
—Tim