Re: UTF-8 patterns in Lua 5.3

lua-l archive


Subject: Re: UTF-8 patterns in Lua 5.3
From: Tim Hill <drtimhill@...>
Date: Thu, 17 Apr 2014 10:10:58 -0700

On Apr 17, 2014, at 3:32 AM, Ross Bencina <rossb-lists@audiomulch.com> wrote:

> On 17/04/2014 5:29 PM, steve donovan wrote:
>> On Thu, Apr 17, 2014 at 9:12 AM, Coda Highland <chighland@gmail.com> wrote:
>>> I consider this to be evidence towards "either implement all of
>>> Unicode or stay out of the way."
>>
>> Well, that can't be done in the core, since Lua would double in size ;)
> 
> 
> Is there a known lower bound on the complexity of implementing "all of Unicode"?
> 
> Lua does well at keeping things small, maybe "all of Unicode" is not as big as is assumed? (or maybe it is?)
> 
> Ross.
> 

(shudders) ... it’s huge. First you have all the different encodings, then collating sequences, then the various normalization forms (how CAN a form be “normalized” when there are four different ones: NFC, NFD, NFKC and NFKD???), then you have all the complexities of glyphs and graphemes, and curious definitions of “word” and “whitespace”, and on and on and on. Linking even the minimal part of ICU bloats Lua by a factor of 4 or more (yes, at least four times the size with Unicode).
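
To make the normalization point concrete, here is a minimal sketch. It is written in Python rather than Lua because the Lua core has no normalization routines, and it relies on Python's standard unicodedata module. The sample strings and the helper name normalized_lengths are assumptions chosen purely for illustration.

```python
import unicodedata

# The same user-visible text written two ways (illustrative sample data):
# a precomposed "é" (U+00E9) versus "e" plus a combining acute accent (U+0301).
# Both end with the "fi" ligature (U+FB01), which only compatibility forms expand.
composed = "caf\u00e9 \ufb01"
decomposed = "cafe\u0301 \ufb01"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_lengths(text):
    """Return the code point count of `text` under each normalization form."""
    return {form: len(unicodedata.normalize(form, text)) for form in FORMS}


if __name__ == "__main__":
    # The raw strings differ even though they render identically.
    print(composed == decomposed)
    # Each of the four forms yields a different representation of the same text.
    print(normalized_lengths(composed))
    print(normalized_lengths(decomposed))
```

The two inputs compare unequal as raw strings but agree once both are put through the same form, and the four forms do not agree with each other on length. That is roughly why "just compare the bytes" does not work for Unicode text.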

—Tim

