Re: Matching multibyte alphabetical characters with LPeG

lua-l archive


Subject: Re: Matching multibyte alphabetical characters with LPeG
From: William Ahern <william@...>
Date: Sun, 17 Jun 2012 21:54:31 -0700

On Sun, Jun 17, 2012 at 03:52:49PM -0400, Jay Carlson wrote:
> 0	14012	0	0	14012	36bc
> 
> No, it does not provide enough to write a bidi renderer, but it does characterize each code point as one of 30 classes--and includes toupper/tolower/totitlecase.
> 
> http://files.luaforge.net/releases/sln/slnunicode
> 
> There's still the grapheme problem for å vs å; hopefully you can't tell
> the second is "a".."␣̊". [1]
> 
> How should lpeg match the one with a separate combining mark version
> against character classes?

Normalization. Wrap the lpeg API with routines that normalize input strings
before matching. Without normalization, Unicode is almost useless: it is like
comparing apples and oranges while the user sees plums.

I don't see any normalization routines in slnunicode, though.
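
A rough sketch of the wrapping idea, under loud assumptions: `normalize` and
`with_normalization` are hypothetical names, not part of lpeg or slnunicode,
and the composition table is a two-entry toy that only handles a base letter
followed by U+030A (COMBINING RING ABOVE). Real normalization needs the full
Unicode composition data.

```lua
-- Toy NFC-style composition (illustrative only, NOT real Unicode normalization).
-- Strings are assumed to be UTF-8; escapes are decimal byte values.
local RING = "\204\138"              -- U+030A COMBINING RING ABOVE (CC 8A)

-- Hypothetical, hand-written table: decomposed sequence -> precomposed form.
local compose = {
  ["a" .. RING] = "\195\165",        -- U+00E5 (C3 A5)
  ["A" .. RING] = "\195\133",        -- U+00C5 (C3 85)
}

-- Replace every "ASCII letter + ring" pair that has a known precomposed form.
-- Pairs missing from the table are left unchanged (gsub keeps the match
-- when the table lookup yields nil).
local function normalize(s)
  return (s:gsub("[A-Za-z]" .. RING, compose))
end

-- Wrap any match function so the subject is normalized before matching.
-- With lpeg installed this could be used as: with_normalization(lpeg.match)
local function with_normalization(match)
  return function(pattern, subject, ...)
    return match(pattern, normalize(subject), ...)
  end
end

-- Stand-in matcher (plain substring search) so the sketch runs without lpeg.
local function plain_find(pattern, subject)
  return subject:find(pattern, 1, true)
end

local find = with_normalization(plain_find)
-- The decomposed input now matches a pattern written with the precomposed form.
print(find("\195\165", "a" .. RING))
```

Note that any positions returned by the wrapped matcher refer to the
normalized string, not the original input, so callers that need offsets into
the original would have to map them back.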

Follow-Ups:
- Re: Matching multibyte alphabetical characters with LPeG, Miles Bader

References:
- Matching multibyte alphabetical characters with LPeG, Hinrik Örn Sigurðsson
- Re: Matching multibyte alphabetical characters with LPeG, Miles Bader
- Re: Matching multibyte alphabetical characters with LPeG, Jay Carlson
