Re: Native unicode support?

lua-l archive

Subject: Re: Native unicode support?
From: Björn De Meyer <bjorn.demeyer@...>
Date: Wed, 26 Jun 2002 20:32:41 +0200

Edgar Toernig wrote:
> 
> Björn De Meyer wrote:
> > ...
> > supply your own replacements for isalpha() and isalnum().
> > Fortunately, with UTF-8, you can see from a single byte
> > whether a character is part of an "alphabetical" sequence.
> 
> I'm not a UTF-8 expert but I doubt that.  How's that gonna work?
> Afaik, the "alphabetical" characters are spread out around the
> whole charset...
> 
> Ciao, ET.

Well, first of all, let me clarify that apart from
[a-z][A-Z] I would consider any valid character outside
the 7-bit ASCII range as "alphabetical", or more precisely,
as acceptable in an identifier name. In UTF-8 encoding,
you can see from the current byte whether it belongs to
the 7-bit range or to a sequence that encodes a
non-ASCII Unicode character.
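
To make the "single byte" point concrete, here is a rough sketch of how
one byte can be classified on its own. The names utf8_byte_kind and
utf8_classify are made up for illustration, and the ranges follow the
UTF-8 definition this message relies on, where lead bytes run up to
0xfd. The later RFC 3629 is stricter: it also rules out 0xc0, 0xc1 and
0xf5..0xfd.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum utf8_byte_kind {
  UTF8_ASCII,         /* 0x00..0x7f: a complete 7-bit character  */
  UTF8_CONTINUATION,  /* 0x80..0xbf: trailing byte of a sequence */
  UTF8_LEAD,          /* 0xc0..0xfd: first byte of a sequence    */
  UTF8_INVALID        /* 0xfe, 0xff: never appear in UTF-8       */
};

/* Classify one byte without looking at its neighbours. */
enum utf8_byte_kind utf8_classify(int ch)
{
  assert(ch >= 0 && ch <= 0xff);  /* expects an unsigned char value */
  if (ch < 0x80) return UTF8_ASCII;
  if (ch < 0xc0) return UTF8_CONTINUATION;
  if (ch <= 0xfd) return UTF8_LEAD;
  return UTF8_INVALID;
}
```

Everything that is not UTF8_ASCII or UTF8_INVALID belongs to the
encoding of some non-ASCII character, which is all an identifier
check of this kind needs to know.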

Basically, utf8_isalpha would need to become:

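#include <ctype.h>

/* Nonzero if the byte ch may appear in an identifier: an ASCII
   letter, or any byte of a multi-byte UTF-8 sequence (0x80..0xfd).
   ch must be an unsigned char value, as for isalpha() itself. */
int utf8_isalpha(int ch)
{
  /* Only ask isalpha() about ASCII bytes, so that the current
     locale cannot let 0xfe or 0xff through. */
  return (ch < 0x80 && isalpha(ch))
      || (ch >= 0x80 && ch <= 0xfd);
}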

The bytes 0xfe and 0xff are invalid in UTF-8,
so they are the only ones in the non-ASCII
8-bit range that are not part of the
encoding of an "identifier name" character.
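
As a rough sketch of how a lexer could use such a test, the function
below measures an identifier byte by byte. ident_length and
utf8_isalnum are hypothetical names, and utf8_isalpha is repeated
(with the ASCII check guarded so the locale cannot interfere) only so
that the example stands alone. It assumes a NUL-terminated string and
makes no attempt to check that the UTF-8 is well formed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII letter, or any byte of a multi-byte UTF-8 sequence. */
static int utf8_isalpha(int ch)
{
  return (ch < 0x80 && isalpha(ch)) || (ch >= 0x80 && ch <= 0xfd);
}

/* As above, but ASCII digits are accepted too. */
static int utf8_isalnum(int ch)
{
  return utf8_isalpha(ch) || (ch < 0x80 && isdigit(ch));
}

/* Hypothetical helper: length in bytes of the identifier at the
   start of s, or 0 if s does not start with one. */
size_t ident_length(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;
  size_t n = 0;
  assert(s != NULL);
  if (!utf8_isalpha(p[0]))
    return 0;               /* identifiers may not start with a digit */
  while (utf8_isalnum(p[n]))
    n++;                    /* stops at NUL, space, punctuation, 0xfe, 0xff */
  return n;
}
```

With the UTF-8 input "na\xc3\xafve = 1" this counts six bytes (five
characters), because both bytes of the two-byte sequence pass the
test without being decoded.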

-- 
"No one knows true heroes, for they speak not of their greatness." -- 
Daniel Remar.
Björn De Meyer 
bjorn.demeyer@pandora.be
