- Subject: Re: Native unicode support?
- From: Björn De Meyer <bjorn.demeyer@...>
- Date: Wed, 26 Jun 2002 20:32:41 +0200
Edgar Toernig wrote:
>
> Björn De Meyer wrote:
> > ...
> > supply your own replacements for isalpha() and isalnum().
> > Fortunately, with UTF-8, you can see from a single byte
> > whether a character is part of an "alphabetical" sequence.
>
> I'm not a UTF-8 expert but I doubt that. How's that gonna work?
> Afaik, the "alphabetical" characters are spread out around the
> whole charset...
>
> Ciao, ET.
Well, first of all, let me clarify that, apart from
[a-z][A-Z], I would consider any valid character outside
the 7-bit ASCII range as "alphabetical", or more precisely,
as acceptable in an identifier name. In UTF-8 encoding,
you can see from the current byte whether it belongs to
the 7-bit range or to a sequence that encodes a
non-ASCII Unicode character.
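Basically, utf8_isalpha would need to become:
#include <ctype.h>

/* ch must be a byte value in 0..255 (cast plain char to unsigned char). */
int utf8_isalpha(int ch)
{
    return
    (
        (ch >= 0 && ch < 0x80 && isalpha(ch))  /* ASCII letters */
        || ((ch >= 0x80) && (ch <= 0xfd))      /* byte of a non-ASCII sequence */
    );
}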
The bytes 0xfe and 0xff are invalid in UTF-8,
so they are the only ones in the non-ASCII
8-bit range that are not part of the
encoding of an "identifier name" character.
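
To show how such a byte test could be used, here is a rough sketch of an identifier scanner built on it. The helper ident_length and its rules (underscore allowed, digits allowed after the first byte) are my own assumptions for illustration, not taken from any existing lexer. Note that it only classifies bytes; it does not check that the bytes form well-formed UTF-8 sequences.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Byte-level test as described above: an ASCII letter, or any byte in
 * 0x80..0xfd, i.e. part of a multi-byte (non-ASCII) sequence. */
static int utf8_isalpha(int ch)
{
    return (ch >= 0 && ch < 0x80 && isalpha(ch))
        || (ch >= 0x80 && ch <= 0xfd);
}

/* Hypothetical helper: number of bytes in the identifier at the start
 * of the NUL-terminated string s, or 0 if s does not start with one. */
static size_t ident_length(const char *s)
{
    size_t n;
    unsigned char c;

    assert(s != NULL);

    /* First byte: letter, underscore, or byte of a non-ASCII sequence. */
    c = (unsigned char)s[0];
    if (!(utf8_isalpha(c) || c == '_'))
        return 0;

    /* Following bytes may also be ASCII digits. The terminating NUL
     * matches none of the tests, so the loop always stops. */
    for (n = 1; ; n++) {
        c = (unsigned char)s[n];
        if (!(utf8_isalpha(c) || isdigit(c) || c == '_'))
            break;
    }
    return n;
}
```

Casting each byte to unsigned char before the test matters: where plain char is signed, bytes above 0x7f would otherwise arrive as negative values and fail the range check.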
--
"No one knows true heroes, for they speak not of their greatness." --
Daniel Remar.
Björn De Meyer
bjorn.demeyer@pandora.be