- Subject: Re: LPEG - next version
- From: David Given <dg@...>
- Date: Fri, 12 Jun 2009 10:02:35 +0100
Miles Bader wrote:
> It seems there needs to be a clear distinction between "raw char" (given
> that lpeg is quite usable for binary data) and "unicode char".
The problem is that Unicode doesn't really have any such concept as a
'character', which means that traditional string handling methods
basically don't work with it (even if you ignore UTF-8 encoding). A
single displayable thing can actually be made up of several Unicode code
points, and may even have several different (but technically equivalent)
representations.
I'm afraid it's just a fundamentally hard problem, and I haven't seen
any decent abstractions over it yet.
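
To make the point above concrete, here is a minimal sketch in Python (not LPEG or Lua, purely for illustration) of one displayable "é" written two ways. The helper name `describe` and the variable names are made up for this example; `unicodedata.normalize` is the standard library call.

```python
import unicodedata

# The same displayable "é", spelled two technically equivalent ways.
precomposed = "\u00e9"   # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

def describe(s):
    """Count the string both as code points and as raw UTF-8 bytes."""
    return {"code_points": len(s), "utf8_bytes": len(s.encode("utf-8"))}

composed_info = describe(precomposed)
decomposed_info = describe(decomposed)

# A naive comparison treats the two spellings as different strings...
same_raw = precomposed == decomposed

# ...while normalizing both to NFC makes them compare equal.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", decomposed))

print(composed_info, decomposed_info, same_raw, same_nfc)
```

So neither counting bytes nor counting code points reliably gives you "one character" here, and normalization only covers the cases that have a precomposed form.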
> Making P(x) count utf8 chars would certainly be convenient for people
> reading utf8 files, but... it doesn't seem the cleanest thing in
*Nothing* about Unicode is clean...