lua-users home
lua-l archive

On Friday 28 October 2005 22:13, Rici Lake wrote:
> The full pattern: [^\128-\191][\128-\191]
> matches:
> "Not a continuation byte" followed by 0 or more "continuation bytes"

Should there be a * on the end of that pattern? Because what you wrote matches 
'not a continuation byte' followed by 'exactly one continuation byte'.
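
A minimal sketch of the difference, assuming Lua 5.1 or later (for gmatch); the helper `count` and the sample string are made up for illustration and are not from the earlier emails:

```lua
-- Count how many times a Lua pattern matches in a string.
-- (Hypothetical helper, for illustration only.)
local function count(s, pattern)
  local n = 0
  for _ in s:gmatch(pattern) do n = n + 1 end
  return n
end

-- Three UTF-8 characters: 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes).
local s = "a\195\169\226\130\172"

local one  = "[^\128-\191][\128-\191]"   -- lead byte + exactly one continuation byte
local many = "[^\128-\191][\128-\191]*"  -- lead byte + zero or more continuation bytes

print(count(s, one))   --> 2  (skips 'a' and cuts the 3-byte character short)
print(count(s, many))  --> 3  (one match per character)
```

Without the *, the plain ASCII 'a' is never matched and the three-byte sequence is only matched up to its second byte, so the count comes out wrong.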

But other than that, yes, you're right. I had misremembered the UTF-8 spec.
(I'm now trying to figure out what spec I was conflating it with that 
consists of a sequence of high-bit characters followed by a non-high-bit 
character --- this is going to bother me all night.) Remind me not to talk 
about Unicode when I'm low on caffeine; I did a whole set of encoding codecs 
a while back, and am still trying to recover...

Incidentally, to anyone who's reading, ignore any of the patterns I gave in my 
previous email. They are incorrect.

(Damn, I hate regexps.)

+- David Given --McQ-+ "Est brilgum: toui slimici
|    | In uabo tererotitant
| ( | Brogoui sunt macresculi
+- --+ Momi rasti strugitant." --- Anonymous
