On Friday 28 October 2005 22:13, Rici Lake wrote:
> The full pattern: [^\128-\191][\128-\191]
> matches:
> "Not a continuation byte" followed by 0 or more "continuation bytes"

Should there be a * on the end of that pattern? Because what you wrote matches 
'not a continuation byte' followed by 'exactly one continuation byte'.
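
To make the difference concrete, here is a minimal sketch, assuming Lua 5.1 or later (for string.gmatch) and valid UTF-8 input. The helper name split and the sample string are made up for illustration; they are not from the earlier emails.

```lua
-- Collect every match of `pattern` in `s` into a list.
-- (`split` is a hypothetical helper, named here only for this example.)
local function split(s, pattern)
  local pieces = {}
  for piece in string.gmatch(s, pattern) do
    pieces[#pieces + 1] = piece
  end
  return pieces
end

-- "a" (1 byte), U+00E9 (2 bytes: 195 169), U+20AC (3 bytes: 226 130 172)
local s = "a\195\169\226\130\172"

-- As quoted: a non-continuation byte plus exactly one continuation byte.
-- This skips the lone "a" and truncates the 3-byte character.
local without_star = split(s, "[^\128-\191][\128-\191]")

-- With the trailing *: a non-continuation byte plus zero or more
-- continuation bytes, i.e. one whole character per match.
local with_star = split(s, "[^\128-\191][\128-\191]*")

print(#without_star, #with_star)
```

With the star, each match is one complete character; without it, single-byte characters are never matched and anything longer than two bytes is cut short.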

But other than that, yes, you're right. I had misremembered the UTF-8 spec. 
(I'm now trying to figure out what spec I was conflating it with that 
consists of a sequence of high-bit characters followed by a non-high-bit 
character --- this is going to bother me all night.) Remind me not to talk 
about Unicode when I'm low on caffeine; I did a whole set of encoding codecs 
a while back, and am still trying to recover...

Incidentally, to anyone who's reading, ignore any of the patterns I gave in my 
previous email. They are incorrect.

(Damn, I hate regexps.)

