- Subject: Re: htmlentities table
- From: David Given <dg@...>
- Date: Sat, 29 Oct 2005 00:56:25 +0100
On Friday 28 October 2005 22:13, Rici Lake wrote:
[...]
> The full pattern: [^\128-\191][\128-\191]
[...]
> matches:
>
> "Not a continuation byte" followed by 0 or more "continuation bytes"
Should there be a * on the end of that pattern? Because what you wrote matches
'not a continuation byte' followed by 'exactly one continuation byte'.
But other than that, yes, you're right. I had misremembered the UTF-8 spec.
(I'm now trying to figure out what spec I was conflating it with that
consists of a sequence of high-bit characters followed by a non-high-bit
character --- this is going to bother me all night.) Remind me not to talk
about Unicode when I'm low on caffeine; I did a whole set of encoding codecs
a while back, and am still trying to recover...
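To make the difference concrete, here is a minimal sketch of the two variants. It is a Python byte-regex translation, not the Lua pattern itself: it assumes that \128-\191 in the quoted pattern are decimal escapes for bytes 0x80-0xBF, and the names ONE_CONT, ANY_CONT and find_all are made up for illustration.

```python
import re

# Assumed translation of the quoted pattern: \128-\191 decimal == \x80-\xbf hex.
ONE_CONT = re.compile(rb"[^\x80-\xbf][\x80-\xbf]")   # as quoted: exactly one continuation byte
ANY_CONT = re.compile(rb"[^\x80-\xbf][\x80-\xbf]*")  # trailing *: zero or more continuation bytes


def find_all(pattern, data):
    """Return the bytes of every non-overlapping match of pattern in data."""
    return [m.group(0) for m in pattern.finditer(data)]


# One 1-byte, one 2-byte and one 3-byte UTF-8 character: 'a', e-acute, euro sign.
sample = "a\u00e9\u20ac".encode("utf-8")

without_star = find_all(ONE_CONT, sample)  # skips 'a', truncates the euro sign
with_star = find_all(ANY_CONT, sample)     # one match per complete character

print(without_star)
print(with_star)
```

Without the *, the single-byte 'a' is never matched and the three-byte character is cut off after its second byte; with the *, each match is one whole character.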
Incidentally, to anyone who's reading, ignore any of the patterns I gave in my
previous email. They are incorrect.
(Damn, I hate regexps.)
--
+- David Given --McQ-+ "Est brilgum: toui slimici
| dg@cowlark.com | In uabo tererotitant
| (dg@tao-group.com) | Brogoui sunt macresculi
+- www.cowlark.com --+ Momi rasti strugitant." --- Anonymous