- Subject: Re: Mysterious character "Â" in message archive
- From: "Ashwin Hirschi" <lua-l@...>
- Date: Tue, 22 Nov 2011 07:27:12 +0100
When following some of the very helpful links to the message archive
posted by friendly and experienced people, for example
the character known as 'LATIN CAPITAL LETTER A WITH CIRCUMFLEX'
(U+00C2) often appears. Examining the page source reveals that it is
encoded in HTML as "Â".
I suspect that somewhere in the chain between sending the original email
and viewing its copy through the archive, there's a character recoding
Specifically, the "non-breaking space" character (U+00A0) can easily creep
into emails and other web documents since, being a space character, it's
rather hard for the human eye to spot. The character is represented as
(hex) A0 in good old-fashioned Latin-1 and as C2 A0 in UTF-8.
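To check those byte values yourself, here is a minimal Python sketch (the variable names are just illustrative, not anything the archive software uses):

```python
# Encode NO-BREAK SPACE (U+00A0) in both encodings and compare the bytes.
NBSP = "\u00a0"

latin1_bytes = NBSP.encode("latin-1")
utf8_bytes = NBSP.encode("utf-8")

print(latin1_bytes.hex())  # a0
print(utf8_bytes.hex())    # c2a0
```

The UTF-8 form ends in the very same A0 byte that Latin-1 uses on its own.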
Note the similarity between the two encodings and you'll see that a
non-breaking space in UTF-8 encoding (C2 A0) can easily be mistaken for a
non-breaking space (as part of a Latin-1 document) preceded by an
"odd-looking" C2 byte, which Latin-1 reads as "Â"...
[and which in this case is escaped by the mail archive system]
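The mix-up is easy to reproduce. The following Python sketch assumes the recoding step simply reads UTF-8 bytes as Latin-1; that is my guess at what happens in the chain, not something confirmed about the archive itself:

```python
import unicodedata

# A non-breaking space as a UTF-8 sender would transmit it: C2 A0.
utf8_bytes = "\u00a0".encode("utf-8")

# Assumed faulty step: the bytes are interpreted as Latin-1 instead.
misread = utf8_bytes.decode("latin-1")

# One character has become two.
for ch in misread:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
# U+00A0 NO-BREAK SPACE
```

The stray "Â" is the visible half of the pair; the non-breaking space after it is still there, just as invisible as before.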
P.S. Don't worry too much about them, though. They're not going to creep
into your source by themselves [;-)].