When following some of the very helpful links to the message archive
posted by friendly and experienced people, for example
   http://lua-users.org/lists/lua-l/2010-09/msg01062.html
the character known as 'LATIN CAPITAL LETTER A WITH CIRCUMFLEX'
(U+00C2) often appears.  Examining the page source reveals that it is
encoded in HTML as  "Â".

I suspect that there's a character recoding error somewhere in the chain between sending the original email and viewing its copy through the archive.

Specifically, the "non-breaking space" character (U+00A0) can easily creep into emails and other web documents, since as a space character it's rather hard to spot with the human eye. The character is represented as the single byte (hex) A0 in good old-fashioned Latin-1 and as the two bytes C2 A0 in UTF-8.
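
For anyone who wants to check those byte values, here is a minimal Python sketch. The variable names are mine, chosen only for illustration; the only thing it relies on is the standard str.encode() method.

```python
# U+00A0 NO-BREAK SPACE, written as an escape so it stays visible in source.
NBSP = "\u00a0"

# Encode the same character in both encodings mentioned above.
latin1_bytes = NBSP.encode("latin-1")
utf8_bytes = NBSP.encode("utf-8")

print(latin1_bytes.hex().upper())  # A0
print(utf8_bytes.hex().upper())    # C2A0
```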

Note the similarity between the two encodings and you'll see that a non-breaking space in UTF-8 encoding can easily be mistaken for a non-breaking space (as part of a Latin-1 document) preceded by an "odd-looking" C2 byte, and C2 happens to be the Latin-1 code for 'Â'... [which in this case is escaped by the mail archive system]
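
To make the suspected mix-up concrete, here is a rough Python sketch of it. It is only a guess at the mechanism: I'm assuming the text was sent as UTF-8 and decoded as Latin-1 somewhere along the way, the sample string is made up, and the numeric character references below merely stand in for whatever escaping the archive really applies.

```python
# Hypothetical message text containing a non-breaking space (U+00A0).
original = "foo\u00a0bar"

# The sender's mail client encodes the text as UTF-8: ... C2 A0 ...
wire_bytes = original.encode("utf-8")

# Assumption: some later stage wrongly decodes those bytes as Latin-1,
# so C2 becomes U+00C2 (the A with circumflex) and A0 stays a NBSP.
misread = wire_bytes.decode("latin-1")

# Stand-in for the archive's HTML escaping: every non-ASCII character
# becomes a numeric character reference.
escaped = misread.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # foo&#194;&#160;bar

# Reversing the wrong step recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The last two lines also show that the damage is reversible as long as the misread text hasn't been altered any further.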

Ashwin.

P.S. Don't worry too much about them, though. They're not going to creep into your source by themselves [;-)].