When following some of the very helpful links to the message archive
posted by friendly and experienced people, for example
   http://lua-users.org/lists/lua-l/2010-09/msg01062.html
the character known as 'LATIN CAPITAL LETTER A WITH CIRCUMFLEX'
(U+00C2) often appears.  Examining the page source reveals that it is
encoded in HTML as  "Â".
I suspect that somewhere in the chain between sending the original email
and viewing its copy through the archive, a character recoding error
occurs.
Specifically, the "non-breaking space" character (U+00A0) can easily creep
into emails and other web documents since, being a space character, it is
rather hard to spot by eye. The character is encoded as the single byte
(hex) A0 in good old-fashioned Latin-1 and as the two bytes C2 A0 in UTF-8.
Note the similarity between the two encodings and you'll see that a
non-breaking space in UTF-8 encoding can easily be mistaken for a
non-breaking space (as part of a Latin-1 document) preceded by an
"odd-looking" C2 byte. In Latin-1, the byte C2 is exactly U+00C2, the
capital A with circumflex
[which in this case is escaped by the mail archive system].
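
The effect described above is easy to reproduce. Here is a minimal Python
sketch (my own illustration, not taken from the archive software; the
helper name repair is made up, and it assumes the text really was UTF-8
misread as Latin-1):

```python
# A non-breaking space (U+00A0) as a one-character string.
nbsp = "\u00a0"

# The same character in the two encodings discussed above.
latin1_bytes = nbsp.encode("latin-1")  # single byte A0
utf8_bytes = nbsp.encode("utf-8")      # two bytes C2 A0

# Misreading the UTF-8 bytes as Latin-1 turns one character into two:
# U+00C2 (capital A with circumflex) followed by U+00A0.
misread = utf8_bytes.decode("latin-1")


def repair(text):
    """Undo the misreading (hypothetical helper).

    Assumes every character in text fits in Latin-1 and that the
    resulting bytes are valid UTF-8; otherwise an exception is raised.
    """
    return text.encode("latin-1").decode("utf-8")


print(latin1_bytes.hex(), utf8_bytes.hex())
print([hex(ord(c)) for c in misread])
print(repair(misread) == nbsp)
```

Re-encoding as Latin-1 and decoding as UTF-8 only reverses the damage if
the guess about what went wrong is correct, so treat it as a diagnostic
aid rather than a general fix.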
Ashwin.

P.S. Don't worry too much about them, though. They're not going to creep into your source by themselves [;-)].