On Tue, Sep 2, 2014 at 12:52 PM, Milind Gupta <milind.gupta@gmail.com> wrote:
> Hi,
> I am trying to get an XML file from a UPnP device and the file I get
> starts off as:
>
> <?xml version="1.0"?>
>
> The first three characters should not be there. It seems the
> Content-Length in the response header is off by 3, so it picks up 3
> extra characters.
> If I open the same file in Firefox it gets the same response (checked
> with Fiddler), but if I save it, Firefox saves the file without those
> 3 characters and makes it valid XML.
> Does anyone have any idea how Firefox makes its HTTP request more
> robust, so it can decide that those first 3 characters do not belong
> to the body of the message? In other words, how does it know that the
> Content-Length is off?
>
> Thanks,
> Milind

Looks like an encoding issue to me -- that looks like a BOM encoded in
UTF-8, interpreted as... I dunno, CP-437 or something, or maybe munged
through a couple charset conversions. (I can't tell through a
copy-paste because that adds a possibility of another couple charset
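conversions.) Rendered in proper Unicode, that would be a single
invisible character (U+FEFF, the zero-width no-break space) marking
the content that follows as UTF-8. It takes three bytes in UTF-8,
which would account for the three "extra" characters: if so, the
Content-Length isn't off, those bytes really are part of the body.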

The XML spec says that a UTF-8 BOM is definitely SUPPOSED to be legal.
If your XML parser can't handle it, then either the device is
generating the BOM incorrectly (it should be 0xEF 0xBB 0xBF), or
you're munging it somewhere between receiving it from the socket and
writing it to a file (possibly implicitly, for example if something
you're using is trying to be too clever and doing a charset conversion
when it shouldn't), or your XML parser is in violation of the spec. I
suspect the first one is the most likely.
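
To tell which of those it is, look at the raw bytes of the body before
anything gets a chance to decode them. Here's a minimal Lua sketch,
assuming you already have the response body in a string. The names
strip_utf8_bom and hex_prefix are made up for illustration and aren't
part of any library, and the sample body is built by hand rather than
fetched from a device:

```lua
-- UTF-8 encoding of U+FEFF (bytes 0xEF 0xBB 0xBF), written with decimal
-- escapes so it works on Lua 5.1 as well as later versions.
local UTF8_BOM = "\239\187\191"

-- Hypothetical helper: return the body without a leading UTF-8 BOM,
-- plus a flag saying whether one was found.
local function strip_utf8_bom(body)
  if body:sub(1, 3) == UTF8_BOM then
    return body:sub(4), true
  end
  return body, false
end

-- Hypothetical helper: hex dump of the first n bytes, for checking
-- what actually arrived on the wire.
local function hex_prefix(body, n)
  local out = {}
  for i = 1, math.min(n, #body) do
    out[#out + 1] = string.format("%02X", body:byte(i))
  end
  return table.concat(out, " ")
end

-- Stand-in for the body received from the device (an assumption).
local body = UTF8_BOM .. '<?xml version="1.0"?>'

print(hex_prefix(body, 3))                  -- EF BB BF
local clean, had_bom = strip_utf8_bom(body)
print(had_bom, clean)
```

If the dump shows EF BB BF, the device sent a well-formed BOM and you
can strip it before handing the text to a parser that chokes on it. If
it shows anything else, the bytes were either generated wrong or got
converted somewhere along the way.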

/s/ Adam