- Subject: Re: How do I make sure that a string is compatible with JSON (utf-8/16/32)?
- From: Coda Highland <chighland@...>
- Date: Fri, 27 Sep 2013 08:22:20 -0700
On Fri, Sep 27, 2013 at 5:47 AM, D. Matt Placek <atomicsuntan@gmail.com> wrote:
>
> On Fri, Sep 27, 2013 at 10:55 AM, Enrique Garcia Cota <kikito@gmail.com>
> wrote:
>>
>> Hello there,
>>
>> In my current setup I'm treating some strings in Lua and then storing them
>> in JSON.
>>
>> JSON expects strings in UTF-8, UTF-16, or UTF-32, either big-endian or
>> little-endian. Binary blobs outside that are considered invalid.
>>
>> Unfortunately, some of the data I'm receiving can be binary. I need to
>> detect those cases and escape the binary data somehow (probably with Base64
>> encoding).
>
>
> The JSON spec (RFC4627) says: "All Unicode characters may be placed within
> the quotation marks except for the characters that must be escaped:
> quotation mark, reverse solidus, and the control characters (U+0000 through
> U+001F)."
>
> I use a very simple JSON encoder that just scans the string character by
> character and substitutes the correct escape sequence whenever one of these
> characters is encountered. I don't think you need to resort to Base64 or
> other binary encodings unless you really want to.
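Two major problems here:
(1) Not every value is a valid Unicode character. Several ranges are
excluded: the surrogate code points U+D800 through U+DFFF are not
characters at all (they exist only to form UTF-16 pairs), and nothing
above U+10FFFF exists.
(2) Whether 8, 16, or 32 bit, not every byte sequence is a legal UTF
representation. In UTF-8, for instance, a stray continuation byte (0x80
through 0xBF) cannot start a character, and the bytes 0xC0, 0xC1, and
0xF5 through 0xFF never appear at all.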
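To make both points concrete, here is a minimal sketch of a strict UTF-8 check that follows the table of well-formed byte sequences in the Unicode standard (also given in RFC 3629). It is written in Python rather than Lua, and the name `is_valid_utf8` is hypothetical. In Python itself `bytes.decode("utf-8")` in strict mode already applies the same rules; the explicit loop just shows what a hand-rolled check (for example a Lua port) would have to reject: stray continuation bytes, overlong forms, surrogates, code points above U+10FFFF, and truncated sequences.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 check based on the well-formed byte ranges of RFC 3629."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            i += 1
            continue
        # Pick the sequence length and the allowed range for the second byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects surrogates U+D800..U+DFFF
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects code points above U+10FFFF
        else:
            # 0x80..0xC1 (stray continuation or overlong lead) or 0xF5..0xFF.
            return False
        if i + length > n:
            return False  # sequence truncated by end of input
        if not lo <= data[i + 1] <= hi:
            return False
        # Any remaining bytes must be ordinary continuation bytes.
        for k in range(i + 2, i + length):
            if not 0x80 <= data[k] <= 0xBF:
                return False
        i += length
    return True
```

For example, `b"\xc0\xaf"` (an overlong encoding of "/") and `b"\xed\xa0\x80"` (an encoded surrogate) both fail even though each has the right "shape" of a multi-byte sequence.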
Less significant but still relevant: if you use UTF-16 or UTF-32, the
entire JSON text has to be encoded that way, not just the binary
content, and Lua is geared to work with 8-bit strings. (Not that it's
impossible to do otherwise, but it's a substantial amount of extra
work.)
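Keeping the whole JSON text in UTF-8 avoids that re-encoding work, so only the string contents need checking. A minimal sketch, in Python rather than Lua, of combining that with the Base64 fallback from the original question: valid UTF-8 passes through with the RFC 4627 escapes, anything else is Base64-encoded. The function names and the `{"base64": ...}` wrapper object are hypothetical conventions, not part of any JSON spec; the receiver has to know to look for them.

```python
import base64
import json


def encode_json_string(text: str) -> str:
    """Quote text as a JSON string, escaping only what RFC 4627 requires."""
    out = ['"']
    for ch in text:
        if ch == '"':
            out.append('\\"')
        elif ch == "\\":
            out.append("\\\\")
        elif ord(ch) < 0x20:
            # Control characters U+0000..U+001F must be escaped.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


def bytes_to_json(data: bytes) -> str:
    """Return JSON for arbitrary bytes: a string if they are valid UTF-8,
    otherwise a hypothetical {"base64": ...} wrapper object."""
    try:
        # Strict decoding rejects malformed, overlong, and surrogate sequences.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        b64 = base64.b64encode(data).decode("ascii")
        return '{"base64":' + encode_json_string(b64) + "}"
    return encode_json_string(text)


if __name__ == "__main__":
    payload = ('{"name":' + bytes_to_json(b"caf\xc3\xa9")
               + ',"blob":' + bytes_to_json(b"\xff\x00") + "}")
    # The whole document is encoded to UTF-8 once, at the end.
    print(payload.encode("utf-8"))
    print(json.loads(payload))
```

Because the output is a single UTF-8 document, the braces, quotes, and keys need no special handling; only string values are ever inspected.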
/s/ Adam