|
On 31.10.2016 at 08:42 Daurnimator wrote:
> On 31 October 2016 at 14:18, Hisham <h@hisham.hm> wrote:
>>
>> An interesting work on checking the compliance and compatibility of
>> JSON parsers:
>>
>> http://seriot.ch/parsing_json.html
>>
>> The author wrote a test suite comparing 34 different JSON parser
>> implementations in several languages (there are two Lua
>> implementations there, Jeffrey Friedl's and lua-dkjson) and found out
>> that there are no two alike.
>>
>> This reminded me of the complaint given a few times about the Lua
>> module ecosystem having "too many JSON parsers" or something like
>> that, so I thought it would be interesting to share it here. :)
>
>
> Slightly worth mentioning is that the author used dkjson incorrectly.
> https://twitter.com/daurnimator/status/791494454888673280

I have to read it a bit more carefully, but it appears to me that a
lot of the tests are checking something which I didn't even design my
parsers for -- my library cannot be used for validating JSON strings.
My goal was that every valid (UTF-8) JSON string can be parsed.

If I wrote a parser for a language usually written by users, it would
be important that the parser rejects any mistakes -- otherwise the
user would trust the broken code and only see it break later when
trying to port it to a different system. But for JSON I assumed the
data came mostly from other encoders.

The pure-Lua parser in particular (which appears to have been used in
the tests) accepts some really ridiculous input data. The reason is
that I tried to keep the amount of code as small as possible, so it
actually uses the same function for parsing arrays and objects.
The LPeg version is stricter and would have passed more of the tests.

I am aware that it can be a problem that my library actually contains
two parsers that behave differently, although previously I thought the
differences were mostly limited to the error messages returned by the
library. Maybe I should clarify in the description that the library is
not a validator.

I might try to rerun the tests locally to see how many of them are
also problematic in the LPeg-based parser. If that one is more
accurate, my advice would be to always use 'require
"dkjson".use_lpeg()' when a strict parser is required. Of course the
trailing garbage would still be an issue, as you described in your
Twitter post, but it could be solved by a wrapper function that checks
that no trailing data is left.
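
Such a wrapper could look roughly like this. This is only a sketch,
and 'strict_decode' is a name I made up for the example; it relies on
json.decode returning the position of the first character after the
parsed value:

```lua
-- Sketch of a strict wrapper around dkjson (not part of the library).
local json = require "dkjson".use_lpeg()

local function strict_decode(str)
  local value, pos, err = json.decode(str)
  if err then
    return nil, err
  end
  -- decode returns the position after the parsed value; anything
  -- other than whitespace from there on counts as trailing garbage.
  if str:find("%S", pos) then
    return nil, "trailing garbage at position " .. pos
  end
  return value
end

print(strict_decode('[1, 2, 3] oops'))
```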

The only "red" flags I saw in the test results were about the lack of
UTF-16 support. Indeed, that is something I intentionally left out, as
I had never seen a UTF-16-encoded JSON document in the wild. If I did,
I would probably use a dedicated Unicode library to convert it to
UTF-8 first.
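
For what it's worth, spotting UTF-16 input before handing it to a
UTF-8-only parser is cheap. A minimal sketch (byte-order mark only;
BOM-less input would additionally need the null-byte heuristics from
RFC 4627, section 3):

```lua
-- Sketch: detect a UTF-16 byte-order mark at the start of a string.
-- Returns the encoding name, or nil if no UTF-16 BOM is present.
local function utf16_bom(s)
  local b1, b2 = s:byte(1, 2)
  if b1 == 0xFF and b2 == 0xFE then return "UTF-16LE" end
  if b1 == 0xFE and b2 == 0xFF then return "UTF-16BE" end
  return nil
end
```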
Best regards,
David