Re: Lua utf8.len violates RFC 3629? (was Re: [PATCH] Quoted String "%q" non-ascii escaping (w/ hex).)


Subject: Re: Lua utf8.len violates RFC 3629? (was Re: [PATCH] Quoted String "%q" non-ascii escaping (w/ hex).)
From: Ricardo Ramos Massaro <ricardo.massaro@...>
Date: Fri, 30 Jun 2017 22:59:20 -0300

On Fri, Jun 30, 2017 at 2:44 PM, Jay Carlson <nop@nop.com> wrote:
> u{"astral char", "\xEF\xBB\xBF\xF0\xA3\x8E\xB4",
>   expect={1}, rfc=true}
>
> MUST from RFC 3629:
> astral char     2
> expected        1

These tests are nice examples of where Lua's utf8.len() diverges from
the RFC, but the last one confuses me.

It looks like that byte sequence encodes two code points: U+FEFF and
U+233B4. Do you mean to say that utf8.len() should not count U+FEFF
because it appears at the start of the string (and so should be
considered a BOM)? That doesn't look like it's mandated by the RFC,
and I don't think it would be desirable behavior for utf8.len().
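
For reference, here is a minimal sketch of what I see, assuming Lua 5.3
or later with the standard utf8 library. The helper len_without_bom is
hypothetical (it is not part of Lua); it only illustrates that skipping
a leading U+FEFF could be left to the caller rather than built into
utf8.len():

```lua
-- Byte string from the quoted test: EF BB BF, then F0 A3 8E B4.
local s = "\xEF\xBB\xBF\xF0\xA3\x8E\xB4"

-- utf8.len counts every code point, including the leading U+FEFF.
print(utf8.len(s))  --> 2

-- Decode all code points that start between byte 1 and the last byte.
print(string.format("U+%04X U+%04X", utf8.codepoint(s, 1, -1)))
--> U+FEFF U+233B4

-- Hypothetical helper, not part of Lua: treat a leading U+FEFF as a
-- signature and start counting after its three bytes.
local function len_without_bom(str)
  if str:sub(1, 3) == "\xEF\xBB\xBF" then
    return utf8.len(str, 4)
  end
  return utf8.len(str)
end

print(len_without_bom(s))  --> 1
```

So the expected value of 1 only comes out if the caller decides that the
initial U+FEFF is a signature and not part of the text.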

- Ricardo

Follow-Ups:
- Re: Lua utf8.len violates RFC 3629? (was Re: [PATCH] Quoted String "%q" non-ascii escaping (w/ hex).), Jay Carlson
