On Tue, 19 Mar 2019 at 07:25, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>
> >  Roberto> Why is rejecting surrogates a backwards step?
> >
> > Rejecting surrogates is a forward step, that's not the problem.
> >
> > Accepting values over 10FFFF is the backward step.
>
> Did you read the documentation? By default the functions reject any
> value over 10FFFF. They only accept these values if you give an explicit
> parameter for that end. You explicitly say: I want invalid codes.
> That, as others pointed out, may be useful for other purposes.
>
> If you want to accept invalid codes, it is not the lack of this
> parameter that will stop you.
>
> (Again, did you read the documentation? Maybe that point is not
> clear there?)

What I think is a backwards step is the lexer accepting "\u{110000}".
Unicode escapes above 10FFFF should really be an error, IMO.
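
To make the concern concrete, here is a rough Python sketch (not Lua's
implementation) of the original 1-to-6-byte UTF-8 scheme. The names
encode_extended and strict_accepts are made up for illustration. It
shows that a value such as 110000 only has an encoding under the
extended scheme, and that a strict decoder refuses the result.

```python
MAX_UNICODE = 0x10FFFF  # highest code point defined by Unicode

# (exclusive upper limit, total bytes, lead-byte marker) per sequence length
_RANGES = (
    (0x800, 2, 0xC0),
    (0x10000, 3, 0xE0),
    (0x200000, 4, 0xF0),
    (0x4000000, 5, 0xF8),
    (0x80000000, 6, 0xFC),
)

def encode_extended(cp: int) -> bytes:
    """Encode cp with the original UTF-8 scheme (values up to 0x7FFFFFFF).

    Hypothetical helper: it performs no validity checks beyond the range,
    so it will happily encode values above MAX_UNICODE.
    """
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value out of range")
    if cp < 0x80:
        return bytes([cp])
    for limit, nbytes, lead in _RANGES:
        if cp < limit:
            break
    out = []
    for _ in range(nbytes - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte, low 6 bits
        cp >>= 6
    out.append(lead | cp)  # remaining high bits go in the lead byte
    return bytes(reversed(out))

def strict_accepts(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(encode_extended(0x110000).hex(), strict_accepts(encode_extended(0x110000)))
```

For values up to 10FFFF (surrogates aside) the sketch agrees with
ordinary UTF-8; only the out-of-range values produce bytes that strict
consumers reject.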

UTF8PATT accepting the deprecated 5- and 6-byte sequences is a
similarly undesirable change.
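
As an illustration of the difference, here is a small Python sketch
(again, not Lua's code) that classifies a lead byte under the RFC 3629
rules and under the older, laxer scheme. The function name
sequence_length and its lax parameter are assumptions for this example.

```python
def sequence_length(lead: int, lax: bool = False) -> int:
    """Return the sequence length implied by a lead byte, or 0 if invalid.

    Strict mode follows RFC 3629 (lead bytes 00-7F, C2-DF, E0-EF, F0-F4).
    Lax mode additionally allows F5-FD, i.e. 4-byte sequences above
    10FFFF and the deprecated 5- and 6-byte forms.
    """
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:  # C0 and C1 could only start overlong forms
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    if lax:
        if 0xF5 <= lead <= 0xF7:
            return 4
        if 0xF8 <= lead <= 0xFB:
            return 5
        if 0xFC <= lead <= 0xFD:
            return 6
    return 0

print(sequence_length(0xF8), sequence_length(0xF8, lax=True))
```

A pattern that matches lead bytes up to FD corresponds to the lax
branch here, so it will also match input that no conforming UTF-8
decoder accepts.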

Accepting unpaired surrogates isn't odd, and is unfortunately required
when working with many badly designed APIs (e.g. Windows file paths,
JavaScript). UTF-8 with unpaired surrogates allowed is often called
"WTF-8": https://simonsapin.github.io/wtf-8/
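
For anyone who wants to see what that looks like in bytes, here is a
short Python sketch using the standard "surrogatepass" error handler.
The helper names are made up for illustration. Note that this is only
WTF-8-like: as far as I know, surrogatepass also encodes a paired
surrogate as two 3-byte sequences, which WTF-8 proper does not allow.

```python
LONE = "\ud800"  # an unpaired high surrogate

def encode_wtf8_like(s: str) -> bytes:
    """Encode s as UTF-8, letting lone surrogates through."""
    return s.encode("utf-8", "surrogatepass")

def decode_wtf8_like(data: bytes) -> str:
    """Decode UTF-8 bytes, accepting encoded lone surrogates."""
    return data.decode("utf-8", "surrogatepass")

def strict_encode_fails(s: str) -> bool:
    """True if the strict UTF-8 encoder rejects s."""
    try:
        s.encode("utf-8")
        return False
    except UnicodeEncodeError:
        return True

print(encode_wtf8_like(LONE).hex(), strict_encode_fails(LONE))
```

The round trip through encode_wtf8_like and decode_wtf8_like preserves
the lone surrogate, which is the property needed when shuttling such
strings to and from those APIs.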