Re: Lua 5.4.0 beta announcement

lua-l archive


Subject: Re: Lua 5.4.0 beta announcement
From: "Soni \"They/Them\" L." <fakedme@...>
Date: Thu, 3 Oct 2019 17:58:43 -0300



On 2019-10-03 11:14 a.m., Philippe Verdy wrote:

The change in utf8 makes it incompatible with standards if it now accepts and decodes sequences of up to 6 bytes. The only standard version of UTF-8 is the universally adopted definition that limits code points to the 17 planes, up to U+10FFFF. The "original" specification of UTF-8 was only an informative RFC, which was deprecated many years ago and never adopted as a standard. All web standards use the version co-published in the Unicode Standard and in the RFC that replaced it (which was approved and adopted everywhere else).

I think this is a bad idea. Applications will now have to use their own libraries and check many dependencies to make sure they conform and treat erroneous data as invalid. UTF-8 is so universal today that it is needed in almost all applications that use the web or filesystems, including automated processes and systems without a user interface of their own that are used as middleware. Having to rewrite it now is a bad idea, especially for small devices (including IoT).

You should not have changed this specification. The (extremely rare) situations where an application needs such an extension should be scoped in their own library, or could have used another variant of the library. This change also creates new complications for existing libraries that parse and validate international text (including the built-in pattern engine or alternative regular expression engines).
Anyway, I strongly suggest that the compile-time options for the Lua 5.4 "utf8" library include a setting that keeps the standard behavior in place: 5-byte and 6-byte sequences, as well as their lead bytes (which are invalid in the standard), should be treated like the other sequences the library already recognizes as invalid, such as a valid lead byte not followed by the correct number of trail bytes, or trail bytes with no preceding lead byte. Treating them as invalid is what is expected. If applications now have to make additional checks, this will just slow them down (or leave bugs with undetected cases, possibly creating exploitable security holes).
A "secure" compiled version of Lua should have this option set by default, so that the utf8 library conforms to the standard. The old RFC behavior should then not be supported, or could be offered in an optional secondary library such as "oldutf8" instead of "utf8". But I bet that almost no one will ever want to use that old library, which would then not need to be built into the engine but could be provided as an optional extension, loaded only on demand by code that explicitly requires it.
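For reference, the strict behavior requested above can be sketched as a small validator. This is a minimal Python sketch assuming the RFC 3629 definition (at most 4 bytes per sequence, code points up to U+10FFFF, no surrogates, no overlong forms); the name `utf8_valid_strict` is hypothetical and this is not code from Lua's utf8 library. Each rejection branch corresponds to one of the invalid cases listed in the message, including the old 5-byte and 6-byte lead bytes 0xF8 to 0xFD.

```python
def utf8_valid_strict(data: bytes) -> bool:
    """Hypothetical helper: True if `data` is well-formed UTF-8 per RFC 3629.

    Rejects stray trail bytes, lead bytes 0xC0/0xC1 and 0xF5-0xFF (which
    include the RFC 2279 5- and 6-byte leads 0xF8-0xFD), truncated
    sequences, overlong forms, surrogates and values above U+10FFFF.
    """
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c < 0x80:                        # ASCII, single byte
            i += 1
            continue
        if 0xC2 <= c <= 0xDF:
            length, cp, minimum = 2, c & 0x1F, 0x80
        elif 0xE0 <= c <= 0xEF:
            length, cp, minimum = 3, c & 0x0F, 0x800
        elif 0xF0 <= c <= 0xF4:
            length, cp, minimum = 4, c & 0x07, 0x10000
        else:
            return False                    # invalid lead byte or stray trail byte
        if i + length > n:
            return False                    # truncated sequence
        for b in data[i + 1:i + length]:
            if (b & 0xC0) != 0x80:
                return False                # lead byte not followed by a trail byte
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return False                    # overlong, out of range, or surrogate
        i += length
    return True


print(utf8_valid_strict("hé😀".encode("utf-8")))                        # True
print(utf8_valid_strict(bytes([0xFD, 0xBF, 0xBF, 0xBF, 0xBF, 0xBF])))  # False
```

Python's own `bytes.decode("utf-8")` applies the same strict rules, so it can serve as a cross-check for a hand-written validator like this one.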

utf8 has always accepted all sorts of invalid sequences when matching strings using the utf8 pattern (utf8.charpattern).

a well-behaved program should never output invalid UTF-8 from valid UTF-8, and you still need to *explicitly* request 31-bit utf8 for decoding, so nothing has changed there.
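In Lua 5.4 the 31-bit mode is requested per call through an optional `lax` argument (for example `utf8.codes(s, lax)` or `utf8.len(s, i, j, lax)`). The Python sketch below illustrates the idea with a hypothetical `decode_one` helper; it is not Lua's C implementation. It assumes that lax mode only widens the accepted range to 31-bit values (up to 6 bytes) while still rejecting malformed and overlong sequences, and that strict mode accepts only Unicode scalar values.

```python
def decode_one(data: bytes, i: int, lax: bool = False):
    """Hypothetical helper: decode one sequence at data[i].

    Returns (code_point, next_index) or raises ValueError. Strict mode
    allows only U+0000..U+10FFFF minus surrogates; lax mode allows the
    RFC 2279 layout (up to 6 bytes, values up to 0x7FFFFFFF).
    Overlong forms are rejected in both modes.
    """
    c = data[i]
    if c < 0x80:
        return c, i + 1
    # The number of leading 1 bits in the lead byte gives the sequence length.
    length = 0
    while length < 8 and c & (0x80 >> length):
        length += 1
    if length < 2 or length > 6:
        raise ValueError(f"invalid lead byte 0x{c:02X} at {i}")
    cp = c & (0x7F >> length)               # payload bits of the lead byte
    if i + length > len(data):
        raise ValueError(f"truncated sequence at {i}")
    for b in data[i + 1:i + length]:
        if (b & 0xC0) != 0x80:
            raise ValueError(f"missing continuation byte at {i}")
        cp = (cp << 6) | (b & 0x3F)
    # Smallest value that genuinely needs `length` bytes (overlong check).
    minimum = (0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000)[length]
    if cp < minimum:
        raise ValueError(f"overlong sequence at {i}")
    if not lax and (cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF):
        raise ValueError(f"U+{cp:X} is not a Unicode scalar value (use lax)")
    return cp, i + length


six = bytes([0xFD, 0xBF, 0xBF, 0xBF, 0xBF, 0xBF])
print(decode_one(six, 0, lax=True))   # (2147483647, 6)
try:
    decode_one(six, 0)                # strict mode refuses the 6-byte form
except ValueError as err:
    print(err)
```

The design point is that the caller opts in: with the default `lax=False`, the 5-byte and 6-byte forms stay invalid exactly as in strict UTF-8.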

but perhaps utf8 should be renamed to varint31? these changes are, after all, meant to reuse the same code for a small yet useful data interchange format based on 31-bit varints.
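The "varint31" reading works because the original RFC 2279 layout can carry any value from 0 to 0x7FFFFFFF in 1 to 6 bytes, with the number of leading 1 bits in the first byte giving the length. A minimal Python sketch of such an encoder follows; `encode_varint31` is a hypothetical name borrowed from the suggestion above, not an existing Lua or Python function.

```python
def encode_varint31(value: int) -> bytes:
    """Hypothetical encoder: 0 <= value <= 0x7FFFFFFF in the RFC 2279
    UTF-8 layout (1 to 6 bytes; the lead byte's high 1 bits give the length)."""
    if not 0 <= value <= 0x7FFFFFFF:
        raise ValueError("value out of 31-bit range")
    if value < 0x80:
        return bytes([value])
    # Exclusive upper bound of values that fit in 2..6 bytes.
    for length, limit in ((2, 0x800), (3, 0x10000), (4, 0x200000),
                          (5, 0x4000000), (6, 0x80000000)):
        if value < limit:
            break
    out = []
    for _ in range(length - 1):
        out.append(0x80 | (value & 0x3F))   # continuation byte: 10xxxxxx
        value >>= 6
    lead = (0xFF << (8 - length)) & 0xFF    # `length` leading 1 bits
    out.append(lead | value)                # remaining high bits go in the lead
    return bytes(reversed(out))


print(encode_varint31(300).hex())   # c4ac
```

Each continuation byte carries 6 bits and a 6-byte lead byte leaves 1 payload bit, so the longest form holds 1 + 5 × 6 = 31 bits, which is where the 31-bit limit comes from. For valid Unicode scalar values the output is ordinary UTF-8.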

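the only concern I have is over existing usage of e.g. utf8.codes(s:gsub(...)): gsub also returns the number of substitutions as a second value, so that count now lands in the lax switch, and any number (even 0) is true in Lua. it would probably be beneficial to make utf8.codes accept a start index before the lax switch, or otherwise enforce that the lax switch is not a number. (the start index is more appealing imo.)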

On Thu, 3 Oct 2019 at 12:21, TonyMc <afmcc@btinternet.com> wrote:


    Hi,

    in the recent beta announcement there is a link to the changes at
    http://www.lua.org/work/doc/#changes .

    There is a typo there: coersions should be coercions.

    Thank you for the beta!

    Tony

Follow-Ups:
- Re: Lua 5.4.0 beta announcement, Roberto Ierusalimschy

References:
- Lua 5.4.0 beta announcement, TonyMc
- Re: Lua 5.4.0 beta announcement, Philippe Verdy
