The facility for putting an "arbitrary" number of = signs in long string delimiters is NOT meant for escaping arbitrary code from random sources.
It is only a facility for commenting source programs.
Don't abuse it! If you need to escape data properly without a huge cost, a small number of = signs is enough (no more than 5 should do). You then escape only the occurrences of the closing delimiter, that is, a run of 5 equal signs that comes right after a ] and is followed by a ], by breaking the run after the 4th equal sign. This is a rare case in actual data.

To encode  "=====",
you'd output [====[=====]====]

To encode  "====]",
you'd output [====[====]]====]

To encode  "]====]" (this is one of the worst-case scenarios; with 4 equal signs in the delimiter, the break falls after the 3rd one),
you'd output [====[]===]====][====[=]]====]

I.e. close the current delimiter and reopen it immediately. This should create a single string, just like  "ABC====""=ABC".
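The close-and-reopen scheme can be sketched in Lua as follows. This is only a sketch under stated assumptions: the name encode_long is hypothetical, the level must be at least 1, and the input must not contain carriage returns (Lua normalizes line breaks inside long strings). Because current Lua does not merge adjacent literals, the pieces are joined with the .. operator. A newline is written after every opening bracket because the lexer drops one newline there, which protects data that itself starts with a newline.

```lua
-- Hypothetical helper: encode s as long-bracket literals of a fixed level,
-- closing and reopening the literal wherever the data would otherwise
-- produce the closing delimiter too early.
function encode_long(s, level)
  assert(level >= 1, "this sketch needs at least one '='")
  assert(not s:find("\r", 1, true), "carriage returns are not handled here")
  local eq = string.rep("=", level)
  local open, close = "[" .. eq .. "[", "]" .. eq .. "]"
  local prefix = "]" .. eq          -- closing delimiter minus its last ]
  local pieces, start, from = {}, 1, 1
  while true do
    local i, j = s:find(prefix, from, true)   -- plain search, no patterns
    if not i then break end
    local nxt = s:sub(j + 1, j + 1)
    -- Dangerous only if followed by ] or if it ends the data
    -- (the closing delimiter of the literal would complete it).
    if nxt == "]" or nxt == "" then
      pieces[#pieces + 1] = s:sub(start, j - 1)  -- cut before the last '='
      start = j
    end
    from = j + 1
  end
  pieces[#pieces + 1] = s:sub(start)
  for k, piece in ipairs(pieces) do
    pieces[k] = open .. "\n" .. piece .. close
  end
  return table.concat(pieces, " .. ")
end

-- Round trip: compile the generated expression and compare with the input.
local load_str = loadstring or load   -- Lua 5.1 has loadstring
local src = encode_long("]====]", 4)
assert(load_str("return " .. src)() == "]====]")
```

Data that never contains ] followed by the chosen number of = signs comes out as a single literal, so the extra delimiters are paid only in the rare case described above.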

String literals (using either short or long delimiters) that are repeated without any operator between them should be parsed as a single literal value (and you could just as well mix the quotation styles between the parts; this should not change the meaning at all). No change is needed in the Lua lexer; the only change is in the parser: accept "A" "B" as representing the same literal value as "AB". The spaces between the two parts could just as well be newlines, and there is no need for an intermediate .. operator, which would be evaluated at runtime.

The interest of the "long quotation" marks is that they occur much less often in data, so the encoded length is usually better with delimiters that are 6 characters long, as here, than with delimiters that are just 1 character long. But you'll realize immediately that a worst case always exists, and that its total encoded length is now a bit longer than with single-character escaping: the increase is only in the first leading and last trailing delimiters, not in the middle part, whose size is doubled by adding the escapes.

But the worst case will most likely just occur less often: it occurs only when encoding source sequences that consist ONLY of exact repetitions of the trailing delimiter.



On Fri, Dec 14, 2018 at 01:06, Gabriel Bertilson <arboreous.philologist@gmail.com> wrote:
[====[[====[]====] works; perhaps you meant ]====]? But actually
]====] can be represented too.

The requirement for representing a string as a long string literal is
that you choose a pair of delimiters such that the closing delimiter
only occurs at the end of the combination of string and closing
delimiter. So if 4 is the maximum number of equals signs in a
delimiter, then if the string is ]====], you can use delimiters with 1
to 3 equals signs: [=[]====]]=], [==[]====]]==], [===[]====]]===]. And
you have to represent the string ] with 1 to 4 equals signs, because
with zero equals signs, [[]]], the closing delimiter appears too
early, and a stray ] is left for the parser to choke on.
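The delimiter-selection rule above can be sketched as a small search. The name pick_level and the max_level parameter are hypothetical; the function simply tries each level in turn and checks that the first occurrence of the closing delimiter in the string plus delimiter is the one at the very end.

```lua
-- Hypothetical helper: return the smallest number of '=' signs whose
-- closing delimiter first appears at the very end of s .. closer,
-- or nil if no level from 0 to max_level qualifies.
function pick_level(s, max_level)
  for level = 0, max_level do
    local closer = "]" .. string.rep("=", level) .. "]"
    -- plain search (4th argument true), so ] and = are not pattern magic
    local first = string.find(s .. closer, closer, 1, true)
    if first == #s + 1 then
      return level
    end
  end
  return nil
end

print(pick_level("]====]", 4))                 --> 1
print(pick_level("]", 4))                      --> 1
print(pick_level("]]]=]]==]]===]]====]", 4))   --> nil
```

The sketch looks only at the closing delimiter; a string that starts with a newline would additionally need a newline written after the opening bracket, since Lua drops the first one.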

So a string that cannot be represented as a long string literal, given
that N is the limit, is one containing every closing delimiter with 0
to N equals signs. If the limit is 4, an unrepresentable string would
be ]]]=]]==]]===]]====]. The minimum length of such a string would be
triangular_number(N) + 2*(N+1). If the limit is 1000, the
unrepresentable string would be at least 502502 bytes long if I
calculated it right. (See the formula below.) It would be very unusual
for such a string to occur in a program.

function unrepr_str_size(limit)
  local triangle = 0
  for i = 1, limit do
    triangle = triangle + i
  end
  return triangle + 2 * (limit + 1)
end
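As a cross-check, the example string can be built directly by concatenating every closing delimiter from 0 to N equals signs, and its length compared with the formula, which in closed form is N*(N+1)/2 + 2*(N+1). The name unrepr_str is hypothetical. The sketch keeps each delimiter separate, as the formula does; it appears that neighbouring delimiters could also share a bracket, which would give a somewhat shorter string.

```lua
-- Hypothetical helper: concatenate the closing delimiters with
-- 0, 1, ..., limit equals signs into one string.
function unrepr_str(limit)
  local parts = {}
  for level = 0, limit do
    parts[#parts + 1] = "]" .. string.rep("=", level) .. "]"
  end
  return table.concat(parts)
end

print(unrepr_str(4))        --> ]]]=]]==]]===]]====]
print(#unrepr_str(4))       --> 20
print(#unrepr_str(1000))    --> 502502
```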

— Gabriel

On Thu, Dec 13, 2018 at 4:11 PM Rena <hyperhacker@gmail.com> wrote:
>
> On Thu, Dec 13, 2018, 16:32 David Favro <lua@meta-dynamic.com> wrote:
>>
>>
>>
>> On December 13, 2018 9:01:52 PM UTC, Egor Skriptunoff <egor.skriptunoff@gmail.com> wrote:
>> >On Thu, Dec 13, 2018 at 7:15 PM Roberto Ierusalimschy wrote:
>> >> it seems easier to just
>> >> limit the maximum number of '=' in a long bracket. I don't think people
>> >> will mind a limit of 1000.
>> >
>> >IMO, it's not a good idea.
>> >If this limit is N, then minimal size of non-quotable string is about 0.5*N^2
>>
>> What's a "non-quotable string"?
>>
>> Am I missing something or can't any string be represented as a literal with e.g. double-quote (") as delimiter and appropriate escaping of special characters?  If so, I don't see your definition of "non-quotable", could you elaborate?
>
>
> A string that starts with `[====[`, assuming the limit of `=` in a delimiter were 4.
>
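For the short-literal route David asks about in the quoted thread, the standard library's string.format with the %q option writes a string as an escaped double-quoted literal that the Lua interpreter can read back. The following is a minimal sketch; the name quote_roundtrip is hypothetical.

```lua
local load_str = loadstring or load   -- Lua 5.1 has loadstring

-- Hypothetical helper: return the %q literal for s and the value
-- obtained by compiling and evaluating that literal again.
function quote_roundtrip(s)
  local literal = string.format("%q", s)
  local chunk = assert(load_str("return " .. literal))
  return literal, chunk()
end

local _, back = quote_roundtrip("[====[ and ]====]")
assert(back == "[====[ and ]====]")
```

Long-bracket sequences need no special treatment in this form, since only quotes, backslashes, newlines and a few other characters are escaped.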