- Subject: Re: Future plans for Lua and Unicode
- From: Roberto Ierusalimschy <roberto@...>
- Date: Fri, 6 Jul 2012 09:36:51 -0300
> This makes me think of a trick that could be useful in some situations.
> I know it is illegal according to UTF-8 specifications...
> But overlong UTF-8 sequences could be used to *escape* special
> characters in string literals in a unified way!
> Typically, new line, carriage return, tabulation are entered as \n, \r
> and \t respectively. NUL byte and other control characters are written
> in decimal or hexadecimal form as \000 or \x01. And characters " ' and
> \ must often be entered as \", \' and \\.
>
> [...]
> Is this idea completely stupid or does it have any practical interest?
Sorry Patrick, but this one I would call "mostly stupid". There are a
number of drawbacks to that approach. Since overlong sequences are
illegal, most text editors will reject or silently convert them, and do
not offer a way to enter such a sequence either. Other UTF-8 aware
software libraries will also reject overlong sequences. This seriously
limits the number of practical uses! :)
Moreover, it is not that difficult to add escapes or to use [[...]].
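To illustrate the point about UTF-8 aware libraries, here is a minimal
sketch in Python, chosen only because its built-in decoder is strict. The
byte pair 0xC0 0x8A is the overlong two-byte form of newline (U+000A);
the helper name try_decode and the "<rejected>" marker are made up for
this example and are not part of any library.

```python
# Overlong (illegal) two-byte encoding of newline U+000A: 0xC0 0x8A.
# The legal encoding of U+000A is the single byte 0x0A.
OVERLONG_NEWLINE = b"\xc0\x8a"


def try_decode(data: bytes) -> str:
    """Decode data as strict UTF-8; return a marker string if it is invalid.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return "<rejected>"


# A string that "escapes" the newline with an overlong sequence.
result_overlong = try_decode(b"line1" + OVERLONG_NEWLINE + b"line2")

# The same string with an ordinary newline byte.
result_plain = try_decode(b"line1\nline2")

print(repr(result_overlong))
print(repr(result_plain))
```

A strict decoder refuses the whole string rather than mapping the
overlong pair back to a newline, so data escaped this way would not
survive a round trip through such a library.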
-- Roberto