2015-09-02 21:02 GMT+02:00 Coda Highland <chighland@gmail.com>:
> On Wed, Sep 2, 2015 at 11:58 AM, Roberto Ierusalimschy
> <roberto@inf.puc-rio.br> wrote:
>>> ***Whether an actual application should do this is debatable, as it
>>> makes it impossible to store invalid UTF-16 (that is, UTF-16 with
>>> unpaired surrogate halves) in a UTF-8 string. This is necessary to
>>> store unchecked UTF-16 such as Windows filenames as UTF-8. It is
>>> also incompatible with CESU encoding (described below).***
>>
>> This is the heart of the issue.
>>
>> -- Roberto
>>
>
> I agree here. Lua doesn't purport to offer Unicode support. It only
> purports to offer handling for UTF-8 encoding. Trying to impose
> semantics on top of a straightforward set of accessors is taking steps
> towards doing more than really needs to be provided in the core.
>
> If you really need Unicode support, get a Unicode library. If you
> don't need Unicode support, then why do you care if Lua refuses to
> break when presented with properly-structured but
> semantically-meaningless data?

Actually, I have only just now read the whole Wikipedia page for the
first time. At the bottom, it says:

WTF-8 (Wobbly Transformation Format − 8-bit) is UTF-8 where the
encodings of the surrogate halves (U+D800 through U+DFFF) are allowed.
This is necessary to store possibly-invalid UTF-16, such as Windows
filenames. The term seems to have come from the Rust programming
language.[31] Many systems that deal with UTF-8 work this way without
considering it a different encoding, as it is simpler. The source code
samples above work this way, for instance.
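
To make the difference concrete, here is a small sketch in Python (not
Lua, and not taken from the Wikipedia page). It assumes that Python's
"surrogatepass" error handler is a fair stand-in for the permissive
behaviour described above: a lone surrogate half is written out as an
ordinary three-byte sequence, which a strict UTF-8 codec rejects. The
variable names are only illustrative.

```python
# A lone high surrogate, as could occur in an unchecked UTF-16 filename.
lone = "\ud800"

# Strict UTF-8 refuses to encode a surrogate code point.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# The permissive form: encode the surrogate like any other code point
# in the three-byte range (assumption: this models the WTF-8 behaviour).
wobbly = lone.encode("utf-8", "surrogatepass")

# A strict decoder rejects those bytes...
try:
    wobbly.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# ...while the permissive decoder round-trips them unchanged.
round_trip = wobbly.decode("utf-8", "surrogatepass")

print(strict_encode_ok, wobbly, strict_decode_ok, round_trip == lone)
```

The bytes come out as ED A0 80, which is the usual three-byte pattern
applied to U+D800. Note that "surrogatepass" also accepts paired
surrogates encoded separately, so it is looser than WTF-8 proper.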