lua-users home
lua-l archive



[top-posted untrimmed-unmarked-quote paragraph-length-line damage
repaired manually]

>> Five Lua test files are actually ISO-8859-1 encoded: [...]

Are they?  Or are they just "some single-octet superset of ASCII"
encoded?  (I don't know where to find those files - nothing with any of
those names appears in the Lua 5.4.4 tree I have on hand - so I can't
easily check.)

>> The real problem is that it's such an increasingly UTF-8 world
>> that many editors don't try to auto-detect the encoding.

The obvious remedy is to stop using editors that are broken.  Or, at
least, I would have thought it obvious.  (Or, at the very least, stop
doing so when working with Lua code.)

> I also ran into this problem a few weeks back.  One portable solution is to $ 

Please don't use paragraph-length lines.  (If you want the recipient to
rewrap, MIME provides a way to declare that, and it doesn't involve
paragraph-length lines.)

> [...] multibyte encodings like UTF-8, which I suspect is the default
> encoding for pretty much everyone nowadays on Unix platforms, [...]

Fortunately not everyone.  Maybe in some subcultures....

Actually, I find the blind assumption that X3.64 works more common (and
more annoying) than the blind assumption that UTF-8 works.  Popular
Linux distros are by far the worst offenders for both in my (admittedly
limited) experience.  I even ran into one thing that wedges(!) on
startup if the X3.64 DSR sequence with argument 6 doesn't draw a
response - even with my $TERM set to a type that doesn't have any hint
of X3.64 support.
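
For readers who have not met it: DSR with argument 6 is the byte
sequence ESC [ 6 n, and a terminal that supports it answers with a
cursor position report of the form ESC [ row ; col R.  The following is
only a minimal sketch of the non-wedging half of that exchange, in
Rust; the names DSR_CURSOR_POSITION and parse_cpr are made up for
illustration and are not taken from any program mentioned here.  It
treats a missing or malformed reply as "no answer" instead of assuming
one will arrive.

```rust
/// DSR with argument 6: ESC [ 6 n (asks the terminal for the cursor position).
/// Hypothetical constant name, used only for this illustration.
const DSR_CURSOR_POSITION: &[u8] = b"\x1b[6n";

/// Parse a cursor position report of the form ESC [ row ; col R.
/// Returns None for an empty, truncated, or otherwise unexpected reply,
/// so the caller can fall back instead of waiting for a well-formed one.
fn parse_cpr(reply: &[u8]) -> Option<(u16, u16)> {
    let text = std::str::from_utf8(reply).ok()?;
    let body = text.strip_prefix("\x1b[")?.strip_suffix('R')?;
    let (row, col) = body.split_once(';')?;
    Some((row.parse().ok()?, col.parse().ok()?))
}
```

A caller would write DSR_CURSOR_POSITION to the terminal, read with a
timeout, and hand whatever arrived (possibly nothing) to parse_cpr; the
timeout itself is platform-specific and is left out of the sketch.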

> Has moving the internal string representation to UTF-8 been
> considered?

Please don't.  I've recently been thrown into Rust at work.  It _does_
do that, it mandates that all strings must be UTF-8, and it breaks an
amazing amount of stuff if you aren't living in an insular little
"UTF-8 is the One True Encoding" world.  (It has byte strings, which
are octet strings rather than character strings, but they are severely
crippled by lack of support for byte-string versions of almost all of
the string support.  They probably think they are being nice by pushing
i18n or some such, but what they end up doing, at least for me, is
pushing _against_ i18n because I end up having to stick to ASCII.)

If you really have to, please recognize that not everyone has drunk the
UTF-8 koolaid and provide octet strings as well as character strings.
First-class octet strings, including string support like formatted
output generation.

I've been trying to teach myself Rust by writing a Rust version of a
small utility I wrote (in C) a while ago.  But I am not willing to, for
example, have it crash if you try to use it to manipulate a file whose
name is not UTF-8.  (Unix filenames are not character strings and never
have been; they are octet strings.  Like any octet strings, they may be
interpreted as character strings if someone feels like it.)  It's quite
impressive how hostile Rust manages to be to non-UTF-8 text.  Please
don't let Lua go there.
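
To make the octet-string view of filenames concrete, here is a minimal
sketch, assuming a Unix target, of carrying a non-UTF-8 name through
Rust's standard library without a UTF-8 check.  OsStr, OsStrExt, Path
and PathBuf are real standard-library items; the helper names
path_from_octets and octets_of are made up for illustration.  It covers
only storing and recovering the octets; it does nothing about the lack
of formatted output and other string support for byte strings.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only extension trait
use std::path::{Path, PathBuf};

/// Build a path from raw filename octets, with no UTF-8 validation.
/// Hypothetical helper, named for this illustration only.
fn path_from_octets(octets: &[u8]) -> PathBuf {
    PathBuf::from(OsStr::from_bytes(octets))
}

/// Recover the exact octets the path was built from.
fn octets_of(path: &Path) -> &[u8] {
    path.as_os_str().as_bytes()
}
```

For a name such as b"caf\xe9.txt" (an ISO-8859-1 encoded name, which is
not valid UTF-8), path_from_octets keeps the octets unchanged and
octets_of returns them as given, while Path::to_str on the same path
returns None rather than a character string.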

> Or tagging strings with the encoding so that they can be converted as
> needed into the appropriate encoding?

If you do, please provide real support for octet strings, either in the
form of a "this string is just octets, not to be transcoded"
pseudo-encoding or in the form of octet strings with, as outlined
above, a full suite of parallel constructs (like formatted I/O).

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B