I also ran into this problem a few weeks back.  One portable solution is to write the non-ASCII ISO-8859-1 characters as \x hex escapes, so that the source file itself can be saved as UTF-8 while the string literals still produce ISO-8859-1 bytes.  This keeps the tests passing.
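
As a minimal sketch of that escaping approach (the word "café" is only an illustration and is not taken from the test files; \x escapes assume Lua 5.2 or later):

```lua
-- The source text below is pure ASCII, so it reads the same whether the
-- file is saved as UTF-8 or ISO-8859-1.
local latin1 = "caf\xE9"            -- "café" in ISO-8859-1: é is the single byte 0xE9
local utf8_form = "caf\xC3\xA9"     -- the same word in UTF-8: é takes two bytes

print(#latin1)                      -- byte count of the ISO-8859-1 form
print(#utf8_form)                   -- byte count of the UTF-8 form
print(latin1:byte(4) == 0xE9)       -- the escaped byte is preserved exactly
```

Because the literal names its bytes explicitly, a test that counts bytes sees the same value regardless of how an editor re-saves the file.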

 

This doesn’t solve the broader problem: the string length, search, replace and manipulation functions operate on bytes, so they don’t work correctly with multibyte encodings like UTF-8.  I suspect UTF-8 is the default encoding for pretty much everyone on Unix platforms nowadays, and other platforms adopted Unicode well before that.  Has moving the internal string representation to UTF-8 been considered?  Or tagging strings with their encoding so that they can be converted to the appropriate encoding as needed?
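
To illustrate the mismatch, here is a small sketch, again using "café" as a made-up example. For comparison it uses the utf8 library that ships with Lua 5.3 and later; that library covers counting and iteration over code points, not search or replace.

```lua
-- "café" encoded as UTF-8: é occupies two bytes (0xC3 0xA9).
local s = "caf\xC3\xA9"

local byte_len = #s             -- the length operator counts bytes
local char_len = utf8.len(s)    -- utf8.len counts code points (Lua 5.3+)

-- Byte-based slicing can cut a character in half.
local cut = s:sub(1, 4)         -- "caf" followed by a dangling lead byte 0xC3
local ok, bad_pos = utf8.len(cut)   -- fails and reports the offending position

print(byte_len, char_len)       -- 5 bytes versus 4 characters
print(ok, bad_pos)              -- nil, and the position of the first invalid byte
```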

 

Kind regards,

Roger

 

 

From: Michael Lenaghan <michaell@dazzit.com>
Sent: Saturday, July 29, 2023 10:42 PM
To: lua-l@lists.lua.org
Subject: Five Lua test files are ISO-8859-1 encoded

 

Hello, all.

 

Five Lua test files are actually ISO-8859-1 encoded:

 

  • db.lua
  • files.lua
  • pm.lua
  • sort.lua
  • strings.lua

 

Two of the files have tests that count bytes, so you can’t just convert them to UTF-8. Well, not if you want your tests to succeed. :-)
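
A rough sketch of why a straight conversion changes the counts, assuming Lua 5.3 or later for utf8.char. The helper latin1_to_utf8 and the sample word are hypothetical and do not come from the test suite:

```lua
-- Hypothetical helper: re-encode an ISO-8859-1 byte string as UTF-8.
-- Each byte in the range 0x80-0xFF maps to the code point with the same number.
local function latin1_to_utf8(s)
  return (s:gsub("[\x80-\xFF]", function(c)
    return utf8.char(c:byte())
  end))
end

local original = "caf\xE9"                  -- ISO-8859-1 bytes
local converted = latin1_to_utf8(original)  -- same text, UTF-8 bytes

print(#original, #converted)                -- the byte counts differ
```

Any test that asserts on the length of the original string would fail on the converted one, even though the visible text is unchanged.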

 

Not fatal — the tests work as they are! — but unusual in an increasingly UTF-8 world.

 

The real problem is that it’s such an increasingly UTF-8 world that many editors don’t try to auto-detect the encoding. Save any changes in such an editor — hello, VS Code! — and you corrupt the files.
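
One possible safeguard is to check whether a file is valid UTF-8 before opening it in such an editor. The sketch below assumes Lua 5.3 or later; is_valid_utf8 and check_file are hypothetical names. The check is only a heuristic: pure ASCII always passes, and an ISO-8859-1 file could in rare cases happen to form valid UTF-8 sequences.

```lua
-- Hypothetical helper: true if the byte string is well-formed UTF-8.
-- utf8.len returns nil (plus a position) on the first invalid sequence.
local function is_valid_utf8(bytes)
  return utf8.len(bytes) ~= nil
end

-- Hypothetical helper: read a whole file in binary mode and check it.
local function check_file(path)
  local f = assert(io.open(path, "rb"))
  local bytes = f:read("a")
  f:close()
  return is_valid_utf8(bytes)
end

print(is_valid_utf8("caf\xC3\xA9"))  -- UTF-8 encoded text
print(is_valid_utf8("caf\xE9"))      -- ISO-8859-1 encoded text
```

A file for which check_file returns false is a candidate for having been saved in ISO-8859-1 or another single-byte encoding.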