Kind Regards,
Ali Rezvani

https://rzvxa.com


------- Original Message -------
On Sunday, October 22nd, 2023 at 4:45 PM, Mouse <mouse@Rodents-Montreal.ORG> wrote:


> > This way every production code used in Nginx, Cloudflare, every editor plugi$
>
>
> Actually, I would say, please don't write it in Rust - or, if you do,
> make sure you don't buy into Rust's "UTF-8 is the One True Character
> Encoding" dogma. It is hellishly difficult to use Rust for anything
> involving strings without hardwiring a mandate that the strings be
> UTF-8 strings into the result; you end up having to reimplement
> whatever fraction you want of the string support for byte strings. (I
> tried to write a file-manipulation program in Rust and, being unwilling
> to make it fall over on filenames that aren't UTF-8, ended up learning
> more than I wanted to about this.)
>
> Lua treats strings as octet strings rather than mandating a character
> set or encoding - providing, but not imposing, UTF-8 support. It would
> be sad to see such an extension language lose that.
>
> /~\ The ASCII Mouse
> \ / Ribbon Campaign
> X Against HTML mouse@rodents-montreal.org
> / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

It shouldn't be a problem with a transpiler, since it will use the current Lua implementation as the runtime, and for transpiling we don't need to be aware of the encoding of string literals; it is irrelevant at the level of something like an AST. I'm proposing that Rust be used for compiling the custom language to Lua so that we can minimize compilation time. I've used TypeScript in production, and compiling large code bases can take a long time. No matter how much effort is spent on the JIT, in the end it can't beat native code with no garbage collection.
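As a rough illustration of why the encoding need not matter at this level, a front end written in Rust can lex string literals as raw bytes (`&[u8]`) instead of `&str`, so nothing is ever validated as UTF-8. The helper name `scan_short_string` is hypothetical and not taken from any existing tool; it only handles quoted short strings and leaves escape sequences untouched.

```rust
/// Scan a single- or double-quoted Lua short string starting at `src[start]`.
/// Returns the raw bytes between the quotes (escapes left as written) and the
/// index just past the closing quote. Hypothetical helper for illustration.
fn scan_short_string(src: &[u8], start: usize) -> Option<(&[u8], usize)> {
    let quote = *src.get(start)?;
    if quote != b'"' && quote != b'\'' {
        return None;
    }
    let mut i = start + 1;
    while i < src.len() {
        match src[i] {
            b'\\' => i += 2,      // skip the escaped byte, whatever it is
            b'\n' => return None, // unterminated short string
            b if b == quote => return Some((&src[start + 1..i], i + 1)),
            _ => i += 1,          // any other byte, valid UTF-8 or not
        }
    }
    None
}

fn main() {
    // 0xFF never occurs in valid UTF-8, yet the literal is carried through intact.
    let src: &[u8] = b"local s = \"caf\xff\\\"x\" -- comment";
    let start = src.iter().position(|&b| b == b'"').unwrap();
    let (body, _end) = scan_short_string(src, start).unwrap();
    assert_eq!(body, &b"caf\xff\\\"x"[..]);
    println!("literal body is {} bytes", body.len());
}
```

Because the body is only ever copied, the emitted Lua file contains exactly the bytes the author wrote, and the Lua runtime sees the same octet string it would have seen without the transpiler.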
I personally prefer UTF-8 strings for application-wide use, as they make localization easier. I think higher-level languages can afford to have UTF-8 by default, but for the Lua use case I don't think it's that important, as it can be implemented easily. Still, it would be nice to have a way to define UTF-8 strings without libraries. A syntax like the one below would be nice to see in a typed variant of Lua (with type information at compile time, we can decide either to transpile to raw Lua code or to generate extra code for UTF-8 support, which could even be inlined if compiler optimization is enabled).
example syntax:
local a = u"UTF-8😊"
local b = u'UTF-8😊'
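One possible reading of this syntax, sketched in Rust: the `u` prefix asks the compiler to check at compile time that the literal's body is valid UTF-8, and the output is then an ordinary Lua string literal. This is only an assumption about how such a lowering could work; `lower_u_literal` is a hypothetical name, and the sketch checks the raw bytes only, without interpreting escape sequences.

```rust
/// Hypothetical lowering of a `u"..."` or `u'...'` literal to plain Lua.
/// Assumption: the prefix only requests a compile-time UTF-8 check.
fn lower_u_literal(lit: &[u8]) -> Result<Vec<u8>, String> {
    // Expect: 'u', an opening quote, a body, and the same quote at the end.
    if lit.len() < 3 || lit[0] != b'u' {
        return Err("not a u-prefixed string literal".to_string());
    }
    let quote = lit[1];
    if (quote != b'"' && quote != b'\'') || lit[lit.len() - 1] != quote {
        return Err("malformed quotes".to_string());
    }
    let body = &lit[2..lit.len() - 1];
    // Reject bodies that are not valid UTF-8; escapes are not decoded here.
    std::str::from_utf8(body).map_err(|e| format!("invalid UTF-8 in u-string: {}", e))?;
    // Drop the prefix; what remains is a plain Lua string literal.
    Ok(lit[1..].to_vec())
}

fn main() {
    let out = lower_u_literal("u\"UTF-8😊\"".as_bytes()).unwrap();
    println!("{}", String::from_utf8_lossy(&out));
    // A body containing the byte 0xFF is rejected at compile time.
    assert!(lower_u_literal(b"u'\xff'").is_err());
}
```

Unprefixed literals would bypass this check entirely, so ordinary Lua strings keep their octet-string semantics.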