- Subject: Default UTF-8 encoding in strings
- From: Robert Virding <rvirding@...>
- Date: Mon, 22 Jul 2019 03:21:54 +0200
This is for Lua 5.3.4
When exactly are characters/bytes interpreted as UTF-8 in literal strings? It seems that when you write a literal string which includes a Unicode character, it is inserted into the string as its UTF-8 encoding, even if its code point is small enough to fit in one byte. For example, the string "aäb" has the bytes 97, 195, 164, 98, even though the ä character has the value 228 and so could fit in a single byte. The same applies when printing a string: if it contains a legal UTF-8 sequence then the corresponding Unicode character is printed, whereas a lone byte with the value 228 is printed as ?.
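The byte values above can be checked with a short script. This is only a minimal sketch, assuming Lua 5.3: it writes the ä with escapes so that the result does not depend on how the source file is saved, and the helper name byte_list is made up for illustration.

```lua
-- Hypothetical helper: return the bytes of s as a space-separated
-- string of decimal values.
function byte_list(s)
  return table.concat({s:byte(1, -1)}, " ")
end

local utf8_form = "a\u{E4}b"   -- \u{XXX} escape: UTF-8 encoding of U+00E4
local raw_form  = "a\228b"     -- \ddd escape: the single byte 228

print(#utf8_form, byte_list(utf8_form))  -- 4 bytes: 97 195 164 98
print(#raw_form,  byte_list(raw_form))   -- 3 bytes: 97 228 98

-- utf8.len counts code points, and returns nil plus the position of
-- the first invalid byte when the string is not valid UTF-8.
print(utf8.len(utf8_form))  -- 3
print(utf8.len(raw_form))   -- nil and 2
```

Here utf8.len rejects the string containing the lone byte 228, which would be consistent with that byte being shown as ? on a terminal that expects UTF-8.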
While the examples in both the Lua 5.3 reference manual and the latest Lua book show this behaviour, the actual rules are not stated anywhere. At least I cannot find them.
As with my last string question, I am interested because I am implementing Lua and want it to behave the same way.