On 26 October 2017 at 07:23, Egor Skriptunoff
<egor.skriptunoff@gmail.com> wrote:
> But maybe someday Lua would become UTF-8-only language...
> In this case all Lua functions (including string-, patterns-, file- and
> scriptload- operations) should work only with UTF-8 encoded strings.
>
> This means:
> 1) #str would return number of unicode symbols in UTF-8-encoded string

Why would this ever be useful?

  - Usually you need to know the number of bytes, for storage, network
transmission, etc.
  - Sometimes you need to know how many characters there are
      - but in Unicode a character is often made up of multiple code points
      - so knowing the number of code points is useless
  - Sometimes you need to know the width of a character on the screen
      - this also has its own algorithm (even for fixed-width fonts)
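
To illustrate the gap between bytes, code points and characters, here is a
minimal sketch. It assumes Lua 5.3 or later (for the utf8 library and the
\u{XXX} escape); the sample string and variable names are my own.

```lua
-- 'e' followed by U+0301 COMBINING ACUTE ACCENT.
-- A reader sees one character, but the string holds two code points.
local s = "e\u{301}"

local bytes = #s                 -- length in bytes, as # reports today
local codepoints = utf8.len(s)   -- number of code points

print(bytes)        -- 3 (1 byte for 'e', 2 bytes for U+0301)
print(codepoints)   -- 2
```

Neither number is the 1 that a user would count, so a code-point-counting #
would still not answer the "how many characters" question.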

> 3) file functions would expect file names being encoded in UTF-8

No file system I know of enforces this.
  - On Linux, paths can contain any bytes except the null byte.
  - On Windows, paths can contain invalid UTF-16 (unpaired surrogate
halves are allowed).

File names should be treated as arbitrary sequences of bytes.
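
As a rough sketch of what that means in practice (again assuming Lua 5.3 or
later; the file name below is a made-up example), a perfectly ordinary
Latin-1 file name is already not valid UTF-8:

```lua
-- "café.txt" encoded as Latin-1: the byte 0xE9 on its own is not valid UTF-8.
local name = "caf\xE9.txt"

-- On invalid input utf8.len returns a false value plus the position of the
-- first invalid byte.
local len, badpos = utf8.len(name)

print(len, badpos)   -- nil     4
print(#name)         -- 8 (a Lua string holds the bytes without complaint)
```

A Lua string carries such a name unchanged, so a UTF-8-only file API would
have no way to refer to a file like this one.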