- Subject: Re: Suggestion: handle utf-8 filename in windows
- From: Daurnimator <quae@...>
- Date: Thu, 26 Oct 2017 10:22:35 +1100
On 26 October 2017 at 07:23, Egor Skriptunoff
<egor.skriptunoff@gmail.com> wrote:
> But maybe someday Lua would become UTF-8-only language...
> In this case all Lua functions (including string-, patterns-, file- and
> scriptload- operations) should work only with UTF-8 encoded strings.
>
> This means:
> 1) #str would return number of unicode symbols in UTF-8-encoded string
Why would this ever be useful?
- Usually you need to know the number of bytes, for storage, network
  transmission, etc.
- Sometimes you need to know how many characters there are
  - but in Unicode a character is often multiple code points
  - so knowing the number of code points is useless
- Sometimes you need to know the width of a character on the screen
  - this also has its own algorithm (even for fixed-width fonts)
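
To make the distinction concrete, here is a minimal sketch of the different "lengths" of one short string. It is written in Python rather than Lua purely for illustration; the helper `approx_char_count` is a hypothetical name, and skipping combining marks is only a rough stand-in for real grapheme segmentation (Unicode UAX #29), not a full implementation.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character
s = "e\u0301"

# Byte count: what storage and network code actually needs
byte_count = len(s.encode("utf-8"))

# Code point count: Python's len() on str counts code points
codepoint_count = len(s)

def approx_char_count(text):
    """Rough count of user-perceived characters (assumption: it is enough
    to skip combining marks; real segmentation has many more rules)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

# Display width is yet another property: East Asian Width class "W" or "F"
# usually means two columns in a fixed-width terminal
width_class = unicodedata.east_asian_width("\u65e5")

print(byte_count, codepoint_count, approx_char_count(s), width_class)
```

The three counts differ for the same string, and none of them tells you the on-screen width.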
> 3) file functions would expect file names being encoded in UTF-8
No file system I know of enforces this.
- On Linux, paths can contain any bytes except the null byte.
- On Windows, paths can contain invalid UTF-16 (unpaired surrogate
  halves are allowed).
File names should be treated as arbitrary sequences of bytes.
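
As a sketch of what this means in practice (in Python, for illustration only): the file name below is a hypothetical example containing a byte that can never appear in valid UTF-8, and the lone surrogate stands in for the kind of name Windows permits. The helper names are assumptions, not part of any library.

```python
# Hypothetical file name as a Linux kernel might return it; 0xFF is never valid UTF-8
raw_name = b"report-\xff.txt"

def is_valid_utf8(name: bytes) -> bool:
    """Return True only if the bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def encodes_as_utf8(name: str) -> bool:
    """Return True only if the string can be encoded as strict UTF-8."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

# An unpaired surrogate half, as Windows allows in a path
lone_surrogate_name = "bad\ud800name"

# A byte-preserving round trip: the "surrogateescape" error handler maps each
# undecodable byte to a lone surrogate and back again on encode
as_text = raw_name.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")

print(is_valid_utf8(raw_name), encodes_as_utf8(lone_surrogate_name))
print(round_tripped == raw_name)
```

An API that insists on valid UTF-8 would be unable to open either of these names, whereas treating the name as opaque bytes loses nothing.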