- Subject: Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
- From: Sean Conner <sean@...>
- Date: Sat, 7 Jul 2018 17:12:23 -0400
It was thus said that the Great Alysson Cunha once stated:
> >>> However the correct space character is 0x20 (32).
>
> This is what I am telling.. What? Who said that 0x20 is the correct space
> character? Answer: ASCII
>
> But in Unicode, we have more than 1 "correct space character", because it
> is Unicode, not ASCII... So, current LUA version does not support unicode
> characters.
There is more than one space character, but some would choose to
interpret them by their intent, which is not necessarily the same as ASCII SP.
ASCII defined the following characters:
	FS	0x1C	File Separator
	GS	0x1D	Group Separator
	RS	0x1E	Record Separator
	US	0x1F	Unit Separator
	SP	0x20	Space
The placement of Space isn't accidental: the creators of ASCII were very
deliberate in their choices. Space is neither a control character nor a
graphic character, but it can act as either one. One can choose to treat
Space as another separator character with a size smaller than a "unit". Or
it can be treated as a (non-visible) graphic character. So breaking on a
space is a valid interpretation here.
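To make the separator reading concrete, here is a minimal Python sketch. The
function name split_hierarchy and the sample record are hypothetical, purely
for illustration; it treats FS, GS, RS, US and then SP as successively finer
levels of one hierarchy:

```python
# Hypothetical illustration of the ASCII separator hierarchy:
# FS > GS > RS > US, with SP treated as one level finer than US.
FS, GS, RS, US, SP = "\x1c", "\x1d", "\x1e", "\x1f", "\x20"

# Coarsest separator first; each level splits the pieces of the previous one.
HIERARCHY = [FS, GS, RS, US, SP]


def split_hierarchy(data: str, level: int = 0):
    """Recursively split `data` on each separator in HIERARCHY.

    Returns nested lists; the innermost lists hold SP-separated words.
    """
    if level == len(HIERARCHY):
        return data
    sep = HIERARCHY[level]
    return [split_hierarchy(part, level + 1) for part in data.split(sep)]


# Two records: the first has two units, the first unit has two words.
record = "alpha beta" + US + "gamma" + RS + "delta"
tree = split_hierarchy(record)
```

Under this reading SP is simply the finest separator, which is why breaking
on it is a defensible choice.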
In Unicode (and some other character sets) there is the concept of a
"non-breaking space". Its semantic meaning is "this is a non-graphic
character that is part of the sequence of characters before and after it",
and thus no splitting should be allowed.
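A minimal Python sketch of that reading (break_on_space is a hypothetical
helper, not from any library): split only on ASCII SP and let U+00A0 stay
inside its word. For contrast, Python's own str.split() with no argument uses
the Unicode notion of whitespace and does break at U+00A0, which is exactly
the other interpretation:

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def break_on_space(text: str) -> list[str]:
    """Split only on ASCII SP (0x20); any U+00A0 stays inside its word."""
    # Drop the empty strings produced by runs of consecutive spaces.
    return [word for word in text.split(" ") if word]


city = "New" + NBSP + "York is big"
words = break_on_space(city)  # "New York" survives as a single word
naive = city.split()          # str.split() treats U+00A0 as whitespace
```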
I was able to use such semantics for filenames [1]. I use the command
line almost exclusively, and spaces (0x20) in filenames have always been
problematic because of the way the shell tokenizes input (it breaks on
Space). But by replacing Space with Non-breaking-Space, everything just
worked, even filename completion.
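The shell side of this can be approximated with Python's shlex module, which
implements POSIX-like shell tokenizing. It is only an approximation of what a
real shell does (and argv_for is a hypothetical helper), but it shows the same
effect: an ASCII space splits a filename into two arguments, while a
non-breaking space keeps it whole.

```python
import shlex

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def argv_for(command: str) -> list[str]:
    """Tokenize `command` roughly the way a POSIX shell would, via shlex."""
    # shlex.split() breaks only on ' \t\r\n', so U+00A0 is an ordinary character.
    return shlex.split(command)


# A filename containing an ASCII space becomes two arguments...
with_space = argv_for("cat my file.txt")
# ...but the same name spelled with a non-breaking space stays one argument.
with_nbsp = argv_for("cat my" + NBSP + "file.txt")
```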
So I could argue that Lua did "The Right Thing" when it encountered a
Non-breaking-Space [2]. And I'm not even going to go into the semantics of a
Zero-Width-Space [3].
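Here is a toy illustration of that behavior. It is not Lua's actual lexer;
it is a hypothetical Python sketch that mimics one assumed property of a stock
Lua 5.2+ build: only ASCII whitespace separates tokens, and bytes >= 0x80 are
neither whitespace nor identifier characters. A UTF-8 non-breaking space
(0xC2 0xA0) is therefore an error rather than a separator.

```python
import re

# Hypothetical toy lexer: only ASCII whitespace separates tokens, and any
# byte >= 0x80 (such as the UTF-8 NBSP bytes 0xC2 0xA0) matches no token.
ASCII_SPACE = b" \t\n\v\f\r"
TOKEN = re.compile(rb"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[=+\-*/(),]")


def tokenize(src: bytes) -> list[bytes]:
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos] in ASCII_SPACE:  # indexing bytes yields an int
            pos += 1
            continue
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(
                f"unexpected symbol near byte 0x{src[pos]:02X} at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


ok = tokenize(b"local x = 1")
# "local" followed by U+00A0 instead of an ASCII space:
try:
    tokenize("local\u00a0x = 1".encode("utf-8"))
    error = None
except ValueError as exc:
    error = str(exc)
```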
-spc
[1] http://boston.conman.org/2018/02/28.2
[2] The point might be clearer if Lua also supported Unicode in
identifiers.
[3] But if you want to, https://en.m.wikipedia.org/wiki/Zero-width_space