It was thus said that the Great Alysson Cunha once stated:
>  >>> However the correct space character is 0x20 (32).
> This is what I am telling.. What? Who said that 0x20 is the correct space
> character? Answer: ASCII
> But in Unicode, we have more than 1 "correct space character", because it
> is Unicode, not ASCII... So, current LUA version does not support unicode
> characters.

  There is more than one space character, but some would choose to
interpret them by their intent, and not necessarily the same as ASCII SP.

  ASCII defined the following characters:

	FS	0x1C	File Separator
	GS	0x1D	Group Separator
	RS	0x1E	Record Separator
	US	0x1F	Unit Separator
	SP	0x20	Space

  The placement of Space isn't accidental---the creators of ASCII were very
deliberate in their choices and Space is neither a control character nor a
graphic character, but it can act as either one.  One can choose to treat
Space as another separator character with a size smaller than a "unit".  Or
it can be treated as a (non-visible) graphic character.  So breaking on a
space is a valid interpretation here.
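
  To make the "separator smaller than a unit" reading concrete, here is a
minimal sketch in Lua.  The `split` helper and the sample record are my own
illustration, not anything from the ASCII standard; it only handles
separator bytes that have no special meaning in Lua patterns.

```lua
local US, SP = "\31", " " -- ASCII Unit Separator (0x1F) and Space (0x20)

-- Hypothetical helper: split `s` on the single separator byte `sep`,
-- keeping empty fields.  `sep` must not be a pattern-magic character
-- such as '%', '^', ']' or '-'.
local function split(s, sep)
  local fields = {}
  -- Appending one separator lets the pattern capture the final field too.
  for field in (s .. sep):gmatch("([^" .. sep .. "]*)" .. sep) do
    fields[#fields + 1] = field
  end
  return fields
end

-- Two units, each holding two space-separated words.
local record = "red apple" .. US .. "green pear"
local units = split(record, US)    -- break on the larger separator first
local words = split(units[2], SP)  -- then on Space, the smallest one
print(#units, words[1], words[2])
```

  The same function handles both levels; Space is just the separator one
step below US.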

  In Unicode (and some other character sets) there is the concept of a
"non-breaking space".  The semantic meaning of this is "this is a non-graphic
character that is part of a sequence of characters prior and after it" and
thus, no splitting should be allowed.

  I was able to use such semantics for filenames [1].  I use the command
line almost exclusively, and spaces (0x20) in filenames have always been
problematic because of the way the shell tokenizes input (breaks on Space). 
But by replacing Space with Non-breaking-Space, everything just worked, even
filename completion.  
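
  A rough sketch of why that works, again in Lua.  Assumptions: filenames
are UTF-8, so U+00A0 is the two bytes 0xC2 0xA0, and `tokenize` is only a
crude stand-in for shell word splitting on the default IFS (space, tab,
newline); a real shell also deals with quoting and escapes.  Both helper
names are made up for this example.

```lua
local NBSP = "\194\160" -- U+00A0 NO-BREAK SPACE encoded as UTF-8 (0xC2 0xA0)

-- Hypothetical helper: swap every ASCII Space in a filename for NBSP.
local function nbsp_name(name)
  return (name:gsub(" ", NBSP)) -- parentheses drop gsub's second result
end

-- Crude stand-in for shell word splitting: break on space, tab, newline.
local function tokenize(line)
  local words = {}
  for w in line:gmatch("[^ \t\n]+") do
    words[#words + 1] = w
  end
  return words
end

local plain = tokenize("rm my file.txt")
local safe  = tokenize("rm " .. nbsp_name("my file.txt"))
print(#plain, #safe)
```

  The first command line breaks into three words, the second into two,
because neither byte of the NBSP is in the set the tokenizer splits on.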

  So I could argue that Lua did "The Right Thing" when it encountered a
Non-breaking-Space [2].  And I'm not even going to go into the semantics of a
Zero-Width-Space [3].



[2]	The point might be clearer if Lua also supported Unicode in

[3]	But if you want to,