- Subject: Re: Issues: Character 160 - Non-breaking space + Additional Issue with UTF-8
- From: Alysson Cunha <alyssonrpg@...>
- Date: Sat, 7 Jul 2018 14:01:39 -0300
> Yeah, and using said database is quite difficult. Have you ever tried
> to implement something with it ( not using a lib which does it,
> implement the lib ).
Yes. I use Delphi, and I wrote my own Unicode library for Delphi that uses the Unicode data.
On Sat, Jul 7, 2018 at 5:11 PM, Alysson Cunha <firstname.lastname@example.org> wrote:
>>>> However the correct space character is 0x20 (32).
> This is what I am telling.. What? Who said that 0x20 is the correct space
> character? Answer: ASCII
The Lua designers decide which characters are correct spaces. They (
correctly, IMO ) decided non-breaking space is not one of them.
> But in Unicode, we have more than 1 "correct space character", because it is
> Unicode, not ASCII... So, current LUA version does not support unicode
If your definition of "correct space char" is so narrow as "being
usable as space separator in lua", you have more than one. I think at
least tabs work too.
And, also, 0xA0 is not some new Unicode thing. It's present in latin-1
and many other iso8859 ( 1 byte per char ) encodings.
It works in unicode because the first 0x100 code points are the same
> By miracle, if you do not use the "wrong" unicode characters, LUA accept it,
> because UNICODE was made to be backward compatible with ASCII till some
With latin-1, and, also, utf-8 ( which is a byte encoding ) encodes
the first 0x80 chars the same as ASCII. ( but it does not encode the
second half of latin-1 the same as the usual 1 byte latin1 encoding ).
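
To make the encoding point concrete, here is a small Python sketch (Python is used only for illustration; the variable names are made up for this example). It encodes "A" and the non-breaking space U+00A0 in ASCII, latin-1 and utf-8 and shows that only the first 0x80 code points keep the same single byte in all three.

```python
# Illustrative only: compare how a code point below 0x80 and U+00A0
# (NO-BREAK SPACE) are encoded in ASCII, latin-1 and utf-8.
letter = "A"        # U+0041, inside the ASCII range
nbsp = "\u00a0"     # U+00A0, second half of latin-1

# Below 0x80, all three encodings produce the same single byte.
ascii_bytes = letter.encode("ascii")
latin1_bytes = letter.encode("latin-1")
utf8_bytes = letter.encode("utf-8")

# U+00A0 is one byte in latin-1 but two bytes in utf-8.
nbsp_latin1 = nbsp.encode("latin-1")
nbsp_utf8 = nbsp.encode("utf-8")

print(ascii_bytes.hex(), latin1_bytes.hex(), utf8_bytes.hex())  # 41 41 41
print(nbsp_latin1.hex(), nbsp_utf8.hex())                       # a0 c2a0
```

So a byte-oriented scanner fed utf-8 never sees U+00A0 as the lone byte 0xA0; it sees the two bytes 0xC2 0xA0.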
> Note: Using the public unicode character database it's easy to handle all
> white space characters of unicode.
Yeah, and using said database is quite difficult. Have you ever tried
to implement something with it ( not using a lib which does it,
implement the lib ).
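
For comparison, this is what the easy case looks like when a library already ships the database. A minimal Python sketch, assuming the standard `unicodedata` module; the helper name `space_separators` is made up for this example. It lists only general category Zs (space separators), which is narrower than the full Unicode White_Space property (tab, line feed and similar control characters are not in Zs).

```python
import unicodedata

def space_separators():
    """Return every code point whose general category is Zs."""
    return [
        cp for cp in range(0x110000)
        if unicodedata.category(chr(cp)) == "Zs"
    ]

zs = space_separators()
for cp in zs:
    # Every Zs character has a name, so unicodedata.name() does not raise.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Both U+0020 and U+00A0 appear in that list. Doing the same without a library means parsing the database files yourself, which is the hard part being pointed at here.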
And, anyway, it seems nobody has problems with 0xA0 not being defined
as a space in lua ( either when using utf-8, latin-* or win1252 or