PS: The flag "🏴" is the sequence of the following Unicode code points: U+1F3F4 U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F

And A + ´ (2 Unicode code points) *must* be presented as "Á" by text editors/text renderers.

On Tue, Jul 10, 2018 at 7:20 PM Alysson Cunha <alyssonrpg@gmail.com> wrote:

There are 3 entities in Unicode strings:

1 - The bytes, according to the encoding used (UTF-8, UTF-16 big-endian, UTF-16 little-endian, UTF-32)
2 - The Unicode code points. One or more bytes compose each code point.
3 - And the trickiest of them, the glyphs. One or more Unicode code points compose a single glyph.
Example: This flag "🏴" is composed of 7 Unicode code points; encoded as UTF-8, these code points occupy 28 bytes (4 bytes each). So a single glyph (the flag) is composed of 7 Unicode code points, or 28 UTF-8 bytes.
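
A quick way to check these counts is to build the flag from its code points and encode it. This is only a sketch in Python (not the language discussed in this thread); the variable names are mine, and the code point list is the one given in the PS for this flag.

```python
# The flag as a tag sequence: black flag, five tag letters, then the cancel tag.
code_points = [0x1F3F4, 0xE0067, 0xE0062, 0xE0065, 0xE006E, 0xE0067, 0xE007F]
flag = "".join(chr(cp) for cp in code_points)

num_code_points = len(flag)                  # len() on a Python str counts code points
utf8_bytes = len(flag.encode("utf-8"))       # each of these code points needs 4 bytes
utf16_bytes = len(flag.encode("utf-16-le"))  # one surrogate pair (4 bytes) per code point
utf32_bytes = len(flag.encode("utf-32-le"))  # always 4 bytes per code point

print(num_code_points, utf8_bytes, utf16_bytes, utf32_bytes)  # 7 28 28 28
```

All seven code points lie outside the Basic Multilingual Plane, which is why the three encodings happen to give the same byte count here.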
Many emojis are made up of more than one code point. And then there are the combining code points: A + ´ (2 Unicode code points) may be presented as "Á" by text editors/text renderers.

I think utf8.len() returns the number of Unicode code points, not glyphs...
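
The difference between code points and glyphs for the "Á" case can be sketched like this. It is Python, used only for illustration, and it assumes the accent meant is the combining acute accent U+0301 (the spacing "´", U+00B4, does not combine with the preceding letter).

```python
import unicodedata

decomposed = "A\u0301"  # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # precomposed form, U+00C1

# Both strings render as one glyph, but their code point counts differ.
print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00C1")            # True

# The UTF-8 byte counts differ as well.
print(len(decomposed.encode("utf-8")), len(composed.encode("utf-8")))  # 3 2
```

A length function that counts code points reports 2 for the decomposed string even though a reader sees a single "Á"; counting glyphs needs grapheme cluster segmentation on top of that.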
PS: In Delphi, I wrote a library myself to handle glyphs, code points and bytes.

On Tue, Jul 10, 2018 at 6:56 PM Gregg Reynolds <dev@mobileink.com> wrote:

On Tue, Jul 10, 2018, 4:44 PM Gregg Reynolds <dev@mobileink.com> wrote:

(e.g. numbers in ltr scripts).

Correction: numbers in rtl scripts. Unicode says that numbers in e.g. Arabic are ltr. This is complete BS, but it is also a fact on the ground that cannot be fixed. Extra credit: estimate the cost of this very fundamental mistake.

--
Alysson Cunha / AlyssonRPG
http://www.rrpg.com.br - Play the traditional tabletop RPG online