On Thu, 19 Feb 2015 14:05:46 -0800
William Ahern <william@25thandClement.com> wrote:

> On Thu, Feb 19, 2015 at 04:05:46PM -0500, Daurnimator wrote:
> <snip>
> > So, I probably would not use your library unless you put quite a
> > lot of effort in; This would include things like non-blocking
> > forms, and consistent encodings (e.g. MySQL you need utf8mb4).
> 
> Wow. I just read this page
> 
> 	http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> 
> where it says
> 
> 	The character set named utf8 uses a maximum of three bytes per
> 	character and contains only BMP characters. As of MySQL 5.5.3,
> 	the utf8mb4 character set uses a maximum of four bytes per
> 	character supports supplemental characters.
> 
> 	...
> 
> 	For a supplementary character, utf8 cannot store the character
> 	at all, while utf8mb4 requires four bytes to store it. Since
> 	utf8 cannot store the character at all, you do not have any
> 	supplementary characters in utf8 columns and you need not worry
> 	about converting characters or losing data when upgrading utf8
> 	data from older versions of MySQL.
> 
> That's horrendous. I didn't think my opinion about the quality of
> MySQL could sink any lower. They can't even do the honorable thing
> and explicitly say (rather than leave it implied) that the problem
> isn't with UTF-8, but "utf8", a bastardized encoding with a
> confusingly similar name.
> 
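
To make the three-byte limit in the quoted documentation concrete, here is a minimal Python sketch. It uses Python's standard UTF-8 codec, not MySQL; the helper names utf8_len and fits_three_byte_limit are made up for illustration. It shows that standard UTF-8 needs at most three bytes for a BMP code point and four bytes for a supplementary one, so an encoding capped at three bytes per character cannot represent anything above U+FFFF.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes standard UTF-8 needs for a single code point."""
    return len(ch.encode("utf-8"))


def fits_three_byte_limit(text: str) -> bool:
    """True if every code point lies in the BMP (U+0000..U+FFFF).

    Hypothetical helper: it models the three-bytes-per-character cap
    described in the quoted documentation, not MySQL's actual behaviour.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    # ASCII, Latin-1 supplement, a BMP symbol, and a supplementary character.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), "
              f"fits 3-byte limit: {fits_three_byte_limit(ch)}")
```

A check like fits_three_byte_limit could be run over input before it is written to a column that only accepts BMP characters; what should happen to text that fails the check is left open here.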

I believe this is not just a MySQL issue.

See also http://en.wikipedia.org/wiki/CESU-8 and
Unicode Technical Report #26: http://www.unicode.org/reports/tr26/
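
For illustration, here is a minimal Python sketch of how CESU-8, as described in the report above, differs from standard UTF-8. The function name cesu8_encode is made up for this example, and Python's standard library has no CESU-8 codec that I know of, so the encoding is done by hand. BMP code points are encoded exactly as in UTF-8; a supplementary code point is first split into a UTF-16 surrogate pair, and each surrogate is then written as a three-byte sequence, giving six bytes instead of four.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (sketch; lone surrogates in the input are not handled)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            # BMP code points: identical to standard UTF-8 (1 to 3 bytes).
            out += ch.encode("utf-8")
        else:
            # Supplementary code point: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            for unit in (high, low):
                # Write each surrogate with the three-byte UTF-8 bit pattern.
                out += bytes((
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ))
    return bytes(out)


if __name__ == "__main__":
    ch = "\U0001F600"
    print("UTF-8: ", ch.encode("utf-8").hex(" "))
    print("CESU-8:", cesu8_encode(ch).hex(" "))
```

Note that a strict UTF-8 decoder rejects the six-byte form, because encoded surrogates are not valid UTF-8; that is why the two encodings should not be treated as interchangeable.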

Kind Regards,
Jan


-- 
Public Software Group e. V.
Johannisstr. 12, 10117 Berlin, Germany

www.public-software-group.org
vorstand at public-software-group.org

Registered in the register of associations
of the Amtsgericht Charlottenburg (district court)
Registration number: VR 28873 B

Board members (each authorized to represent the association alone):
Jan Behrens
Axel Kistner
Andreas Nitsche
Björn Swierczek