On Fri, Feb 20, 2015 at 12:18:53AM +0100, Jan Behrens wrote:
> On Thu, 19 Feb 2015 14:05:46 -0800 William Ahern
> <william@25thandClement.com> wrote:
<snip>
> > Wow. I just read this page
> > 
> > 	http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> > 
> > where it says
> > 
> > 	The character set named utf8 uses a maximum of three bytes per
> > 	character and contains only BMP characters. As of MySQL 5.5.3,
> > 	the utf8mb4 character set uses a maximum of four bytes per
> > 	character supports supplemental characters.
> > 
> > 	...
> > 
> > 	For a supplementary character, utf8 cannot store the character
> > 	at all, while utf8mb4 requires four bytes to store it. Since
> > 	utf8 cannot store the character at all, you do not have any
> > 	supplementary characters in utf8 columns and you need not worry
> > 	about converting characters or losing data when upgrading utf8
> > 	data from older versions of MySQL.
<snip>
> 
> Not just an issue of MySQL, I believe.
> 
> See also http://en.wikipedia.org/wiki/CESU-8 and
> Unicode Technical Report #26: http://www.unicode.org/reports/tr26/
> 

MySQL's "utf8" encoding is fundamentally broken: it cannot store any code
point outside the BMP. CESU-8 is just awkward, not broken; it can still
represent all Unicode code points.
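
To make the difference concrete, here is a rough sketch in Python (not
taken from MySQL or TR #26; the function names are my own and purely
illustrative). It encodes a supplementary code point as standard UTF-8
(four bytes) and as CESU-8 (a UTF-16 surrogate pair, each half written as
a three-byte sequence), and checks whether a code point fits within a
three-byte-per-character limit like the one the MySQL page describes for
"utf8".

```python
def utf8_encode(cp):
    """Standard UTF-8: supplementary code points take four bytes."""
    return chr(cp).encode("utf-8")


def cesu8_encode(cp):
    """CESU-8 sketch: BMP code points are encoded as in UTF-8; supplementary
    code points are split into a UTF-16 surrogate pair and each surrogate
    is written as its own three-byte sequence (six bytes in total)."""
    if cp < 0x10000:
        return chr(cp).encode("utf-8", "surrogatepass")
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)    # high (lead) surrogate
    low = 0xDC00 | (v & 0x3FF)   # low (trail) surrogate
    return b"".join(
        chr(s).encode("utf-8", "surrogatepass") for s in (high, low)
    )


def fits_in_three_bytes(cp):
    """True if the code point needs at most three UTF-8 bytes, i.e. it
    lies in the BMP. Hypothetical helper mirroring the three-byte limit."""
    return cp < 0x10000


if __name__ == "__main__":
    cp = 0x1F600  # a supplementary-plane code point
    print(utf8_encode(cp).hex(" "))    # f0 9f 98 80
    print(cesu8_encode(cp).hex(" "))   # ed a0 bd ed b8 80
    print(fits_in_three_bytes(cp))     # False
```

CESU-8 spends six bytes where UTF-8 spends four, but the code point
survives the round trip; with a hard three-byte limit there is simply no
encoding for it.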