- Subject: Re: Database connectivity
- From: William Ahern <william@...>
- Date: Thu, 19 Feb 2015 16:14:59 -0800
On Fri, Feb 20, 2015 at 12:18:53AM +0100, Jan Behrens wrote:
> On Thu, 19 Feb 2015 14:05:46 -0800 William Ahern
> <william@25thandClement.com> wrote:
<snip>
> > Wow. I just read this page
> >
> > http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
> >
> > where it says
> >
> > The character set named utf8 uses a maximum of three bytes per
> > character and contains only BMP characters. As of MySQL
> > 5.5.3, the utf8mb4 character set uses a maximum of four bytes per
> > character supports supplemental characters.
> >
> > ...
> >
> > For a supplementary character, utf8 cannot store the
> > character at all, while utf8mb4 requires four bytes to store it.
> > Since utf8 cannot store the character at all, you do not have any
> > supplementary characters in utf8 columns and you need not worry about
> > converting characters or losing data when upgrading utf8 data from
> > older versions of MySQL.
<snip>
>
> Not just an issue of MySQL, I believe.
>
> See also http://en.wikipedia.org/wiki/CESU-8 and
> Unicode Technical Report #26: http://www.unicode.org/reports/tr26/
>
MySQL's "utf8" encoding is fundamentally broken. CESU-8 is just awkward, not
broken; it can still represent all Unicode code points.
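
To make the difference concrete, here is a rough Python sketch (my own
illustration, not anything from MySQL or TR #26 verbatim). The helper names
cesu8_encode and fits_three_byte_utf8 are made up for this example. It assumes
well-formed input with no lone surrogates. CESU-8 encodes a supplementary
code point as a UTF-16 surrogate pair, with each surrogate written as a
three-byte sequence, so nothing is lost. An encoding capped at three bytes per
character has no way to express that code point at all.

```python
def cesu8_encode(text):
    """Encode text as CESU-8 (hypothetical helper, assumes no lone surrogates)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points are encoded exactly as in UTF-8.
            out += ch.encode("utf-8")
        else:
            # Split into a UTF-16 surrogate pair, then write each
            # surrogate as its own three-byte sequence.
            v = cp - 0x10000
            for unit in (0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)):
                out += bytes([
                    0xE0 | (unit >> 12),
                    0x80 | ((unit >> 6) & 0x3F),
                    0x80 | (unit & 0x3F),
                ])
    return bytes(out)


def fits_three_byte_utf8(text):
    """True if every code point is in the BMP, i.e. needs at most 3 UTF-8 bytes."""
    return all(ord(ch) < 0x10000 for ch in text)


if __name__ == "__main__":
    s = "\U0001F600"  # a supplementary (non-BMP) code point
    print(s.encode("utf-8").hex())   # standard UTF-8, four bytes
    print(cesu8_encode(s).hex())     # CESU-8, six bytes
    print(fits_three_byte_utf8(s))   # cannot be stored with a 3-byte cap
```

So CESU-8 costs six bytes where UTF-8 needs four, and it is not byte-compatible
with UTF-8 for those characters, but the round trip is still lossless. With a
three-byte cap there is simply nothing to round-trip.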