On Fri, Jan 7, 2011 at 9:30 AM, Tony Finch <dot@dotat.at> wrote:
> That's incorrect. Codepoints in UTF-8 can be at most 4 octets long.

Unicode is defined at 32 bits at most (I think), but UTF-8 needs more
than 4 octets to encode 32 bits. UTF-8 is defined up to 6 octets
(5 'trailing' bytes in this snippet).

--
Javier
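
For reference, the arithmetic behind both limits can be sketched as below. This is only an illustration, assuming the lead-byte scheme of the original UTF-8 definition (RFC 2279), where an n-octet sequence starts with a lead byte of n one-bits and a zero, followed by n-1 trailing bytes of the form 10xxxxxx. The name `payload_bits` is made up for this example and does not come from the snippet under discussion.

```python
def payload_bits(n_octets: int) -> int:
    """Code point bits that an n-octet UTF-8-style sequence can carry.

    Assumes the original 1..6 octet lead-byte patterns (RFC 2279).
    """
    if n_octets == 1:
        return 7  # 0xxxxxxx
    if not 2 <= n_octets <= 6:
        raise ValueError("lead-byte patterns exist only for 1..6 octets")
    lead_bits = 7 - n_octets             # e.g. 110xxxxx leaves 5 bits for n = 2
    trailing_bits = 6 * (n_octets - 1)   # each 10xxxxxx byte carries 6 bits
    return lead_bits + trailing_bits


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, payload_bits(n))
```

Under these assumptions 4 octets carry 21 bits, which is enough for U+10FFFF, and 6 octets carry 31 bits, so a full 32-bit value does not fit even in the 6-octet form. RFC 3629 later restricted UTF-8 to U+10FFFF, which is where the 4-octet limit comes from.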