- Subject: Re: UTF-8 testing
- From: Eero Pajarre <epajarre@...>
- Date: Thu, 6 Jan 2011 22:29:29 +0200
On Thu, Jan 6, 2011 at 9:57 PM, Henning Diedrich <hd2010@eonblast.com> wrote:
> On 1/6/11 8:34 PM, Eero Pajarre wrote:
>
> }else if (c[0]==0xe4 || /* Caution: this part is not UTF-8; you
> should assert here if you just want to be compatible */
>
> Thanks, yes I cut the quote off early.
>
> Is the above part where you respect Latin?
Yes, it will accept the most common Finnish/Swedish characters
(a-dieresis, o-dieresis and a-ring, or whatever they are called).
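Only the first line of that branch survives in the quote, so what follows is a minimal sketch, not the code being discussed. It assumes that a byte which does not start a well-formed UTF-8 sequence is accepted as one Latin-1 character only if it is 0xE4 (ä), 0xE5 (å) or 0xF6 (ö), and that any other ill-formed byte is rejected, which is the strict "assert" behaviour the comment mentions. The names utf8_seq_len and count_chars_latin1_fallback are hypothetical.

```c
#include <stddef.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s, or 0 if
   the bytes at s do not start one. n is the number of bytes left (>= 1).
   Uses the well-formed byte ranges of RFC 3629. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char c = s[0];
    if (c < 0x80)
        return 1;                                    /* ASCII */
    if (c >= 0xC2 && c <= 0xDF)
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (c >= 0xE0 && c <= 0xEF) {
        unsigned char lo = (c == 0xE0) ? 0xA0 : 0x80; /* no overlong forms */
        unsigned char hi = (c == 0xED) ? 0x9F : 0xBF; /* no surrogates */
        if (n >= 3 && s[1] >= lo && s[1] <= hi && (s[2] & 0xC0) == 0x80)
            return 3;
        return 0;
    }
    if (c >= 0xF0 && c <= 0xF4) {
        unsigned char lo = (c == 0xF0) ? 0x90 : 0x80; /* no overlong forms */
        unsigned char hi = (c == 0xF4) ? 0x8F : 0xBF; /* nothing > U+10FFFF */
        if (n >= 4 && s[1] >= lo && s[1] <= hi &&
            (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80)
            return 4;
        return 0;
    }
    return 0; /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
}

/* Hypothetical: count characters in s[0..len), treating a lone Latin-1
   byte for a-dieresis (0xE4), a-ring (0xE5) or o-dieresis (0xF6) as one
   character. Returns -1 on any other ill-formed byte. */
long count_chars_latin1_fallback(const unsigned char *s, size_t len)
{
    long count = 0;
    size_t i = 0;
    while (i < len) {
        size_t k = utf8_seq_len(s + i, len - i);
        if (k == 0) {
            if (s[i] == 0xE4 || s[i] == 0xE5 || s[i] == 0xF6)
                k = 1;      /* NOT UTF-8: accept the Latin-1 byte */
            else
                return -1;  /* strict case: reject the input */
        }
        i += k;
        count++;
    }
    return count;
}
```

Because 0xE4 and 0xE5 are also valid UTF-8 lead bytes, the fallback only applies when the following bytes do not complete a sequence. The Latin-1 text "ä¸­" (bytes E4 B8 AD) is therefore read as the single UTF-8 character U+4E2D, so this kind of leniency can still miscount some Latin-1 input.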
> Is your implementation going to yield different results because you 'read
> away' characters in higher than first position in a sequence that might
> otherwise wrongly be counted?
Well, I guess you have to decide for yourself what to do with
characters that are not well-formed. You could, for example, count only
the well-formed characters and ignore the rest.
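A minimal sketch of "count the well-formed characters and ignore the rest", assuming the well-formed byte ranges of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF) and assuming ill-formed input is skipped one byte at a time. The function count_wellformed_utf8 and its optional ignored out-parameter are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s, or 0 if
   the bytes at s do not start one. n is the number of bytes left (>= 1). */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char c = s[0];
    if (c < 0x80)
        return 1;                                    /* ASCII */
    if (c >= 0xC2 && c <= 0xDF)
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (c >= 0xE0 && c <= 0xEF) {
        unsigned char lo = (c == 0xE0) ? 0xA0 : 0x80; /* no overlong forms */
        unsigned char hi = (c == 0xED) ? 0x9F : 0xBF; /* no surrogates */
        if (n >= 3 && s[1] >= lo && s[1] <= hi && (s[2] & 0xC0) == 0x80)
            return 3;
        return 0;
    }
    if (c >= 0xF0 && c <= 0xF4) {
        unsigned char lo = (c == 0xF0) ? 0x90 : 0x80; /* no overlong forms */
        unsigned char hi = (c == 0xF4) ? 0x8F : 0xBF; /* nothing > U+10FFFF */
        if (n >= 4 && s[1] >= lo && s[1] <= hi &&
            (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80)
            return 4;
        return 0;
    }
    return 0; /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
}

/* Hypothetical: count well-formed UTF-8 characters in s[0..len) and skip
   everything else. If ignored is non-NULL, it receives the number of
   bytes that were skipped. */
size_t count_wellformed_utf8(const unsigned char *s, size_t len,
                             size_t *ignored)
{
    size_t count = 0, skipped = 0, i = 0;
    while (i < len) {
        size_t k = utf8_seq_len(s + i, len - i);
        if (k == 0) {
            skipped++;   /* drop one byte and try again at the next */
            i++;
        } else {
            count++;
            i += k;
        }
    }
    if (ignored != NULL)
        *ignored = skipped;
    return count;
}
```

Advancing a single byte after an error means a valid sequence right after a stray byte is still counted: for the bytes E4 C3 A4, the lone 0xE4 is ignored and the following UTF-8 "ä" is counted. Reporting the number of ignored bytes lets the caller decide afterwards whether to treat the input as an error.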
Eero