- Subject: Re: UTF-8 testing
- From: Eero Pajarre <epajarre@...>
- Date: Thu, 6 Jan 2011 22:29:29 +0200
On Thu, Jan 6, 2011 at 9:57 PM, Henning Diedrich <firstname.lastname@example.org> wrote:
> On 1/6/11 8:34 PM, Eero Pajarre wrote:
> }else if (c==0xe4 || /* Caution this part is not UTF-8, you
> should assert here if you just want to be compatible*/
> Thanks, yes I cut the quote off early.
> Is the above part where you respect Latin?
Yes, it will accept the most common Finnish/Swedish characters
(adieresis, odieresis and a-ring, or whatever they are called).
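
For illustration, here is a rough sketch of what such a Latin-1 fallback test might look like. Only the 0xe4 comparison is visible in the quoted fragment; the function name is_latin1_nordic is made up here, and the remaining byte values are an assumption taken from ISO 8859-1, not from the original code. The caller is assumed to try a proper UTF-8 decode first, since 0xc4, 0xc5, 0xd6, 0xe4 and 0xe5 are also valid UTF-8 lead bytes.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if byte c is the ISO 8859-1 (Latin-1)
   code for a-diaeresis, o-diaeresis or a-ring, in lower or upper case.
   Intended to be consulted only after the byte has failed to start a
   well-formed UTF-8 sequence. */
bool is_latin1_nordic(unsigned char c)
{
    switch (c) {
    case 0xe4: /* a-diaeresis, lower case */
    case 0xf6: /* o-diaeresis, lower case */
    case 0xe5: /* a-ring, lower case */
    case 0xc4: /* A-diaeresis, upper case */
    case 0xd6: /* O-diaeresis, upper case */
    case 0xc5: /* A-ring, upper case */
        return true;
    default:
        return false;
    }
}
```

Accepting these bytes makes the decoder lenient rather than strictly UTF-8, which is why the comment in the quoted code suggests asserting instead if strict compatibility is the goal.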
> Is your implementation going to yield different results because you 'read
> away' characters in higher than first position in a sequence that might
> otherwise wrongly be counted?
Well, I guess you have to decide for yourself what to do with
characters that are not well-formed. You could just count the
well-formed characters and ignore the rest.
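
A minimal sketch of the "count the well-formed characters and ignore the rest" policy, in case it helps. This is not the code discussed earlier in the thread; the names utf8_seq_len and utf8_count_well_formed are made up for illustration, and the byte ranges follow the well-formed sequence table in the Unicode standard (no overlong forms, no surrogates, nothing above U+10FFFF). An ill-formed byte is skipped one byte at a time and not counted.

```c
#include <stddef.h>

/* Hypothetical helper: length (1..4) of the well-formed UTF-8 sequence
   starting at s[0], reading at most n bytes, or 0 if none starts there. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char c, lo = 0x80, hi = 0xbf; /* allowed range of 2nd byte */
    size_t len, i;

    if (n == 0)
        return 0;
    c = s[0];
    if (c < 0x80)
        return 1;                       /* plain ASCII */
    if (c >= 0xc2 && c <= 0xdf) {
        len = 2;
    } else if (c >= 0xe0 && c <= 0xef) {
        len = 3;
        if (c == 0xe0) lo = 0xa0;       /* reject overlong 3-byte forms */
        if (c == 0xed) hi = 0x9f;       /* reject UTF-16 surrogates */
    } else if (c >= 0xf0 && c <= 0xf4) {
        len = 4;
        if (c == 0xf0) lo = 0x90;       /* reject overlong 4-byte forms */
        if (c == 0xf4) hi = 0x8f;       /* reject values above U+10FFFF */
    } else {
        return 0;                       /* 0x80..0xc1 and 0xf5..0xff */
    }
    if (n < len)
        return 0;                       /* sequence truncated by end of input */
    if (s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xbf)
            return 0;
    return len;
}

/* Counts well-formed UTF-8 characters in buf[0..n-1]; bytes that do not
   start a well-formed sequence are skipped and not counted. */
size_t utf8_count_well_formed(const char *buf, size_t n)
{
    const unsigned char *s = (const unsigned char *)buf;
    size_t count = 0, i = 0;

    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0) {
            i++;                        /* ill-formed byte: ignore it */
        } else {
            count++;
            i += len;
        }
    }
    return count;
}
```

With this policy, utf8_count_well_formed("a\xe4" "b", 3) gives 2: the lone Latin-1 byte 0xe4 is dropped, while 'a' and 'b' are counted. (The string is split in two because "\xe4b" would be parsed as a single hex escape.)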