Re: UTF-8 testing

lua-l archive

Subject: Re: UTF-8 testing
From: Eero Pajarre <epajarre@...>
Date: Thu, 6 Jan 2011 22:29:29 +0200

On Thu, Jan 6, 2011 at 9:57 PM, Henning Diedrich <hd2010@eonblast.com> wrote:
> On 1/6/11 8:34 PM, Eero Pajarre wrote:
>
>   }else if (c[0]==0xe4 ||  /* Caution this part is not UTF-8, you
> should assert here if you just want to be compatible*/
>
> Thanks, yes I cut the quote off early.
>
> Is the above part where you respect Latin?

Yes, it will accept the most common Finnish/Swedish characters
(a-diaeresis, o-diaeresis and a-ring, or whatever they are called)
when they show up as single Latin-1 bytes.
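
To make the idea concrete, here is a rough sketch of such a check. It is
not the code from my earlier mail: the function names are made up for
this example, and only the 0xe4 test appears in the quoted snippet. The
other byte values are my assumption, taken from the ISO 8859-1 table. A
byte counts as a stray Latin-1 letter only when it is not followed by a
UTF-8 continuation byte.

```c
#include <stddef.h>

/* Hypothetical helper: 1 if b is the Latin-1 (ISO 8859-1) code of one of
   the letters a-diaeresis, o-diaeresis, a-ring (either case), else 0. */
static int is_nordic_latin1(unsigned char b)
{
    switch (b) {
    case 0xe4: /* a-diaeresis */
    case 0xf6: /* o-diaeresis */
    case 0xe5: /* a-ring */
    case 0xc4: /* A-diaeresis */
    case 0xd6: /* O-diaeresis */
    case 0xc5: /* A-ring */
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: 1 if s[0] looks like a lone Latin-1 letter, that
   is, one of the bytes above NOT followed by a continuation byte
   (10xxxxxx). This is a heuristic, not part of UTF-8 itself. */
static int is_stray_latin1(const unsigned char *s, size_t len)
{
    if (len == 0 || !is_nordic_latin1(s[0]))
        return 0;
    if (len > 1 && (s[1] & 0xc0) == 0x80)
        return 0; /* could be the start of a real UTF-8 sequence */
    return 1;
}
```

A strict UTF-8 checker would reject these bytes instead, which is why
the comment in the quoted snippet suggests asserting there.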

> Is your implementation going to yield different results because you 'read
> away' characters in higher than first position in a sequence that might
> otherwise wrongly be counted?

Well, I guess you have to decide for yourself what to do with bytes
that do not form a well-formed character. One option is to count only
the well-formed characters and ignore the rest.
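
A minimal sketch of the "count only the well-formed ones" option, in
case it helps. The function names are invented for this example, and
the byte ranges follow my reading of the well-formed sequence table in
the Unicode standard, so please check them against the spec before
relying on this. Malformed bytes are skipped one at a time and are not
counted.

```c
#include <stddef.h>

/* Length (1..4) of the well-formed UTF-8 sequence starting at s,
   or 0 if the bytes at s are malformed or truncated. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    unsigned char c;
    if (n == 0)
        return 0;
    c = s[0];
    if (c < 0x80)
        return 1;                                /* ASCII */
    if (c >= 0xc2 && c <= 0xdf)                  /* 2-byte lead */
        return (n >= 2 && (s[1] & 0xc0) == 0x80) ? 2 : 0;
    if (c >= 0xe0 && c <= 0xef) {                /* 3-byte lead */
        if (n < 3 || (s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80)
            return 0;
        if (c == 0xe0 && s[1] < 0xa0) return 0;  /* overlong form */
        if (c == 0xed && s[1] > 0x9f) return 0;  /* surrogate range */
        return 3;
    }
    if (c >= 0xf0 && c <= 0xf4) {                /* 4-byte lead */
        if (n < 4 || (s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80
                  || (s[3] & 0xc0) != 0x80)
            return 0;
        if (c == 0xf0 && s[1] < 0x90) return 0;  /* overlong form */
        if (c == 0xf4 && s[1] > 0x8f) return 0;  /* above U+10FFFF */
        return 4;
    }
    return 0; /* lone continuation byte, 0xc0, 0xc1, or 0xf5..0xff */
}

/* Counts the well-formed characters in the first n bytes of str. */
static size_t utf8_count_wellformed(const char *str, size_t n)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0, count = 0;
    while (i < n) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0) {
            i++;            /* skip one malformed byte, do not count it */
        } else {
            count++;
            i += len;
        }
    }
    return count;
}
```

With this, a lone Latin-1 0xe4 between two ASCII letters is simply
dropped from the count, while a proper two-byte a-diaeresis
(0xc3 0xa4) counts as one character.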


   Eero
