On Thu, Jan 6, 2011 at 9:57 PM, Henning Diedrich <hd2010@eonblast.com> wrote:
> On 1/6/11 8:34 PM, Eero Pajarre wrote:
>
>   }else if (c[0]==0xe4 ||  /* Caution this part is not UTF-8, you
> should assert here if you just want to be compatible*/
>
> Thanks, yes I cut the quote off early.
>
> Is the above part where you respect Latin?

Yes, it will accept the most common Finnish/Swedish characters
(adieresis, odieresis and a-ring, or whatever they are called).

> Is your implementation going to yield different results because you 'read
> away' characters in higher than first position in a sequence that might
> otherwise wrongly be counted?

Well, I guess you have to decide for yourself what to do with those
characters which are not well formed. One option is to count only the
well-formed characters and ignore the rest.
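
A minimal sketch of that last option, counting only the well-formed
sequences and skipping everything else. This is not the code from the
earlier message: the function name utf8_count_well_formed is made up
for illustration, and the choice to skip exactly one byte after a bad
sequence is an assumption, not something settled in this thread.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): returns the number of
 * well-formed UTF-8 sequences in s[0..len). Any byte that does not start
 * a well-formed sequence is skipped and not counted. */
size_t utf8_count_well_formed(const unsigned char *s, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;        /* continuation bytes expected after the lead */
        unsigned long cp;   /* code point decoded so far */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t k;

        if (c < 0x80) {                     /* plain ASCII */
            count++;
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 2-byte lead */
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* 3-byte lead */
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* 4-byte lead */
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {                            /* stray continuation or invalid lead */
            i++;
            continue;
        }

        /* Consume continuation bytes (10xxxxxx) while they are present. */
        for (k = 1; k <= need && i + k < len && (s[i + k] & 0xC0) == 0x80; k++)
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);

        /* Reject truncated, overlong, surrogate and out-of-range sequences.
         * Assumption: skip only the lead byte, then resynchronise. */
        if (k <= need || cp < min || cp > 0x10FFFFUL ||
            (cp >= 0xD800UL && cp <= 0xDFFFUL)) {
            i++;
            continue;
        }

        count++;
        i += need + 1;
    }
    return count;
}
```

With this sketch a lone Latin-1 byte such as 0xe4 between two ASCII
letters is simply skipped, so the three bytes 'a', 0xe4, 'b' count as
two characters, while the UTF-8 encoding 0xc3 0xa4 counts as one.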


   Eero