- Subject: Re: UTF-8 validation
- From: Jay Carlson <nop@...>
- Date: Wed, 9 Dec 2015 21:29:13 -0500
On 2015-12-09, at 8:50 PM, Coda Highland <email@example.com> wrote:
> On Wed, Dec 9, 2015 at 5:39 PM, Javier Guerra Giraldez
> <firstname.lastname@example.org> wrote:
>> On Wed, Dec 9, 2015 at 7:19 PM, Cezary H. Noweta <email@example.com> wrote:
>>> The simple process [ill-formed] => [well-formed] can be named validation
>> Unfortunately, "validation" is taken in many circles as "verifying the
>> validity of input", without changing said input in any form. And to
>> make it worse, Unicode also defines "normalization" (which would be
>> the right term in most other contexts) as something else.
>> I propose coining the ugly term "validification", meaning "making the
>> (possibly invalid) input valid"
I think there are too many different definitions of safety for that name to be useful. Is there even consensus on "is_utf8(s)"? Roberto didn't seem convinced last time.
Given a string where is_utf8(s) is false, it might be nice to be able to find the byte offset of the first non-UTF-8 sequence. Then people could write their own make_safe functions based on how they want to respond to syntactically invalid sequences.
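To make the idea concrete, here is a rough sketch in Python rather than Lua, only because Python's strict decoder already reports where it gave up. The names first_invalid_utf8_offset and make_safe are hypothetical, chosen for illustration and not taken from any existing library, and the replacement policy shown is just one of many a caller might pick.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper: relies on Python's strict UTF-8 decoder, which
    raises UnicodeDecodeError with `start` set to the offending offset.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def make_safe(data: bytes, replacement: bytes = b"?") -> bytes:
    """One possible policy: swap each offending byte for `replacement`.

    Hypothetical helper built only on the offset finder above. Re-slicing
    on every error keeps it short but is not efficient for large inputs.
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        off = first_invalid_utf8_offset(data[pos:])
        if off is None:
            # The rest of the input is well-formed; copy it and stop.
            out += data[pos:]
            break
        out += data[pos:pos + off]  # well-formed prefix
        out += replacement          # stand-in for the bad byte
        pos += off + 1              # resume just past the bad byte
    return bytes(out)
```

Someone who would rather reject the input, truncate at the offset, or reinterpret the bad bytes as Latin-1 could write a different make_safe on top of the same offset function, which is the point of exposing the offset rather than a single fixed policy.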