Re: String indexing again

Subject: Re: String indexing again
From: Axel Kittenberger <axkibe@ ...>
Date: Mon, 20 Dec 2010 17:20:22 +0100

The issue with UTF-8, which has become the de facto Unicode encoding of choice, is that characters have variable width. That is something classic C did not see coming, and it causes all kinds of confusion because it clashes with the notion that a string is an array of chars. You are right that classic C does not, in theory, define the width of char, other than that it is fixed on a given system. However, so much code assumes it to be an octet that no sane compiler will change that. I don't follow the C standards, but I recall that a recent one gave in to the de facto unchangeable octetness of char and made it standard; don't quote me on that, though.
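
To make the mismatch between "array of chars" and "sequence of characters" concrete, here is a minimal sketch in C. The helper names utf8_count and char_is_octet are hypothetical, made up for this illustration, and utf8_count assumes its input is already valid UTF-8 (it does no validation). It counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a char is exactly 8 bits wide on this
 * implementation. CHAR_BIT comes from <limits.h>. */
int char_is_octet(void)
{
    return CHAR_BIT == 8;
}

/* Hypothetical helper: counts code points in a NUL-terminated string,
 * assuming it holds valid UTF-8. Continuation bytes (10xxxxxx) are skipped,
 * so each multi-byte sequence is counted once, at its lead byte. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "h\xc3\xa9llo" (the word "héllo" encoded as UTF-8), strlen reports 6 bytes while utf8_count reports 5 code points, so indexing by char position and indexing by character position disagree as soon as a non-ASCII character appears.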

On 2010 12 20 17:05, "Greg Falcon" <veloso@verylowsodium.com> wrote:
> Your point about multibyte characters is well taken, but:
>
> On Sun, Dec 19, 2010 at 5:32 PM, Tony Finch <dot@dotat.at> wrote:
>> On 19 Dec 2010, at 22:19, Greg Falcon <veloso@verylowsodium.com> wrote:
>>>
>>> A subtle point here: This snippet from the manual is talking about
>>> the *character* at s[1], and Lua doesn't have a character type.
>>
>> It says character but it means octet.
>
> It probably means character in the C "char" sense. "Octet" is not an
> appropriate word to use for this concept in portable C programs, since
> chars/bytes in standard C are allowed to be wider than 8 bits in
> standard-conforming implementations.
>
> Greg F
>