On 27 September 2010 19:19, Robert Raschke <rtrlists@googlemail.com> wrote:
>
> On Mon, Sep 27, 2010 at 3:16 AM, Pan Shi Zhu <pan.shizhu@gmail.com> wrote:
>>
>> Lua has no problem supporting UTF-8 files without a BOM.
>>
>> According to the POSIX standard, you should *not* add a BOM to a UTF-8 file.
>>
>> So UTF-8 with a BOM is not a standard file format.
>>
>> BTW: GNU GCC does not support UTF-8+BOM source files either.
>>
>
> Unfortunately, MS insists on being completely inconsistent, adding the BOM
> in some tools and stripping it in others. Complete nightmare.
>
> I once added this to the luaL_loadfile() function in lauxlib.c:
>
> --- lauxlib-orig.c    Mon Sep 27 10:15:59 2010
> +++ lauxlib.c    Mon Sep 27 10:16:28 2010
> @@ -565,6 +565,21 @@
>      if (lf.f == NULL) return errfile(L, "open", fnameindex);
>    }
>    c = getc(lf.f);
> +
> +  /* vvv RTR vvv: Check for UTF-8 BOM ef bb bf */
> +  if (c == 0xef) {
> +    if (getc(lf.f) == 0xbb && getc(lf.f) == 0xbf) {
> +      /* do nothing, we've skipped the BOM and just continue with normal processing */
> +    } else {
> +      /* wasn't the UTF8 BOM, so reset everything again */
> +      fclose(lf.f);
> +      lf.f = fopen(filename, "r");  /* reopen */
> +      if (lf.f == NULL) return errfile(L, "open", fnameindex); /* unable to reopen file */
> +    }
> +    c = getc(lf.f);
> +  }
> +  /* ^^^ RTR ^^^: Check for UTF-8 BOM ef bb bf */
> +
>    if (c == '#') {  /* Unix exec. file? */
>      lf.extraline = 1;
>      while ((c = getc(lf.f)) != EOF && c != '\n') ;  /* skip first line */
>
>
> It's been good enough for me for a while.
>
> Robby
>
>

Why not use ungetc?
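
One caveat with ungetc: the C standard guarantees only a single character of
pushback, and a failed BOM match in the patch above may already have consumed
two or three bytes. Below is a minimal sketch of an alternative that seeks back
instead of reopening the file. It assumes a seekable stream (so it would not
work on stdin fed from a pipe), and skip_utf8_bom is a hypothetical helper
name, not part of Lua:

```c
#include <stdio.h>

/* Hypothetical helper (not part of lauxlib.c): consume a leading UTF-8 BOM
 * (bytes EF BB BF) if one is present at the current position.
 * Returns 1 if a BOM was skipped, 0 if there was none (stream position is
 * restored), or -1 if the stream could not be repositioned. */
int skip_utf8_bom(FILE *f) {
  long start = ftell(f);            /* remember where we began */
  if (start < 0) return -1;         /* not seekable, e.g. a pipe */

  /* Short-circuit evaluation reads at most three bytes. */
  if (getc(f) == 0xEF && getc(f) == 0xBB && getc(f) == 0xBF)
    return 1;                       /* BOM consumed; continue after it */

  /* Not a BOM: go back to the start. fseek also clears any EOF flag
   * that was set if the file was shorter than three bytes. */
  if (fseek(f, start, SEEK_SET) != 0) return -1;
  return 0;
}
```

Inside luaL_loadfile() this would be called once right after the file is
opened and before the first getc(), so the existing '#' check then sees the
first real character either way.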