lua-l archive




On Mon, Sep 27, 2010 at 3:16 AM, Pan Shi Zhu <pan.shizhu@gmail.com> wrote:

> Lua has no problem supporting UTF-8 files without a BOM.
>
> According to the POSIX standard, you should *not* add a BOM to a UTF-8 file.
>
> So UTF-8 with a BOM is not a standard file format.
>
> BTW: GNU gcc does not support UTF-8+BOM source files either.



Unfortunately, MS insists on being completely inconsistent, adding the BOM in some tools and stripping it in others. Complete nightmare.
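Whether a chunk carries the BOM comes down to its first three bytes, EF BB BF, which are the same bytes the patch below checks for. When the chunk is already in memory (for example before handing it to luaL_loadbuffer()), the check is a plain three-byte comparison. A minimal sketch; utf8_bom_length is a hypothetical helper, not part of the Lua C API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Lua C API): returns 3 if buf starts
   with the UTF-8 byte order mark EF BB BF, otherwise 0. The caller can then
   hand buf + n and len - n to a loader such as luaL_loadbuffer(). */
static size_t utf8_bom_length(const char *buf, size_t len)
{
    /* memcmp compares bytes as unsigned char, so the result does not
       depend on whether plain char is signed on this platform */
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0)
        return 3;
    return 0;
}
```

For example, a buffer holding EF BB BF followed by print(1) yields 3, while print(1) on its own yields 0.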

I once added this to the luaL_loadfile() function in lauxlib.c:

--- lauxlib-orig.c    Mon Sep 27 10:15:59 2010
+++ lauxlib.c    Mon Sep 27 10:16:28 2010
@@ -565,6 +565,21 @@
     if (lf.f == NULL) return errfile(L, "open", fnameindex);
   }
   c = getc(lf.f);
+
+  /* vvv RTR vvv: Check for UTF-8 BOM ef bb bf */
+  if (c == 0xef) {
+    if (getc(lf.f) == 0xbb && getc(lf.f) == 0xbf) {
+      /* do nothing, we've skipped the BOM and just continue with normal processing */
+    } else {
+      /* wasn't the UTF-8 BOM, so reset everything again */
+      fclose(lf.f);
+      lf.f = fopen(filename, "r");  /* reopen */
+      if (lf.f == NULL) return errfile(L, "open", fnameindex); /* unable to reopen file */
+    }
+    c = getc(lf.f);
+  }
+  /* ^^^ RTR ^^^: Check for UTF-8 BOM ef bb bf */
+
   if (c == '#') {  /* Unix exec. file? */
     lf.extraline = 1;
     while ((c = getc(lf.f)) != EOF && c != '\n') ;  /* skip first line */


It's been good enough for me for a while.
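One limitation of the reopen trick: it needs a file name to reopen, so it cannot recover the consumed bytes when luaL_loadfile() is reading stdin (filename == NULL). A different approach, not what the patch above does, is to keep the up to three bytes read while looking for the BOM in a small lookahead buffer and hand them back before reading from the stream again. A minimal sketch in plain C, assuming a getc-style consumer; BomReader, bom_reader_init and bom_reader_getc are hypothetical names:

```c
#include <stdio.h>

/* Hypothetical wrapper (not part of lauxlib.c): skips a leading UTF-8 BOM
   without reopening or seeking the stream, so it also works on stdin and
   pipes. Bytes consumed while looking for the BOM that turn out not to be
   one are kept here and returned before reading from the stream again. */
typedef struct {
    FILE *f;
    int pending[3];  /* lookahead bytes that were not part of a BOM */
    int npending;    /* number of valid entries in pending */
    int next;        /* index of the next pending byte to hand out */
} BomReader;

static void bom_reader_init(BomReader *r, FILE *f)
{
    static const int bom[3] = { 0xEF, 0xBB, 0xBF };
    int i;

    r->f = f;
    r->npending = 0;
    r->next = 0;
    for (i = 0; i < 3; i++) {
        int c = getc(f);
        if (c == EOF)
            return;               /* short input: replay what was read */
        r->pending[r->npending++] = c;
        if (c != bom[i])
            return;               /* not a BOM: replay what was read */
    }
    r->npending = 0;              /* all three bytes matched: drop the BOM */
}

/* Use in place of getc(r->f) once bom_reader_init() has run. */
static int bom_reader_getc(BomReader *r)
{
    if (r->next < r->npending)
        return r->pending[r->next++];
    return getc(r->f);
}
```

Hooking this into lauxlib.c would mean changing how the LoadF reader fills its buffer; that part is not shown here.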

Robby