On Mon, Sep 27, 2010 at 3:16 AM, Pan Shi Zhu
<pan.shizhu@gmail.com> wrote:
> Lua has no problem supporting UTF-8 files without a BOM.
> According to the POSIX standard, you should *not* add a BOM to a UTF-8 file.
> So UTF-8 with a BOM is not a standard file format.
> BTW: GNU gcc does not support UTF-8+BOM source files either.
Unfortunately, Microsoft insists on being completely inconsistent, adding the BOM in some tools and stripping it in others. Complete nightmare.
I once added this to the luaL_loadfile() function in lauxlib.c:
--- lauxlib-orig.c Mon Sep 27 10:15:59 2010
+++ lauxlib.c Mon Sep 27 10:16:28 2010
@@ -565,6 +565,21 @@
     if (lf.f == NULL) return errfile(L, "open", fnameindex);
   }
   c = getc(lf.f);
+
+  /* vvv RTR vvv: Check for UTF-8 BOM ef bb bf */
+  if (c == 0xef) {
+    if (getc(lf.f) == 0xbb && getc(lf.f) == 0xbf) {
+      /* do nothing, we've skipped the BOM and just continue with normal processing */
+    } else {
+      /* wasn't the UTF8 BOM, so reset everything again */
+      fclose(lf.f);
+      lf.f = fopen(filename, "r"); /* reopen */
+      if (lf.f == NULL) return errfile(L, "open", fnameindex); /* unable to reopen file */
+    }
+    c = getc(lf.f);
+  }
+  /* ^^^ RTR ^^^: Check for UTF-8 BOM ef bb bf */
+
   if (c == '#') { /* Unix exec. file? */
     lf.extraline = 1;
     while ((c = getc(lf.f)) != EOF && c != '\n') ; /* skip first line */
It's been good enough for me for a while.
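
One caveat with the reopen branch: it assumes filename is non-NULL. luaL_loadfile() reads from stdin when filename is NULL, and fopen(NULL, "r") is not valid in that case. The same check can be written without reopening, roughly as in the sketch below. Note that skip_utf8_bom is a hypothetical helper name, not part of Lua, and the sketch assumes the stream is seekable (a regular file positioned at its start), so it would not help for a piped stdin either.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Lua): skip a UTF-8 BOM (EF BB BF)
 * at the start of a seekable stream.
 * Returns 1 if a BOM was found and skipped, 0 otherwise.
 * Assumption: f is a regular file positioned at offset 0. */
static int skip_utf8_bom(FILE *f) {
  static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
  unsigned char buf[3];
  size_t n = fread(buf, 1, sizeof buf, f);
  if (n == sizeof buf && memcmp(buf, bom, sizeof bom) == 0)
    return 1;   /* BOM consumed; stream now points at the first real byte */
  rewind(f);    /* not a BOM (or file too short): go back to the start */
  return 0;
}
```

With something like this, the caller would run skip_utf8_bom(lf.f) right after a successful fopen() and before the first getc(), instead of closing and reopening the file when the bytes do not match.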
Robby