On Tue, Aug 9, 2016 at 1:34 PM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>> On 8 August 2016 at 05:28, Coda Highland <chighland@gmail.com> wrote:
>>
>> >
>> > The documentation is accurate. It's done by luaL_loadfilex and not
>> > part of the more general chunk-loading functionality -- you can't
>> > write load("#!/foo\nprint 'hi'") in Lua code and have it work.
>> >
>> > /s/ Adam
>> >
>>
>> Would it be safe to do it when I load a file but not when I load a string?
>> Would this fit in with how users assume/expect it to work with Lua?
>
> Well, it is safe in the sense that this is how Lua works :-)
>
> Ideally, both the shebang and BOM should be handled by lua.c (the
> stand-alone interpreter), not by any library function. But it
> is difficult to implement them apart from the rest of loadfile
> functionality, so we put them inside 'luaL_loadfilex'.
>
> In particular, we did not document the skipping of the BOM because that
> sounds like a hack to us. A BOM in an ASCII or UTF-8 document is a bug,
> period. Given that this bug is so prevalent, we decided to handle it,
> but it is a kind of "implementation detail". Documenting it would sound
> like a recognition that utf-8 BOMs could have some place in a reasonable
> world.
>
> -- Roberto

BOMs in UTF-8 are permitted by the Unicode spec, but they are neither
required nor recommended. If U+FEFF is present at the beginning of any
untagged Unicode stream (including UTF-8), a compliant implementation
MUST consider it a BOM and skip it instead of including it in the text
stream as a zero-width no-break space. However, a leading BOM is
PROHIBITED in strongly-typed data where the encoding is explicitly
defined, such as a database field.

As such, since text files on disk in most current filesystems do not
carry encoding metadata, it isn't a bug to put a BOM at the beginning
of a UTF-8 text file.
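
To make the "skip it" rule concrete, here is a minimal sketch in C of what
a file reader might do with a leading UTF-8 BOM (the bytes EF BB BF, which
is how U+FEFF is encoded in UTF-8). The function name skip_utf8_bom is
hypothetical and this is not the code from luaL_loadfilex, just an
illustration of the idea.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the Lua sources.
 * Returns the number of leading bytes a reader should skip:
 * 3 if the buffer starts with the UTF-8 encoding of U+FEFF
 * (EF BB BF), otherwise 0. */
size_t skip_utf8_bom(const unsigned char *buf, size_t len) {
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}
```

A caller would then start parsing at buf + skip_utf8_bom(buf, len).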

However, it IS a bug to put a BOM before a #! because the shebang
protocol demands that the first two bytes of the file be 0x23 0x21.
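
A rough sketch in C of why the two collide, assuming the usual behaviour
on Unix-like systems where the program loader only compares the first two
bytes of the file. A UTF-8 BOM puts 0xEF 0xBB 0xBF in front, so the first
byte is no longer 0x23. Both function names below are hypothetical,
chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first two bytes are '#' '!'
 * (0x23 0x21), which is the check the shebang mechanism relies on. */
int has_shebang(const unsigned char *buf, size_t len) {
    return len >= 2 && buf[0] == 0x23 && buf[1] == 0x21;
}

/* Hypothetical helper: returns 1 if a UTF-8 BOM (EF BB BF) sits directly
 * in front of "#!", i.e. the shebang is present in the text but is not
 * at byte offset 0 where the loader looks for it. */
int bom_hides_shebang(const unsigned char *buf, size_t len) {
    return len >= 5 &&
           buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF &&
           buf[3] == 0x23 && buf[4] == 0x21;
}
```

For a file that begins EF BB BF 23 21, has_shebang reports 0 and
bom_hides_shebang reports 1, so a tool could use the second check to warn
about the misplaced BOM.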

/s/ Adam