- Subject: Re: Skipping leading "shebangs" in a file
- From: Coda Highland <chighland@...>
- Date: Tue, 9 Aug 2016 13:47:47 -0700
On Tue, Aug 9, 2016 at 1:34 PM, Roberto Ierusalimschy wrote:
>> On 8 August 2016 at 05:28, Coda Highland <email@example.com> wrote:
>> > The documentation is accurate. It's done by luaL_loadfilex and not
>> > part of the more general chunk-loading functionality -- you can't
>> > write load("#!/foo\nprint 'hi'") in Lua code and have it work.
>> > /s/ Adam
>> Would it be safe to do it when I load a file but not when I load a string?
>> Would this fit in with how users assume/expect it to work with Lua?
> Well, it is safe in the sense that this is how Lua works :-)
> Ideally, both the shebang and BOM should be handled by lua.c (the
> stand-alone interpreter), not by any library function. But it
> is difficult to implement them apart from the rest of loadfile
> functionality, so we put them inside 'luaL_loadfilex'.
> In particular, we did not document the skipping of BOM because that
> sounds like a hack to us. BOMs in ascii or utf-8 documents are a bug,
> period. Given that this bug is so prevalent, we decided to handle it,
> but it is a kind of "implementation detail". Documenting it would sound
> like a recognition that utf-8 BOMs could have some place in a reasonable
> -- Roberto
BOMs in UTF-8 are permitted by the spec, but they are neither required
nor recommended. If U+FEFF is present at the beginning of any untagged
Unicode stream (including UTF-8) a compliant implementation MUST
consider it a BOM and skip it instead of including it in the text stream
as a zero-width no-break space. However, a leading BOM is
PROHIBITED from being included in strongly-typed data where the
encoding is explicitly defined, such as a database field.
As such, since text files on disk in most current filesystems do not
carry encoding metadata, it isn't a bug to put a BOM at the beginning
of a UTF-8 text file.
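For illustration, here is a minimal sketch of what "skip a leading BOM"
means for a reader of an untagged UTF-8 file. It is written in Python
purely as an example; the function name strip_utf8_bom is hypothetical
and this is not how Lua implements it.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    U+FEFF encoded as UTF-8 is the three bytes EF BB BF, which is what
    codecs.BOM_UTF8 holds. Only a BOM at the very start is removed;
    the same bytes later in the stream are left alone.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Python's built-in "utf-8-sig" codec does the same thing at decode time:
it drops one leading BOM and otherwise behaves like plain "utf-8".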
However, it IS a bug to put a BOM before a #! because the shebang
protocol demands that the first two bytes of the file be 0x23 0x21.
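To make the conflict concrete, here is a rough sketch, again in Python
and only as an illustration. Both function names are hypothetical. The
second function is loosely modelled on the behaviour this thread
attributes to luaL_loadfilex (skip a BOM, then skip a first line that
starts with '#'); it is an assumption-laden approximation, not a copy of
Lua's code.

```python
import codecs

SHEBANG = b"\x23\x21"  # the two bytes "#!"


def starts_with_shebang(data: bytes) -> bool:
    """True only if the very first two bytes of the file are 0x23 0x21."""
    return data[:2] == SHEBANG


def skip_bom_and_comment_line(data: bytes) -> bytes:
    """Drop an optional UTF-8 BOM, then a first line beginning with '#'.

    The newline that ends the skipped line is kept, so line numbers in
    the remaining chunk still match the file on disk.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    if data.startswith(b"#"):
        newline = data.find(b"\n")
        data = b"" if newline == -1 else data[newline:]
    return data
```

A file that begins EF BB BF 23 21 fails starts_with_shebang, so it would
not be recognised as a script by anything that checks the first two
bytes, even though a tolerant loader like the second function would
still accept its contents.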