- Subject: Re: Determine mime type of a file
- From: Sean Conner <sean@...>
- Date: Mon, 2 May 2011 18:15:34 -0400
It was thus said that the Great steve donovan once stated:
> On Mon, May 2, 2011 at 11:22 AM, Dirk Laurie <email@example.com> wrote:
> > Unfortunately io.popen "is system dependent and is not available on
> > all platforms."
> Well, all _desktop_ platforms, certainly (it can be dodgy on Windows
> with GUI subsystem applications). But then there's always os.execute()
> + redirection to a temp file.
> Adrian has a good suggestion, but it seems easier to parse the output
> of 'file' than to do a library binding.
I had a project where I needed to determine the MIME types of a bunch of
files and I ended up using file, but it wasn't pretty (heck, had I known of
libmagic I would have linked to that instead of the gross hack I'm about to
describe).
The program in question pulled the metadata for a file (how large, time
stamps, who owned it, file permissions, etc) and I also wanted the MIME type
(not recorded in the file system). file fit the bill, but ...
I was walking the filesystem and for each file, I did a stat(), then I
wanted to feed the filename to file. I didn't want to spawn a process per
filename (I have over half a million files under my home directory) but
there's an option to file to read filenames from stdin.
Problem one: filenames with spaces, tabs and other ... problematic
characters. There's an option for file to use a different file separator
than space or newline, so I used that.
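
For illustration only (this is not the code from that project, and it is in
Python rather than Lua or C): a minimal sketch of running one file process
over many names and splitting its output on a custom separator. It assumes a
file that accepts "-f -" (read names from stdin), "--mime-type", and "-F"
(set the separator printed between the name and the result). The separator
value and the helper names split_result and mime_types are made up for the
example, and a filename containing a newline would still break it.

```python
import subprocess

# Hypothetical separator: any printable string unlikely to end a filename.
SEP = " ::: "

def split_result(line, sep=SEP):
    """Split one line of `file` output into (filename, description)."""
    # rpartition uses the LAST separator, so names containing ':' are safe.
    name, found, desc = line.rstrip("\n").rpartition(sep)
    if not found:
        raise ValueError("separator not found in: %r" % line)
    return name, desc.strip()  # `file` pads the result column with spaces

def mime_types(paths, sep=SEP):
    """Run a single `file` process over all paths; names go in on stdin."""
    cmd = ["file", "--mime-type", "-F", sep, "-f", "-"]
    names = "".join(p + "\n" for p in paths)  # assumes no newline in a name
    out = subprocess.run(cmd, input=names, capture_output=True, text=True,
                         errors="surrogateescape", check=True).stdout
    return dict(split_result(line, sep) for line in out.splitlines())
```

Something like mime_types(["a b.txt", "notes: v2.md"]) would then return a
dict mapping each name to its MIME type, at the cost of one process for the
whole batch.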
Then I hit the second problem: popen() (and thus io.popen() in Lua) can't
use "r+" or "w+" (that is, you can't open a bidirectional stream). I fixed
that with my own popen() implementation, and then I hit:
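
What a two-way popen() buys you can be sketched in Python, whose subprocess
module can attach pipes to both ends of a child. This is a stand-in, not the
popen() replacement described above: the child here is a small Python
script (an assumption made so the example runs anywhere), and open_two_way,
ask and demo are hypothetical names.

```python
import subprocess
import sys

# Stand-in child (an assumption for the demo, not `file`): upper-cases each
# input line and flushes, so every request gets an immediate reply.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)

def open_two_way(argv):
    """Start argv with pipes on both its stdin and its stdout."""
    return subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)

def ask(proc, line):
    """Send one line to the child and block until one line comes back."""
    proc.stdin.write(line + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")

def demo():
    proc = open_two_way([sys.executable, "-c", CHILD])
    try:
        return [ask(proc, word) for word in ("first", "second")]
    finally:
        proc.stdin.close()   # EOF ends the child's loop
        proc.wait()
        proc.stdout.close()
```

The request/reply loop only works because this child flushes after every
line; a child that does not flush leaves ask() blocked in readline().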
Problem three: the C library defaults to fully buffered input. I was
writing the filename to the pipe, but on the other end, the C library was
waiting for BUFSIZ bytes. At the very least, I wanted line buffering on the
other end of the pipe (where file was running) but there's no command line
option to file to specify the type of buffering (since in most cases, who
cares?).
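
The stall is easy to reproduce. The sketch below is an illustration under
assumptions, not the original setup: it uses a Python child as a stand-in
for file, relying on Python's I/O layer making the same choice as C stdio
(line buffering on a terminal, block buffering on a pipe), and it uses
select(), so it is POSIX-only. The name reply_visible is hypothetical.

```python
import os
import select
import subprocess
import sys

# Stand-in child (assumption: Python instead of `file`). It writes a reply,
# optionally flushes, signals on stderr, then waits for stdin to close.
CHILD = (
    "import sys\n"
    "sys.stdout.write('reply\\n')\n"
    "if sys.argv[1] == 'flush': sys.stdout.flush()\n"
    "sys.stderr.write('written\\n'); sys.stderr.flush()\n"
    "sys.stdin.read()\n"
)

def reply_visible(mode):
    """True if the reply reaches the pipe while the child is still running."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # keep the child's default buffering
    proc = subprocess.Popen([sys.executable, "-c", CHILD, mode],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, env=env)
    try:
        proc.stderr.readline()  # the child has now called write()
        ready, _, _ = select.select([proc.stdout], [], [], 0.2)
        return bool(ready)
    finally:
        proc.stdin.close()      # EOF lets the child exit and flush
        proc.wait()
        proc.stdout.close()
        proc.stderr.close()
```

reply_visible("flush") reports the reply as readable, while
reply_visible("noflush") does not: the reply sits in the child's buffer
until the child exits, which is the behaviour that blocks a request/reply
loop over a pipe.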
I ended up writing a small shared library that loads into the child
process (which is running file) that sets the C library buffering mode to
line-based before the child process calls main().
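
GNU coreutils ships a tool, stdbuf, that applies the same trick (it preloads
a small library that adjusts the child's stdio buffering), so a present-day
version of this setup might not need a hand-written library. A hedged Python
sketch, assuming stdbuf is installed and that file honours it; the function
names are hypothetical, and "-b" (brief output) and "-f -" are the file
options assumed. Some versions of file also document a -n / --no-buffer
flag for this purpose, which is worth checking in the local man page.

```python
import subprocess

def line_buffered_cmd(argv):
    """Prefix argv with `stdbuf -oL` so the child's stdout is line buffered."""
    return ["stdbuf", "-oL"] + list(argv)

def start_file_server():
    """Start one long-lived `file` process that answers one name per line."""
    cmd = line_buffered_cmd(["file", "--mime-type", "-b", "-f", "-"])
    return subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True, bufsize=1)

def mime_of(proc, path):
    """Send one filename and read back the single line describing it."""
    proc.stdin.write(path + "\n")   # assumes no newline inside the name
    proc.stdin.flush()
    return proc.stdout.readline().strip()
```

With that in place, a directory walk could call mime_of(proc, name) right
after each stat() and keep a single file process alive for the whole run.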
It worked. But it was a hack I never liked, and in retrospect, doing a
library binding would have been *much* easier. Granted, I had a rather odd
use case, but still ...
-spc (I suspect the second problem I had might be an issue here as well)