It was thus said that the Great Dirk Laurie once stated:
> 2014-04-17 9:40 GMT+02:00 Oliver Kroth <oliver.kroth@nec-i.de>:
> > to my knowledge in most "big" OS, there are already libraries for handling
> > Unicode semantics.
> > I'd like to propose to let Lua do the UTF-8 encoding matters, and use a
> > (probably OS-specific) glue library to refer the Unicode semantics to the
> > underlying OS. This library may e.g. be named "unicode" to avoid name
> > clashes with utf8.
> >
> > There is no sense in re-inventing the wheel.
> 
> Invoking OS support is not as well supported in Lua as in e.g. Perl.
> 
> One intensely annoying restriction that Lua suffers, out of portability
> considerations no doubt, is that we can only have a write-to pipe or
> a read-to pipe, not a filter. I'd love to write
> 
>    textout = os.filter("iconv -f windows-1250 -t utf8",textin)

  First off, there may be Lua modules that link to iconv [1] so there really
shouldn't be a reason to filter out to that program.  

  Secondly, having attempted to do filter-like stuff (piping to a program
for both reading and writing) and failed miserably [2], I will go into the
details of why it failed miserably (at least for Unix).

  Several years ago I wrote a program where I was indexing a bunch of files
in a directory.  I wanted to run file over each file to get its type (and
not necessarily rely upon the extension).  I was already getting a list of
files, and file could accept filenames on stdin (the "-f-" option), so I
thought to myself, "Self, I could set up a pipe such that I write a list of
files to it, and read the file types back out."

  I did that, and the program immediately locked up.  It didn't crash, but
it wasn't running either.  And the problem wasn't a bug in my code, nor a
bug in file, but in the semantics of C's handling of stdin and stdout with a
non-tty stream.

  If I do:

	GenericUnixPrompt> program1 | program2

stdout of program1 is a pipe and stdin of program2 is also a pipe.
Obviously.  But what isn't quite so obvious (unless you looked it up) is
that C's <stdio.h> fully buffers stdout and stdin by default when they are
not attached to a tty.  Data written to stdout isn't actually sent until
you reach some threshold (around 4k to 8k in a typical Unix
implementation), and the same holds for stdin, except in reverse (no data is
returned until there's around 4-8k read).  And it does no good to change the
buffering of the output side to "nothing"
(setvbuf(stdout,NULL,_IONBF,BUFSIZ)) because you still have full buffering
on the input side.

  I got around the issue by using some (Unix) linking magic.  Basically, I
set LD_PRELOAD (an environment variable) to a shared library that was
nothing other than:

	void __attribute__ ((constructor)) init(void);

	void init(void)
	{
	  setvbuf(stdin, NULL,_IOLBF,BUFSIZ);
	  setvbuf(stdout,NULL,_IOLBF,BUFSIZ);
	}

so that when I did (approximately):

	fp = popen("magic -f-","r+");

the shared library was loaded along with the "magic" program, and the
init() function ran before main() to set the buffering on stdin/stdout so
this whole mess would work (otherwise, I would have had to modify the source
code to "file" and I didn't want to go to that trouble).

  Yes, this is a form of monkeypatching (to tie this into another thread
around here).

  Yes, this worked.  But it required extra configuration (I needed to keep
track of where the special shared library was so I could load it) and I
never did feel exactly good about it (and later I found out about the
"magic" library that "file" was a wrapper around and fixed the program to
use that instead of this gross hack). So yes, there was a simpler, more
traditional way to do what I wanted. [3]

  So yes, that's why doing a read/write pipe to a program is usually not
done.

  -spc (Been there, done that, oddly, never got a tee-shirt)

[1]	Oh, say, here:

	https://github.com/spc476/lua-conmanorg/blob/master/src/iconv.c

[2]	For various values of "miserably".  

[3]	"To the point that smart, experienced hackers reach for a monkey
	patch as their tool of first resort, _even when a simpler, more
	traditional solution is possible_."

	http://devblog.avdi.org/2008/02/23/why-monkeypatching-is-destroying-ruby/