Hi:

On Tue, Feb 18, 2014 at 3:04 PM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
> Just for the record: In my machine, the following program,
...
> reading the Bible, takes ~0.07s with the current implementation and
> ~0.14s with this proposal.

Just this morning, while working on an unrelated thing, I stumbled upon the
unlocked_stdio(3) man page, which made me realize my experience with
non-thread-aware stdios is totally outdated :(

After receiving this I decided to run a very simple test, as I
realized sync overhead may kill performance. The test program is attached;
the runs were repeated and the times are quite repeatable.

Note, this is not to endorse my solution. After reading this I totally
see it needs rethinking, and I completely withdraw it, but I thought
the results may be useful for someone who needs to do the things I
did in the past with getc (mainly, processing huge files using a
simple loop plus a state machine, or something similar).
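
To make the "simple loop plus a state machine" pattern concrete, here is a minimal sketch. The word-counting task and the names count_words, OUTSIDE_WORD and INSIDE_WORD are only illustrative assumptions, not the code I used back then:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical two-state machine: are we inside a word or between words? */
enum state { OUTSIDE_WORD, INSIDE_WORD };

/* Count whitespace-separated words, reading one character at a time. */
unsigned long count_words(FILE *f) {
  enum state st = OUTSIDE_WORD;
  unsigned long words = 0;
  int c;

  assert(f != NULL);
  while ((c = getc(f)) != EOF) {
    if (isspace(c)) {
      st = OUTSIDE_WORD;          /* whitespace always ends a word */
    } else if (st == OUTSIDE_WORD) {
      st = INSIDE_WORD;           /* first non-space char starts a word */
      ++words;
    }
  }
  return words;
}
```

With this shape, the cost of each getc call is paid once per input byte, which is why per-call overhead matters so much for this kind of loop.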

I do not know how long the Bible is, or where to grab it, so I just
used a file sitting around on my computer, big enough to give relevant
timings, small enough to ensure full caching (and accessible to
anyone who may want to repeat the tests):

folarte@paqueton:~/tmp$ ls -l ~/Downloads/netbeans-7.4-javase-linux.sh
-rw------- 1 folarte folarte 87140352 Oct 22 12:12 /home/folarte/Downloads/netbeans-7.4-javase-linux.sh

I got these times on the first two runs:
folarte@paqueton:~/tmp$ ./timeit <~/Downloads/netbeans-7.4-javase-linux.sh
Warm disk cache: 0.777915
fgets: 0.086217
getc: 0.817653
fgets_unlocked: 0.080532
getc_unlocked: 0.070354
folarte@paqueton:~/tmp$ ./timeit <~/Downloads/netbeans-7.4-javase-linux.sh
Warm disk cache: 0.080762
fgets: 0.080582
getc: 0.852571
fgets_unlocked: 0.078911
getc_unlocked: 0.069547

I repeated it several more times; the timings were stable enough.

As you can see, LOCKING seems to be killing performance. fgets is not
too bad, although I consistently got about 2-3% more time due to locks, but
getc was always more than 11 times slower, a sync penalty of more than
1000%. Also, in every run I did, the unlocked version of fgets
was always noticeably slower than unlocked getc, which matches
my outdated experience with non-thread-aware runtimes.
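
For anyone who wants getc-style loops without paying the lock on every character, one option is to take the stream lock once and use getc_unlocked inside it. This is only a sketch assuming a POSIX system (flockfile, funlockfile and getc_unlocked are POSIX, not ANSI C); the function name count_bytes_locked_once is made up for illustration, and I have not timed this variant:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Read and discard the rest of the stream, locking it once for the
   whole loop instead of once per character. Returns the byte count. */
unsigned long count_bytes_locked_once(FILE *f) {
  unsigned long n = 0;

  assert(f != NULL);
  flockfile(f);                    /* take the stream lock once */
  while (getc_unlocked(f) != EOF)  /* no per-call locking in here */
    ++n;
  funlockfile(f);                  /* release it when done */
  return n;
}
```

Unlike a bare getc_unlocked loop, this stays safe if another thread touches the same FILE, since the lock is held for the duration of the loop.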

IIRC Roberto uses Linux, as I do, so I suppose his smaller time
difference is due to all the extra processing done instead of my empty
loops; I was just trying to measure raw read & discard performance.

And I'll repeat myself: this is not to defend my proposal, it's wrong
on current runtimes (unlocked is not ANSI, Lua is better served by
the current solution, and if I needed an ultrafast module I would
possibly just go for raw read(2) for better control, although with
the current fgets speed, real-world usage of normal fgets (or the
alternate getc) will probably be limited by disk throughput), but I
figured that once I had taken the time to measure, the info could be
useful for the community.
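
As a rough idea of what the raw read(2) route could look like, here is a minimal read-and-discard sketch. The function name count_bytes_read and the 64 KiB buffer size are assumptions picked for illustration (the size is one of the block sizes from the dd runs in the PS), and this was not part of the timings above:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Read and discard everything from fd using read(2) directly.
   Returns the number of bytes read, or -1 on error. */
long long count_bytes_read(int fd) {
  static char buf[65536];          /* assumed block size, tune as needed */
  long long total = 0;

  assert(fd >= 0);
  for (;;) {
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0) {
      total += n;                  /* got a block, keep going */
    } else if (n == 0) {
      return total;                /* end of file */
    } else if (errno != EINTR) {   /* retry only if interrupted */
      perror("read");
      return -1;
    }
  }
}
```

A real module would of course process the buffer instead of discarding it, and would have to handle lines or tokens that straddle two blocks, which is the price of the extra control.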

Happy hacking.

Francisco Olarte.

PS: Just for timing comparison, with the file cached:

folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null
170196+0 records in
170196+0 records out
87140352 bytes (87 MB) copied, 0.155959 s, 559 MB/s
folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null bs=16384
5318+1 records in
5318+1 records out
87140352 bytes (87 MB) copied, 0.0259828 s, 3.4 GB/s
folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null bs=32768
2659+1 records in
2659+1 records out
87140352 bytes (87 MB) copied, 0.0238975 s, 3.6 GB/s
folarte@paqueton:~/tmp$ dd if=~/Downloads/netbeans-7.4-javase-linux.sh of=/dev/null bs=65536
1329+1 records in
1329+1 records out
87140352 bytes (87 MB) copied, 0.0230595 s, 3.8 GB/s
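/* fgets_unlocked() is a GNU extension; glibc only declares it with _GNU_SOURCE. */
#define _GNU_SOURCE
#include <stdio.h>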
#include <sys/time.h>

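struct timeval tstart, tend;

/* Rewind stdin so each test reads the whole file, then start the clock. */
void start(void) {
  rewind(stdin);
  gettimeofday(&tstart, NULL);
}

/* Stop the clock and print the elapsed time as seconds.microseconds. */
void end(const char *name) {
  gettimeofday(&tend, NULL);
  tend.tv_sec -= tstart.tv_sec;
  if (tend.tv_usec < tstart.tv_usec) { /* borrow one second */
    --tend.tv_sec;
    tend.tv_usec += 1000000;
  }
  tend.tv_usec -= tstart.tv_usec;
  printf("%s: %lu.%06lu\n", name,
         (unsigned long)tend.tv_sec,
         (unsigned long)tend.tv_usec);
}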

int main(int ac, char **av) {
  char buf[2000];

  FILE * f = stdin; /* Just in case macro is slow. */
  setvbuf(f, NULL, _IOFBF, 0); /* Just in case. */

  start();
  while (fgets(buf, sizeof(buf), f)!=NULL);
  end("Warm disk cache");

  start();
  while (fgets(buf, sizeof(buf), f)!=NULL);
  end("fgets");

  start();
  while (getc(f)!=EOF);
  end("getc");

  start();
  while (fgets_unlocked(buf, sizeof(buf), f)!=NULL);
  end("fgets_unlocked");

  start();
  while (getc_unlocked(f)!=EOF);
  end("getc_unlocked");

  return 0;
}