Hi everybody.

After babbling about fscanf, my curiosity was piqued, so I wrote a test
program for it. It seems to do nearly what is needed in this case,
with almost no code repetition and no buffer preinitialization.
I'm not familiar enough with the Lua sources to make a patch, but I think
the function could be written more or less like this:


/* This would be appropriately sized and initialized at module load time

with something like

  sprintf(read_line_format,"%%%d[^\n]%%n", (int)(LUAL_BUFFERSIZE-1));

Optionally it could be nailed to "%2047[^\n]%n", prepping the buffer
with length=2048.

*/

char read_line_format[100];

static int read_line (lua_State *L, FILE *f, int chop) {
  luaL_Buffer b;
  int totbytes = 0;
  int l = 0; /* Declared out of the loop so the while condition can see it. */
  int c;
  luaL_buffinit(L, &b);
  do {
    char *p = luaL_prepbuffer(&b);
    int res = fscanf(f,read_line_format, p, &l);
    if (res<1) break; /* Got no chars, out of loop to see why. */
    luaL_addsize(&b, l);
    totbytes+=l;
  } while(l>=LUAL_BUFFERSIZE-1);
  c=getc(f); /* What stopped us? */
  if (c!=EOF) {
    if (!chop) luaL_addchar(&b,c); /* Assuming chop means "drop the newline". */
    ++totbytes;
  }
  luaL_pushresult(&b); /* Push the accumulated line onto the Lua stack. */
  return totbytes;
}
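
As for the "nailed" format mentioned in the comment above, here is a
minimal sketch of how it might be built at compile time with the usual
stringification trick instead of sprintf. All DEMO_* names are
hypothetical; DEMO_BUFSIZE stands in for LUAL_BUFFERSIZE, and the field
width has to be written as a plain literal because the preprocessor
cannot evaluate BUFSIZE-1 before stringifying it.

```c
#include <stdio.h>

/* Hypothetical constants: DEMO_MAXFIELD must be kept equal to
   DEMO_BUFSIZE-1 by hand, since "#x" stringifies tokens as written. */
#define DEMO_BUFSIZE  10
#define DEMO_MAXFIELD 9

#define DEMO_STR_(x) #x
#define DEMO_STR(x)  DEMO_STR_(x)  /* expand the macro first, then stringify */

/* Adjacent string literals are concatenated: "%" "9" "[^\n]%n". */
#define DEMO_LINE_FORMAT "%" DEMO_STR(DEMO_MAXFIELD) "[^\n]%n"

/* Compile-time check that both constants agree (negative array size if not). */
typedef char demo_sizes_agree[(DEMO_MAXFIELD == DEMO_BUFSIZE - 1) ? 1 : -1];

/* Runtime alternative, same as the sprintf call quoted in the comment. */
static void demo_build_format(char *dst, int bufsize) {
  sprintf(dst, "%%%d[^\n]%%n", bufsize - 1);
}
```

Both routes should give the same string, so fscanf(f, DEMO_LINE_FORMAT,
buf, &len) could be used without any load-time initialization.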

The totbytes counter could be changed to a flag. Alternatively, a dirty
trick is possible: declare l outside the loop and initialize it to -1.
The only way scanf leaves it untouched is by hitting res<1 on the first
try, so the function could return (c!=EOF) || (l!=-1). But I like
counting functions more.
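
A minimal sketch of that l=-1 variant, without the Lua parts, might look
like the following. The name demo_consume_line and the DEMO_CHUNK size
are made up for illustration; the chunks are simply discarded here,
where the real function would append them to the luaL_Buffer.

```c
#include <stdio.h>

#define DEMO_CHUNK 10  /* hypothetical stand-in for LUAL_BUFFERSIZE */

/* Consume one line from f. Returns nonzero if anything (text or a
   newline) was read, and 0 only when the stream is already at EOF. */
static int demo_consume_line(FILE *f) {
  char chunk[DEMO_CHUNK];
  int l = -1;  /* stays -1 only if the very first fscanf stores nothing */
  int c;
  do {
    /* The width 9 must match DEMO_CHUNK-1. */
    if (fscanf(f, "%9[^\n]%n", chunk, &l) < 1) break;
  } while (l >= DEMO_CHUNK - 1);
  c = getc(f);  /* the newline that stopped the scan, or EOF */
  return (c != EOF) || (l != -1);
}
```

When a later fscanf call fails, l still holds the length of the
previous chunk, so only a failure on the first call leaves it at -1.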

The thing I like about it is that it has a loop-till-something-happens,
test-the-last-condition, return structure, which seems (to me) cleaner,
and it does not initialize any buffer. I fear it is going to be
even slower than fgetc, though; I just wanted to test whether my
memories of perverse fscanf usage were right.

I've done some tests with a proof-of-concept standalone program which
just reads the first line, and the results look nice. I paste all of them
below. I know I could have avoided the FFFFFF on negative chars with an
unsigned char cast, or by using &0xFF, but it's just a POC.
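
For reference, the unsigned char cast could be sketched as below.
demo_hex_dump is a hypothetical helper, not part of the test program;
it writes into a string instead of stdout so the result is easy to
check.

```c
#include <stdio.h>

/* Write the first n bytes of buf into out as " XX" groups.
   out must have room for at least 3*n+1 chars. */
static void demo_hex_dump(char *out, const char *buf, int n) {
  int i;
  out[0] = '\0';
  for (i = 0; i < n; ++i) {
    /* Going through unsigned char keeps bytes >= 0x80 from being
       sign-extended to FFFFFFxx where plain char is signed. */
    out += sprintf(out, " %02X", (unsigned)(unsigned char)buf[i]);
  }
}
```

With this, a 0xFF filler byte prints as FF rather than FFFFFFFF.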

Francisco Olarte.

------ test program and some test results: -------------

folarte@paqueton:~/tmp$ cat fscanftst.c
#include <stdio.h>
#include <string.h>

int main(int ac, char **av) {
  #define BUFSIZE 10
  char buf[BUFSIZE+10]; /* Only going to use 10, rest is for simpler tests. */
  int len=-1;
  int res;
  int i;
  char format[100]; /* "%9[^\n]%n" */
  /* Format adequate for LUABUFSIZE could be generated at module load,
or via macro magic. */
  sprintf(format,"%%%d[^\n]%%n", BUFSIZE-1);
  do {
    memset(buf,'\xFF', sizeof(buf)); /* Easier to spot later. */
    res = fscanf(stdin,format, buf, &len);
    if (res<1) break; /* 0 fields scanned, EOF or \n at start  break
and share code. */
    /* Trace what we got. */
    printf("res=%d, len=%d buf=", res, len);
    for (i=0; i<len+3; ++i) {
      printf(" %02X", buf[i]);
    }
    printf("\n");
  } while (len>=BUFSIZE-1);
  /* Once we hit a non-full buffer, even an empty one,  we need to
test next char. Should be '\n' or EOF. */
  i = getc(stdin);
  printf("Last char is %d / %02x\n", i, i);
  return 0;
}
folarte@paqueton:~/tmp$ gcc -Wall -ansi -o fscanftst fscanftst.c
folarte@paqueton:~/tmp$ echo -ne '' | ./fscanftst
Last char is -1 / ffffffff
folarte@paqueton:~/tmp$ echo -ne '\0' | ./fscanftst
res=1, len=1 buf= 00 00 FFFFFFFF FFFFFFFF
Last char is -1 / ffffffff
folarte@paqueton:~/tmp$ echo -ne '\0\n' | ./fscanftst
res=1, len=1 buf= 00 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'AB\0CD\0EF\0GH\0IJ\n1234' | ./fscanftst
res=1, len=9 buf= 41 42 00 43 44 00 45 46 00 00 FFFFFFFF FFFFFFFF
res=1, len=5 buf= 47 48 00 49 4A 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGHIJKLMNOPQR' | ./fscanftst
res=1, len=9 buf= 41 42 44 45 46 47 48 49 4A 00 FFFFFFFF FFFFFFFF
res=1, len=8 buf= 4B 4C 4D 4E 4F 50 51 52 00 FFFFFFFF FFFFFFFF
Last char is -1 / ffffffff
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGHIJKLMNOPQRS' | ./fscanftst
res=1, len=9 buf= 41 42 44 45 46 47 48 49 4A 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 4B 4C 4D 4E 4F 50 51 52 53 00 FFFFFFFF FFFFFFFF
Last char is -1 / ffffffff
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGHIJKLMNOPQRST' | ./fscanftst
res=1, len=9 buf= 41 42 44 45 46 47 48 49 4A 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 4B 4C 4D 4E 4F 50 51 52 53 00 FFFFFFFF FFFFFFFF
res=1, len=1 buf= 54 00 FFFFFFFF FFFFFFFF
Last char is -1 / ffffffff
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGHIJKLMNOPQRS\n' | ./fscanftst
res=1, len=9 buf= 41 42 44 45 46 47 48 49 4A 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 4B 4C 4D 4E 4F 50 51 52 53 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGHIJKLMNOPQRST\n' | ./fscanftst
res=1, len=9 buf= 41 42 44 45 46 47 48 49 4A 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 4B 4C 4D 4E 4F 50 51 52 53 00 FFFFFFFF FFFFFFFF
res=1, len=1 buf= 54 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGH\nJKLMNOPQRS\n' | ./fscanftst
res=1, len=7 buf= 41 42 44 45 46 47 48 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGHI\nJKLMNOPQRS\n' | ./fscanftst
res=1, len=8 buf= 41 42 44 45 46 47 48 49 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABDEFGHIJ\nJKLMNOPQRS\n' | ./fscanftst
res=1, len=9 buf= 41 42 44 45 46 47 48 49 4A 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABCDEFGHI\0\0\n' | ./fscanftst
res=1, len=9 buf= 41 42 43 44 45 46 47 48 49 00 FFFFFFFF FFFFFFFF
res=1, len=2 buf= 00 00 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABCDEFG\0\0\n' | ./fscanftst
res=1, len=9 buf= 41 42 43 44 45 46 47 00 00 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a
folarte@paqueton:~/tmp$ echo -ne 'ABCDEFG\0\0' | ./fscanftst
res=1, len=9 buf= 41 42 43 44 45 46 47 00 00 00 FFFFFFFF FFFFFFFF
Last char is -1 / ffffffff
folarte@paqueton:~/tmp$ echo -ne 'El perro del tío Roque no tiene rabo porque Ramón Ramírez con su carro se lo ha cortado.\n' | ./fscanftst
res=1, len=9 buf= 45 6C 20 70 65 72 72 6F 20 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 64 65 6C 20 74 FFFFFFC3 FFFFFFAD 6F 20 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 52 6F 71 75 65 20 6E 6F 20 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 74 69 65 6E 65 20 72 61 62 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 6F 20 70 6F 72 71 75 65 20 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 52 61 6D FFFFFFC3 FFFFFFB3 6E 20 52 61 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 6D FFFFFFC3 FFFFFFAD 72 65 7A 20 63 6F 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 6E 20 73 75 20 63 61 72 72 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 6F 20 73 65 20 6C 6F 20 68 00 FFFFFFFF FFFFFFFF
res=1, len=9 buf= 61 20 63 6F 72 74 61 64 6F 00 FFFFFFFF FFFFFFFF
res=1, len=1 buf= 2E 00 FFFFFFFF FFFFFFFF
Last char is 10 / 0a