lua-l archive



On Feb 19, 2014, at 19:42, Roberto Ierusalimschy wrote:

> OK, so reading a line in a binary file is only a matter of a proper
> implementation.
>
> Give some efficient implementation that can read lines containing
> arbitrary binary data ('\x00', '\x0D', '\x04', '\x0A', '\x1A', etc.)
> and we can consider it to fix the current "bug".

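fgets does not say how many bytes it stored, and strlen stops at the first
embedded '\0'. So, for chunks that contain no '\n' (a full chunk, or the
last line of a file without a trailing newline), this version finds the end
of the data by pre-filling the buffer with '\n' and looking for the first
one with memchr. Unfortunately that means pre-filling the whole buffer on
every call, so, as mentioned, to avoid a noticeable performance drop the
buffer needs to be smaller than the 1-8k it currently is. Removing the
last-byte swapping from my earlier version apparently improves performance
a bit, too. This version should address all corner cases, works in all my
test cases, and is less than 5% slower on the Bible test case.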

I'd like to point out that the current Lua implementation does not "chop"
anything in the special case of a text file whose last line has no
terminating newline. With the current Lua (5.2), the result for such a line
is the same whether chop is set or not: "if (l == 0 || p[l-1] != '\n')"
evaluates to true, so the loop makes an extra round, fgets returns NULL
without reading anything, and the buffer is returned without any chop.

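The patch below does not do such an extra round, so I added one more line
of code (p[l-1] = '\n';) to make sure "chop" works in that case, too: the
'\0' terminator is replaced by '\n', so chop removes it, and a non-chopping
read returns the line with a '\n' appended.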

That line can be changed if the current implementation's behavior (no '\n'
appended) is preferred.

Any concerns apart from the stray SIZE #define?

Again, this does not introduce any regressions for DOS, Windows, etc.;
it only makes sure that whatever fgets returns ends up in the Lua string.

--- lua-5.2.3/src/liolib.c 2013-04-12 18:48:47.000000000 +0000
+++ lua-5.2.3-patched/src/liolib.c 2014-02-20 10:19:53.545272394 +0000
@@ -371,17 +371,24 @@
 static int read_line (lua_State *L, FILE *f, int chop) {
   luaL_Buffer b;
   luaL_buffinit(L, &b);
+#define SIZE 82
   for (;;) {
     size_t l;
     char *p = luaL_prepbuffer(&b);
-    if (fgets(p, LUAL_BUFFERSIZE, f) == NULL) {  /* eof? */
+    memset(p, '\n', SIZE);
+    if (fgets(p, SIZE - 1, f) == NULL) {  /* eof? one terminal \n */
       luaL_pushresult(&b);  /* close buffer */
       return (lua_rawlen(L, -1) > 0);  /* check whether read something */
     }
-    l = strlen(p);
+    l = (char*)memchr(p, '\n', SIZE) - p + 1;
+    if (l == SIZE) l -= 2; /* our reserve terminal \n + 0 terminator */
     if (l == 0 || p[l-1] != '\n')
       luaL_addsize(&b, l);
     else {
+      if (l >= 2 && p[l - 2] == 0 && feof(f)) {
+        --l; /* EOF without \n and thus 0-terminated + our \n */
+        p[l-1] = '\n';
+      }
       luaL_addsize(&b, l - chop);  /* chop 'eol' if needed */
       luaL_pushresult(&b);  /* close buffer */
       return 1;  /* read at least an `eol' */


René
-- 
 ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de