lua-l archive



Hi,

I wanted to show some progress when downloading big files with socket.http, ran into some problems, and made some changes.
The main issue I had is:

* How to access the received headers from within the storing sink?

The main reason is that I need the "content-length" from the reply to calculate progress. Another reason could be the "filename" header, which you need when the filename is not detectable from the URL.

My solution (*please comment if this is OK, or if there is a better way*):

I added two lines to socket.http, in function trequest(reqt):

   headers = h:receiveheaders()
+   reqt.reply={}
+   reqt.reply.headers=headers
   -- at this point we should have a honest reply from the server
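(For comparison, an alternative that avoids patching socket.http at all would be to ask for the headers up front with a HEAD request. This is only a sketch and it assumes the server answers HEAD and reports the length, which not all servers do:)

```lua
local http = require("socket.http")

-- fetch only the headers of a url with a HEAD request; returns the
-- content length as a number, or nil if the server does not report it
local function content_length(u)
    local _, code, headers = http.request{ url = u, method = "HEAD" }
    if code == 200 and headers then
        return tonumber(headers["content-length"])
    end
end
```

The downside is one extra round trip, and some servers refuse HEAD or report a different length, so the patch keeps everything in one request.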

Now I can write a receiving function with a progress display. Since my request table is passed through all the http layers while remaining the same table, modifications to it (as made above) can be used in the
sink (as shown below).
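To illustrate the mechanism with a minimal sketch that is independent of luasocket: Lua tables are passed by reference, so a field added to the request table later is visible to any closure that captured the same table:

```lua
local request = { url = "http://example.com/file" }

-- a stand-in for the real sink: it captured `request` by reference
local function sink()
    local reply = request.reply
    return reply and reply.headers["content-length"]
end

-- simulate what the patched trequest does once the headers arrive
request.reply = { headers = { ["content-length"] = "12345" } }

print(sink())  --> 12345
```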

--8<--snip--8<--snip--8<--snip--8<--snip--8<--snip--8<--snip--8<--snip--8<--snip

------------------------------------------------------------------------
--- get one url and save it into a file
-- @param url the url to get (a string or a request table)
-- @param file the filename to save to
------------------------------------------------------------------------
function get_url_save_long_file(url,file)
   printf("Retrieving %s\n",url)
   local request=url
   if type(url)=="string" then
       request={url=url}
   end

   local fd,err=io.open(file,"wb")
   if not fd then
       Error("open(%s) failed (%s)\n",file,err)
       return nil
   end

   local want
   local have=0
   local p1=io.stdout:seek()
   local t0=socket.gettime()
   ---------------------------------
   -- the receiving filter
   ---------------------------------
   local function sink_fd(chunk, src_err)
       if chunk == nil then
           -- no more data to process, we won't receive more chunks
           fd:close()
           if src_err then
               printf("\n ==> Src_Error=%s\n",src_err)
               -- the source reported an error; TBD what to do with the
               -- chunks received up to now
               return nil,src_err
           else
               printf("\n ==> EOF %s\n",dots(have))
               return true -- or anything that evaluates to true
           end
       elseif chunk == "" then
           printf("\n ==> ''\n")
            -- this is assumed to be without effect on the sink, but may
            -- not be if something different than raw text is processed;
            -- do nothing and return true to keep filters happy
            return true -- or anything that evaluates to true
       else
           -- try to get expected length
           if have==0 then
               -- this is where I access the header
               local h=request.reply and request.reply.headers
               want=h and tonumber(h["content-length"])
           end
           local size=#chunk
           local elapsed=socket.gettime()-t0
           have=have+size
           if p1 then
               io.stdout:seek("set",p1)
           end
           if want then
               local kbs=0.001*have/elapsed
               local total=elapsed*want/have
               local remain=total-elapsed
               local time_for_this=elapsed*size/have
               printf(" ==>%d %8s/%8s %6.2fkbs (%s/%s rem %s)%s \r",
                      size,dots(have),dots(want),kbs,
                      t2s(elapsed),t2s(total),t2s(remain),t2s(time_for_this))
           else
               printf(" ==> %8s (%s)   \r",dots(have),t2s(elapsed))
           end
            -- chunk has data, process/store it as appropriate
           fd:write(chunk)
           return true -- or anything that evaluates to true
       end
   end

   request.sink=sink_fd
   local ret,sts=http.request(request)
   printf("Retrieved \"%s\" = ret=%s,sts=%s\n",url,vis(ret),vis(sts))
   return sts
end

-->8--end-->8--end-->8--end-->8--end-->8--end-->8--end-->8--end-->8--end
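(The helpers printf, Error, dots, t2s and vis above are my own and not shown in the post; minimal stand-ins, so the snippet is self-contained, could look like this. The exact formatting is a guess at their intent:)

```lua
local function printf(fmt, ...)         -- formatted print to stdout
    io.write(string.format(fmt, ...))
end

local function Error(fmt, ...)          -- formatted print to stderr
    io.stderr:write(string.format(fmt, ...))
end

local function dots(n)                  -- 1234567 -> "1.234.567"
    local s = tostring(math.floor(n)):reverse():gsub("(%d%d%d)", "%1.")
    return (s:reverse():gsub("^%.", ""))
end

local function t2s(t)                   -- seconds -> "h:mm:ss"
    t = math.floor(t)
    return string.format("%d:%02d:%02d",
                         math.floor(t / 3600), math.floor(t % 3600 / 60), t % 60)
end

local function vis(v)                   -- printable form of any value
    return type(v) == "string" and string.format("%q", v) or tostring(v)
end
```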

This worked very well for me, until I met a redirecting website: my sink only got the headers from the redirect reply, not from the one carrying the real data.
So I made another change (comments are again welcome).

This is the original

function tredirect(reqt, location)
   local result, code, headers, status = trequest {
       -- the RFC says the redirect URL has to be absolute, but some
       -- servers do not respect that
       url = url.absolute(reqt.url, location),
       source = reqt.source,
       sink = reqt.sink,
       headers = reqt.headers,
       proxy = reqt.proxy,
       nredirects = (reqt.nredirects or 0) + 1,
       create = reqt.create
} -- pass location header back as a hint we redirected
   headers = headers or {}
   headers.location = headers.location or location
   return result, code, headers, status
end

This is my version

function tredirect(reqt, location)
   reqt.url=url.absolute(reqt.url, location)
   reqt.nredirects=(reqt.nredirects or 0) + 1
   local result, code, headers, status = trequest(reqt)
   -- pass location header back as a hint we redirected
   headers = headers or {}
   headers.location = headers.location or location
   return result, code, headers, status
end

Do you see the difference? Instead of creating a new request table, I update the current one, so get_url_save_long_file can get the information it needs. Is this OK? Or are there serious reasons to copy the request table, as in the original version?
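For what it's worth, the reason the copy in the original tredirect breaks my sink boils down to this (a minimal sketch, independent of luasocket):

```lua
-- copy the top-level fields of a table into a fresh table,
-- like the original tredirect effectively does
local function shallow_copy(t)
    local c = {}
    for k, v in pairs(t) do c[k] = v end
    return c
end

local reqt = { url = "http://example.com/file" }
local function sink() return reqt.reply end   -- captured the ORIGINAL table

local copy = shallow_copy(reqt)
copy.reply = { headers = {} }                 -- fields get added to the copy

print(sink())  --> nil: the sink never sees fields added to the copy
```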

Looking for advice,
Regards JJvB