lua-l archive



I wanted to show progress while downloading big files with socket.http, ran into some problems, and made some changes.
The main issue I had is:

* How to access the received headers from within the storing sink?

The main reason is that I need the "content-length" from the reply to calculate progress. Another reason could be the file name sent in the "content-disposition" header, which you need when the file name cannot be detected from the URL.
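
A minimal sketch of the file name case, assuming the server sends the name in a "content-disposition" header and that the header table uses lower-case keys. The helper name filename_from_headers is made up for illustration and is not part of socket.http; it handles only the plain quoted and unquoted forms.

```lua
-- Hypothetical helper (not part of socket.http): extract a file name
-- from a table of reply headers with lower-case keys.
local function filename_from_headers(headers)
    local cd = headers and headers["content-disposition"]
    if not cd then return nil end
    -- quoted form: filename="some name.zip"
    local name = cd:match('filename%s*=%s*"([^"]+)"')
    -- unquoted form: filename=name.zip
    name = name or cd:match('filename%s*=%s*([^;%s]+)')
    return name
end

print(filename_from_headers{
    ["content-disposition"] = 'attachment; filename="big file.iso"'
})
```

The function returns nil when no such header is present, so the caller can fall back to the name taken from the URL.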

My solution (*please comment on whether this is OK or whether there is a better way*):

I have added two lines to socket.http, in function trequest(reqt):

   headers = h:receiveheaders()
+   reqt.reply={}
+   reqt.reply.headers=headers
   -- at this point we should have a honest reply from the server

Now I can write a receiving function with a progress display. Since my request table is passed through all the http layers while remaining the same table, modifications to the request (as made above) can be used in the sink (as shown below).
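
The idea can be tried without a network connection. The sketch below uses a stand-in called fake_trequest (an invented name, not the real socket.http code) that stores the headers on the request table before it feeds the sink, which is the ordering the two added lines rely on.

```lua
-- Stand-in for socket.http's trequest: NOT the real implementation,
-- only the part that matters for this question.
local function fake_trequest(reqt)
    local headers = { ["content-length"] = "10" }  -- pretend reply headers
    reqt.reply = {}
    reqt.reply.headers = headers
    -- the body is delivered only after the headers have been stored
    reqt.sink("0123456789")
    reqt.sink(nil)
    return 1, 200, headers
end

local seen_length
local request = { url = "http://example.invalid/file" }
request.sink = function(chunk)
    if chunk then
        -- the sink closes over the same table that was handed to the request
        local h = request.reply and request.reply.headers
        seen_length = h and tonumber(h["content-length"])
    end
    return true
end

fake_trequest(request)
print(seen_length)
```

Because the sink is a closure over request, it needs no extra parameter to reach the headers.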


--- get one url save into a file
-- @param url to get
-- @param file to save
function get_url_save_long_file(url,file)
   printf("Retrieving %s\n",url)
   local request=url
   if type(url)=="string" then
       request={url=url}
   end

   local fd,err=io.open(file,"wb")
   if not fd then
       return nil,err
   end

   local want
   local have=0
   local p1=io.stdout:seek()
   local t0=socket.gettime()
   -- the receiving sink
   local function sink_fd(chunk, src_err)
       if chunk == nil then
           -- no more data to process, we won't receive more chunks
           fd:close()
           if src_err then
               -- source reports an error, TBD what to do with chunk received up to now
               printf("\n ==> Src_Error=%s\n",src_err)
               return nil,src_err
           end
           printf("\n ==> EOF %s\n",dots(have))
           return true -- or anything that evaluates to true
       elseif chunk == "" then
           printf("\n ==> ''\n")
           -- this is assumed to be without effect on the sink, but may
           -- not be if something different than raw text is processed
           -- do nothing and return true to keep filters happy
           return true -- or anything that evaluates to true
       else
           -- try to get expected length
           if have==0 then
               -- this is where I access the header
               local h=request.reply and request.reply.headers
               want=h and tonumber(h["content-length"])
           end
           local size=#chunk
           have=have+size
           local elapsed=socket.gettime()-t0
           if p1 then
               -- stdout is seekable (e.g. redirected to a file): go back to the saved position
               io.stdout:seek("set",p1)
           end
           if want then
               local kbs=0.001*have/elapsed
               local total=elapsed*want/have
               local remain=total-elapsed
               local time_for_this=elapsed*size/have
               printf(" ==>%d %8s/%8s %6.2fkbs (%s/%s rem %s)%s \r",size,dots(have),dots(want),kbs,t2s(elapsed),t2s(total),t2s(remain),t2s(time_for_this))
           else
               printf(" ==> %8s (%s)   \r",dots(have),t2s(elapsed))
           end
           -- chunk has data, process/store it as appropriate
           local ok,err=fd:write(chunk)
           if ok then
               return true -- or anything that evaluates to true
           end
           -- in case of error
           return nil, err
       end
   end

   request.sink=sink_fd
   local ret,sts=http.request(request)
   printf("Retrieved \"%s\" = ret=%s,sts=%s\n",url,vis(ret),vis(sts))
   return sts
end
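
The progress arithmetic used in the sink can be checked on its own. This is a sketch with made-up numbers; the function name estimate is invented for illustration, and it assumes have and elapsed are both greater than zero.

```lua
-- Same formulas as in the sink, isolated into a pure function.
-- have:    bytes received so far (must be > 0)
-- want:    expected total bytes (from "content-length")
-- elapsed: seconds since the download started (must be > 0)
local function estimate(have, want, elapsed)
    local kbs = 0.001 * have / elapsed    -- average rate so far, kB/s
    local total = elapsed * want / have   -- projected total duration, s
    local remain = total - elapsed        -- projected time still to go, s
    return kbs, total, remain
end

local kbs, total, remain = estimate(500000, 2000000, 5)
print(string.format("%.2f kB/s, total %.1f s, remaining %.1f s", kbs, total, remain))
```

With 500000 of 2000000 bytes received after 5 seconds, the average rate is 100 kB/s, the projected total is 20 seconds, and 15 seconds remain. The estimate is based on the average rate since the start, so it reacts slowly when the transfer speed changes.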


This worked fine for me until I met a redirecting website. My sink only got the headers from the redirect reply, not those from the reply that carries the real data.
So I made another change (comments again are welcome).

This is the original

function tredirect(reqt, location)
   local result, code, headers, status = trequest {
       -- the RFC says the redirect URL has to be absolute, but some
       -- servers do not respect that
       url = url.absolute(reqt.url, location),
       source = reqt.source,
       sink = reqt.sink,
       headers = reqt.headers,
       proxy = reqt.proxy,
       nredirects = (reqt.nredirects or 0) + 1,
       create = reqt.create
   }
   -- pass location header back as a hint we redirected
   headers = headers or {}
   headers.location = headers.location or location
   return result, code, headers, status
end

This is my version

function tredirect(reqt, location)
   reqt.url=url.absolute(reqt.url, location)
   reqt.nredirects=(reqt.nredirects or 0) + 1
   local result, code, headers, status = trequest(reqt)
   -- pass location header back as a hint we redirected
   headers = headers or {}
   headers.location = headers.location or location
   return result, code, headers, status
end

Do you see the difference? Instead of creating a new request, I update the current request, so get_url_save_long_file can get the information it needs. Is this OK? Or are there any serious reasons to copy the request as in the original version?
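
The difference between the two versions can be reproduced offline. The sketch below is a simulation only: fake_trequest, tredirect_copy and tredirect_reuse are invented stand-ins that mimic the shape of the two tredirect variants, not the real socket.http functions, and the URLs are placeholders.

```lua
local tredirect  -- set by run() to one of the two variants below

-- Stand-in for trequest: stores the reply headers on the request it was
-- given and redirects once. NOT the real socket.http code.
local function fake_trequest(reqt)
    if reqt.url == "http://example.invalid/old" then
        reqt.reply = { headers = { location = "http://example.invalid/new" } }
        return tredirect(reqt, reqt.reply.headers.location)
    end
    reqt.reply = { headers = { ["content-length"] = "1234" } }
    return 1, 200, reqt.reply.headers
end

-- shaped like the original: the redirected request is a fresh table
local function tredirect_copy(reqt, location)
    return fake_trequest {
        url = location,
        sink = reqt.sink,
        nredirects = (reqt.nredirects or 0) + 1
    }
end

-- shaped like the modified version: the caller's table is updated in place
local function tredirect_reuse(reqt, location)
    reqt.url = location
    reqt.nredirects = (reqt.nredirects or 0) + 1
    return fake_trequest(reqt)
end

local function run(variant)
    tredirect = variant
    local request = { url = "http://example.invalid/old" }
    fake_trequest(request)
    return request
end

local a = run(tredirect_copy)
local b = run(tredirect_reuse)
print(a.reply.headers["content-length"], a.url)
print(b.reply.headers["content-length"], b.url)
```

With the copying variant, the caller's table still holds the headers of the redirect reply, so "content-length" is nil there. With the reusing variant, the caller sees the headers of the final reply, but its url and nredirects fields have been changed as well. That side effect on the caller's table is the visible price of the in-place update.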

Looking for advice,
Regards JJvB