Getting The Title From HTML Files
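This program prints the title of each HTML file named on the command line, that is, the text between <title> and </title>. It uses Lua patterns rather than a real HTML parser, so markup such as a title inside a comment (<!-- <title>ack</title> -->) can fool it.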
Usage example (from the shell):
```
$ ls *.html
cgi.html          htaccess.html  mod_include.html   urlmapping.html
configuring.html  mod_auth.html  mod_rewrite.html
core.html         mod_cgi.html   rewriteguide.html
$ ./title.lua *.html
cgi.html: Apache Tutorial: Dynamic Content with CGI
configuring.html: Configuration Files
core.html: Apache Core Features
htaccess.html: Apache Tutorial: .htaccess files
mod_auth.html: Apache module mod_auth
mod_cgi.html: Apache module mod_cgi
mod_include.html: Apache module mod_include
mod_rewrite.html: Apache module mod_rewrite
rewriteguide.html: Apache 1.3 URL Rewriting Guide
urlmapping.html: Mapping URLs to Filesystem Locations - Apache HTTP Server
```
Below is the Lua program title.lua:
```lua
#!/usr/bin/env lua

-- Return the text of the <title> element of an HTML file,
-- "" if the file has no title, or false if it cannot be opened.
function getTitle(fname)
  local fp = io.open(fname, "r")
  if fp == nil then
    return false
  end

  -- Read up to 8KB (avoid problems when trying to parse /dev/urandom).
  -- read() returns nil for an empty file, so fall back to "".
  local s = fp:read(8192) or ""
  fp:close()

  -- Collapse all whitespace (newlines, CRs, tabs) to single spaces.
  s = string.gsub(s, "%s+", " ")

  -- Remove optional spaces from the tags.
  s = string.gsub(s, " *< *", "<")
  s = string.gsub(s, " *> *", ">")

  -- Put all the tags in lowercase.
  s = string.gsub(s, "(<[^ >]+)", string.lower)

  -- Non-greedy ".-" stops at the first </title> and also accepts an
  -- empty title.
  local t = string.match(s, "<title>(.-)</title>")
  return t or ""
end

if arg[1] == nil then
  print("Usage: lua " .. arg[0] .. " <filename> [...]")
  os.exit(1)
end

local i = 1
while arg[i] do
  local t = getTitle(arg[i])
  if t then
    print(arg[i] .. ": " .. t)
  else
    print(arg[i] .. ": File opening error.")
  end
  i = i + 1
end
os.exit(0)
```
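To see what the normalization steps in getTitle() do, they can be tried on in-memory strings instead of files. The sketch below uses a hypothetical helper, extractTitle, which is not part of title.lua: it applies normalization steps modelled on getTitle() and adds an optional comment-stripping step (an assumption of this sketch), because a plain pattern search returns a title that sits inside an HTML comment.

```lua
-- Hypothetical helper (not part of title.lua): normalization steps
-- modelled on getTitle(), applied to a string instead of a file.
-- stripComments is an assumed extra option that removes <!-- ... -->.
local function extractTitle(html, stripComments)
  local s = html
  if stripComments then
    -- "-" is magic in Lua patterns, so it is escaped as "%-";
    -- ".-" matches as little as possible, i.e. up to the first "-->".
    s = string.gsub(s, "<!%-%-.-%-%->", "")
  end
  -- Collapse whitespace runs to a single space.
  s = string.gsub(s, "%s+", " ")
  -- Remove optional spaces around "<" and ">".
  s = string.gsub(s, " *< *", "<")
  s = string.gsub(s, " *> *", ">")
  -- Lowercase tag names such as <TITLE> or </Title>.
  s = string.gsub(s, "(<[^ >]+)", string.lower)
  return string.match(s, "<title>(.-)</title>") or ""
end

print(extractTitle("<HTML><HEAD>< TITLE >Apache module mod_cgi</Title >"))
-- prints: Apache module mod_cgi

local page = "<!-- <title>ack</title> --><title>Real title</title>"
print(extractTitle(page))        -- prints: ack
print(extractTitle(page, true))  -- prints: Real title
```

Even with comments removed this remains an approximation: text inside <script> elements, CDATA sections and unusual comment forms are not handled, which is where a real HTML parser helps.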
Alternatively, the [lua-gumbo] library can be used:
```lua
#!/usr/bin/env lua
local gumbo = require "gumbo"
local document = assert(gumbo.parseFile(arg[1] or io.stdin))
print(document.title)
```
In this case, both the HTML5 parser and the Document.title implementation conform fully to the HTML5 specification, so the output should be exactly the same as what a modern browser reports.
lua-gumbo is available via: luarocks install gumbo
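One visible difference from browser behaviour: title.lua prints the raw text between the tags, so character references such as &amp; or &#169; appear verbatim, whereas a browser's document.title shows the decoded characters. If a closer match is wanted without a dependency, the string returned by getTitle() could be passed through a small decoder such as the hypothetical decodeEntities below. It is a minimal sketch, assuming Lua 5.3 or later (for utf8.char), and it handles only numeric references and five named entities, so lua-gumbo remains the accurate option.

```lua
-- Minimal sketch (assumes Lua 5.3+ for utf8.char).
-- Hypothetical helper: decodes numeric character references and five
-- common named entities; browsers recognise over 2,000 named references.
local named = { amp = "&", lt = "<", gt = ">", quot = '"', apos = "'" }

-- Encode a code point as UTF-8, or return nil (leaving the reference
-- untouched) when it falls outside the Unicode range.
local function codepoint(n)
  if n and n >= 0 and n <= 0x10FFFF then
    return utf8.char(n)
  end
end

local function decodeEntities(s)
  -- Hexadecimal references such as &#x2014;
  s = string.gsub(s, "&#[xX](%x+);", function(hex)
    return codepoint(tonumber(hex, 16))
  end)
  -- Decimal references such as &#169;
  s = string.gsub(s, "&#(%d+);", function(dec)
    return codepoint(tonumber(dec))
  end)
  -- Named references last, so "&amp;#169;" becomes "&#169;" and is not
  -- decoded twice. Unknown names map to nil and are left as they are.
  s = string.gsub(s, "&(%a+);", function(name)
    return named[name]
  end)
  return s
end

print(decodeEntities("Tom &amp; Jerry"))  -- prints: Tom & Jerry
```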