Monday, December 3, 2012

HTML pager `one-liner` with xmllint, aka Y U NO RTFM

TIL xmllint can evaluate XPath expressions and has an HTML parser. No more ugly frankensed expressions to deal with trees. Ahhh, DSLs.
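
A minimal sketch of the idea, assuming xmllint is on the PATH and was built with --xpath support. The sample page and the function names (sample_html, extract_content) are made up for illustration; only the xmllint flags come from the one-liner in this post.

```shell
#!/bin/sh
# Hypothetical sample page with a navigation block and a content block.
sample_html() {
cat <<'EOF'
<html><body>
<div id="nav">menu</div>
<div id="content"><p>Hello</p></div>
</body></html>
EOF
}

# Parse stdin as HTML and keep only the element whose id is "content".
# Parser warnings go to stderr, so they are silenced here.
extract_content() {
  xmllint --html --xpath '//*[@id="content"]' - 2>/dev/null
}

sample_html | extract_content
```

The nav block is dropped and only the content div is printed, which is what makes the output worth piping into html2text.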


dummy@x60s_GPT ~ % for page in $(hrefs URL | egrep $(basename URL) | sort | uniq) ;
do
curl -sL "${page}" |
xmllint --html --xpath '//*[@id="content"]' - |
html2text;
done | less


where hrefs is the script below (note the old-school sedism, which will soon be deprecated):

dummy@x60s_GPT ~ % cat $(which hrefs)

#!/usr/bin/env dash

URL="${1}"
curl -sL "${URL}" | sed 's.>.>\n.g' | sed -n '/href/I s@^.*href="\([^"]\+\)".*$@\1@Igp'
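
For what it's worth, here is a rough sketch of what a sed-free hrefs could look like, using the same xmllint trick. This is my guess at a replacement, not a tested drop-in: the function name extract_hrefs is hypothetical, and it assumes xmllint prints each attribute node as href="..." (depending on the libxml2 version, either one per line or all on a single line, so grep -o is used to split them).

```shell
#!/bin/sh
# Print the value of every href attribute found in the HTML read from stdin.
# Caveat: xmllint serializes attribute values, so "&" shows up as "&amp;".
extract_hrefs() {
  xmllint --html --xpath '//@href' - 2>/dev/null |
    grep -o 'href="[^"]*"' |
    sed 's/^href="//; s/"$//'
}

# Usage inside the hrefs script (assumption, mirroring the original):
#   curl -sL "${URL}" | extract_hrefs
```

There is still one small sed at the end, but it only strips the attribute wrapper; the tree walking is left to the XPath expression.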


PS: no need to criticize my fault-tolerantless style; I'm still waiting for a whole Lisp user space, so why bother...
