It's evil and slow. I'll make a few mods and whack it up; if you've got an email or Twitter account, I'll contact you. -- PR
Implemented in Python using:
- Beautiful Soup 4
- Requests
- htmllib
The core idea is to grab an index URL and estimate the size of the page and its resources by looking at the URLs the page is likely to pull in. JS is not implemented, from what I can read. The method is something like:
* pull index page with requests by url
* extract headers
* build list of resource urls
* look at content type to determine new request
* extract images, text, css from urls in page
* look at content length to determine size
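
For anyone who wants to play with the idea before the real thing goes up, here is a rough sketch of the steps above. It is not the actual code: it swaps in the standard library (`html.parser`, `urllib`) for Beautiful Soup and Requests so it runs without extra installs, and the names `ResourceCollector`, `resource_urls`, `http_fetch` and `page_weight` are made up for illustration. It only follows `<img>` tags and stylesheet `<link>`s, and falls back to the body length when no Content-Length header is sent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self._add(attrs["href"])

    def _add(self, ref):
        # Resolve relative references against the index URL, skip duplicates.
        url = urljoin(self.base_url, ref)
        if url not in self.urls:
            self.urls.append(url)


def resource_urls(html, base_url):
    """Build the list of resource URLs found in an HTML string."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    return parser.urls


def http_fetch(url):
    """Fetch a URL; return (headers with lowercased names, body bytes)."""
    req = Request(url, headers={"User-Agent": "page-size-sketch"})
    with urlopen(req, timeout=10) as resp:
        headers = {name.lower(): value for name, value in resp.headers.items()}
        return headers, resp.read()


def page_weight(index_url, fetch=http_fetch):
    """Map the index URL and each resource URL to a size in bytes."""
    headers, body = fetch(index_url)
    sizes = {index_url: len(body)}
    # Only parse for resources when the index really is HTML.
    if "html" in headers.get("content-type", ""):
        html = body.decode("utf-8", errors="replace")
        for url in resource_urls(html, index_url):
            res_headers, res_body = fetch(url)
            declared = res_headers.get("content-length", "")
            # Prefer the declared Content-Length, else measure the body.
            sizes[url] = int(declared) if declared.isdigit() else len(res_body)
    return sizes
```

Something like `sum(page_weight("https://example.com/").values())` would then give a rough total in bytes. The `fetch` argument is only there so the network call can be replaced with a stub when testing.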