A staggering number of people in any large organization basically work as human "information filters": they condense information and report it up the organizational food chain. A sufficiently clever combination of OCR, NLP, and ML could automate a lot of those jobs. In other words, the executive set needs a Summly for industry intelligence. (A startup idea that I'm sure someone with VC connections has already thought of.)
The trouble with PDFs is they're designed to be consumed by human eyes only. Any attempt to automatically extract information from them is fundamentally a hacky scrape-job.
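To make the "scrape-job" point concrete, here is a minimal Python sketch of the kind of heuristic you end up writing. It assumes some extractor has already handed you text fragments with page coordinates; the `fragments` data, the `rebuild_lines` name, and the `y_tolerance` value are all made up for illustration and don't come from any particular library.

```python
# Hypothetical extractor output: (x, y, text) fragments in arbitrary order.
# PDF coordinates usually grow upward, so a larger y is higher on the page.
fragments = [
    (72.0, 700.0, "Revenue"),
    (300.0, 700.4, "Q1"),
    (72.0, 680.0, "Widgets"),
    (300.0, 679.8, "1,200"),
    (360.0, 700.1, "Q2"),
    (360.0, 680.2, "1,350"),
]


def rebuild_lines(frags, y_tolerance=2.0):
    """Guess the visual lines: bucket fragments by similar y, then sort by x."""
    lines = []  # each entry is (y of first fragment in the line, [(x, text), ...])
    for x, y, text in sorted(frags, key=lambda f: -f[1]):  # top of page first
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))  # close enough: same visual line
        else:
            lines.append((y, [(x, text)]))  # start a new visual line
    # Within a line, left-to-right order is just a sort on x.
    return [" ".join(text for _, text in sorted(row)) for _, row in lines]


for line in rebuild_lines(fragments):
    print(line)
```

Even this toy version depends on a guessed tolerance, and it knows nothing about columns, rotated text, or tables that span pages, which is roughly why every real extractor feels like a pile of special cases.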