I do a good bit of financial/news scraping, and I'd find this useful; the OP is clearly a much cleaner read, so thanks for the demo! However, I was always under the impression that large portions of the potentially important data come in relatively unstructured, and that one needs to read the S-1 with an eye for certain topics or inflection points. Do you find this to be true? And do you think there are any heuristics for exposing the critical segments in a less supervised fashion?
Thank you for the comment! As an ex-investment banker, I can tell you that not all sections are given an equal amount of thought when putting the doc together; large swaths of the doc are often legal template language. My top heuristics are 1) focus on the "Business" and "MD&A" sections, and 2) look for the non-GAAP metrics (since they are often the internal KPIs the company uses to measure its performance). Let me know if you have more specific things you're looking for.
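For anyone wanting to automate those two heuristics, here is a rough Python sketch. It assumes the S-1 has already been converted to plain text and that top-level section headings appear in all caps on their own lines (real filings vary, so treat the regexes as a starting point). The heading patterns, the KPI phrases, and the function names are illustrative assumptions, not anything EDGAR or the filing format guarantees.

```python
import re

# Assumption: any line written entirely in capitals is a section heading.
ANY_HEADING = re.compile(r"^[ \t]*[A-Z][A-Z0-9 ,.'\u2019&\-]{3,}[ \t]*$", re.MULTILINE)
# Assumption: the two sections worth reading first. "." absorbs straight or curly
# apostrophes, and requiring "DISCUSSION" skips the separate "MANAGEMENT" (officers) section.
KEY_HEADING = re.compile(r"BUSINESS\b|MANAGEMENT.S DISCUSSION AND ANALYSIS")
# Assumption: phrases that tend to flag non-GAAP measures or internal KPIs.
NON_GAAP_HINTS = re.compile(
    r"non-GAAP|key (?:business|operating) metrics?|key performance indicators?",
    re.IGNORECASE,
)


def split_sections(text: str) -> dict[str, str]:
    """Map each heading to the text up to the next heading, keeping the longest body.

    Keeping the longest body means the real section wins over its
    (empty) table-of-contents entry.
    """
    sections: dict[str, str] = {}
    matches = list(ANY_HEADING.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        heading, body = m.group().strip(), text[m.end():end].strip()
        if len(body) > len(sections.get(heading, "")):
            sections[heading] = body
        else:
            sections.setdefault(heading, body)
    return sections


def triage(text: str) -> dict[str, list[str]]:
    """For each key section, return the sentences that mention a non-GAAP metric or KPI."""
    out: dict[str, list[str]] = {}
    for heading, body in split_sections(text).items():
        if KEY_HEADING.match(heading):
            flat = " ".join(body.split())  # collapse line wraps before sentence splitting
            sentences = re.split(r"(?<=[.!?])\s+", flat)
            out[heading] = [s for s in sentences if NON_GAAP_HINTS.search(s)]
    return out


# Toy, made-up filing text for illustration only.
SAMPLE = """TABLE OF CONTENTS
BUSINESS
MANAGEMENT'S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS
RISK FACTORS
Our business is subject to many risks. We use non-GAAP measures at our peril.
BUSINESS
We sell widgets. Our key business metrics include paid seats.
MANAGEMENT'S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS
Revenue grew 40%. Adjusted EBITDA, a non-GAAP measure, was $10 million.
MANAGEMENT
Jane Doe is our CEO.
"""

if __name__ == "__main__":
    for heading, hits in triage(SAMPLE).items():
        print(heading)
        for sentence in hits:
            print("  -", sentence)
```

Note that the "Risk Factors" mention of non-GAAP is skipped on purpose: the sketch only surfaces hits inside Business and MD&A, which matches the advice above about where the company actually spent its thought.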