It'd also be cool to make a search engine that cataloged all of the websites that other search engines politely did not.
Is there any proper enforcement of robots.txt, or is it just politeness? Because I don't see how one search engine could maintain an advantage if others start cataloging every site.
It's politeness, but the teeth behind it are legal: it's much easier to secure a CFAA conviction for scraping if you can show that you specifically forbade a scraper from accessing your site (via robots.txt, authentication, or IP blocks) and it circumvented that restriction, than if you gave it no notice that you objected to what it was doing.
Legit search engines and aggregators have a strong incentive not to kill the golden goose by getting sued, particularly when the vast majority of sites actually want to be indexed.
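For context, the "polite" side is nothing more than the crawler voluntarily checking robots.txt before fetching. A minimal sketch using Python's standard urllib.robotparser (the site URL and user-agent string here are made up for illustration):

    from urllib import robotparser

    # Fetch and parse the site's robots.txt (hypothetical URL)
    rp = robotparser.RobotFileParser("https://example.com/robots.txt")
    rp.read()

    # A well-behaved crawler skips any path the file disallows for its user-agent
    if rp.can_fetch("MyCrawler", "https://example.com/private/page.html"):
        print("allowed to fetch")
    else:
        print("disallowed -- skip it, or hand the site owner a 'you were warned' argument")

Nothing in that check is enforced by the protocol itself; ignoring it is a policy and legal risk, not a technical barrier.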