
We already know the solution: One well-behaved, shared scraper could serve all of the AI companies simultaneously.

The problem is that they're not doing it.





This is an interesting approach. Archive.org could be such a solution, kind of. Not its cold storage as it exists now, but a warm access layer. Sponsorship by AI companies would be a good initiative for the project.

I can't imagine IA ever going for it. You'd need a separate org that just scrapes for AI training, because its bot is going to be blocked by anyone who is anti-AI. It wouldn't make sense for it to serve multiple purposes.

Common Crawl would be a better fit, but it still might not want to serve in that capacity.



