Hacker News

What are your favorite large, publicly available datasets?


Biased reply (I'm a data scientist there): Common Crawl[1]. We build and maintain an open repository of web crawl data that anyone can access and analyze, completely free of charge.

[1]: http://commoncrawl.org/



The Cancer Genome Atlas, Ensembl, and the 1000 Genomes Project.



