Hacker News

Anyone who takes even an hour to audit anything about the Internet Archive will soon come to a very sad conclusion.

The physical assets are stored in the blast radius of an oil refinery. They don't have air conditioning. Take the tour and they tell you the site runs slower on hot days. Great mission, but atrociously managed.

They're under attack for a number of reasons, mostly absurd, but a few are painfully valid.



Their yearly budget is less than the budget of just the SF library system.


What I don't understand is how they afford the storage costs. Surely it must be pricey to have 70+ petabytes of data that's only growing.


Their Form 990s are hard to decipher (many, many entities) and they don't talk much about how they're structured or how they run their backend.

A big chunk was outsourced to Sun at one point. And that name alone should tell you how current the information is. https://en.wikipedia.org/wiki/Sun_Modular_Datacenter

In 2020 at least one public filing shows expenses of $19.9MM with $9.2MM classified as wages. So everything other than wages, storage included, came to no more than $900k/month in 2020 and maybe double that now. Recent data is messy due to Covid donations and lawsuits.
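
A quick sanity check of that arithmetic, as a minimal sketch. It assumes the $900k/month figure means total expenses minus wages, spread evenly over twelve months. The variable names are mine; the two dollar figures are the ones quoted from the 2020 filing above.

```python
# Figures quoted from the 2020 filing in the comment above.
total_expenses = 19.9e6  # USD per year
wages = 9.2e6            # USD per year

# Assumption: everything that is not wages is an upper bound on
# infrastructure spending (storage, power, bandwidth, and the rest).
non_wage_per_year = total_expenses - wages
non_wage_per_month = non_wage_per_year / 12

print(f"non-wage spend: ${non_wage_per_year / 1e6:.1f}MM/year, "
      f"about ${non_wage_per_month / 1e3:.0f}k/month")
```

That is a ceiling on what storage could have cost, not an estimate of it, since the non-wage bucket also covers everything else they pay for.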


Then maybe they should've figured out how to keep hard drives in a climate controlled environment before they decided to launch a bank.

https://ncua.gov/newsroom/press-release/2016/internet-archiv...


How significant is "in the blast radius of an oil refinery"? Once every how many years should I expect a typical oil refinery to explode? This really doesn't seem like it should be their first, second, fifth, or twelfth priority to "solve".

EDIT: asking Claude:

Based on historical data, major refinery explosions in developed countries might occur at a rate of approximately 1 in 1,000 to 1 in 2,000 refinery-years of operation. Using this very rough estimate, a single refinery might have approximately a 50% chance of experiencing a significant explosion somewhere between 700-1,400 years of continuous operation.
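
For what it's worth, the 50% figure in that answer is at least internally consistent with the rate it quotes. A minimal sketch, assuming each year is an independent trial with a constant probability of a major explosion (my simplification, not something the quote states); `years_to_probability` is a hypothetical helper name:

```python
import math

def years_to_probability(p_per_year: float, target: float = 0.5) -> float:
    """Years until the chance of at least one event reaches `target`.

    Assumes independent years with constant probability p_per_year:
    1 - (1 - p)^n = target  =>  n = ln(1 - target) / ln(1 - p)
    """
    return math.log(1.0 - target) / math.log(1.0 - p_per_year)

# The two ends of the quoted range: 1 in 1,000 and 1 in 2,000 refinery-years.
for p in (1 / 1000, 1 / 2000):
    print(f"p = {p:.4f}/year -> {years_to_probability(p):.0f} years to 50%")
```

This only checks the arithmetic (roughly 700 and 1,400 years). It says nothing about whether a 1-in-1,000 rate is right for any particular refinery.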


"11 years ago, the Chevron refinery exploded. It wasn’t a surprise."

https://apen4ej.medium.com/11-years-ago-the-chevron-refinery...

Keep in mind that Brewster bought the building because it looked like the icon, not vice versa. Not exactly the amount of thought that might be expected of an archival institution.


Somebody should tell Claude that the Wikipedia article for a refinery located on San Francisco Bay contains the headings:

- 1989 explosion and fire

- 1999 explosion and fire

- 2012 fire

The 2012 incident sent 15,000 people to the hospital.

https://en.wikipedia.org/wiki/Chevron_Richmond_Refinery


I realized recently, who needs torrents? I can get a good rip of any movie right there.


I understand what you describe is prohibited in many jurisdictions, however I’m curious about the technical aspect: in my experience they host the HTML but often not the assets, especially big pictures, and I guess most movie files are bigger than pictures. Do you use a special trick to host/find them?


No. And every video game ever made is available for download as well. If you even have to download it: they pride themselves on making many of them playable in the browser with just a click.

Copyright issues aside (let's avoid that mess) I was referring to basic technical issues with the site. Design is atrocious, search doesn't work, you can click 50 captures of a site before you find one that actually loads, obvious data corruption, invented their own schema instead of using a standard one and don't enforce it, API is insane and usually broken, uploader doesn't work reliably, don't honor DMCA requests, ask for photo ID and passports then leak them ...

It's the worst possible implementation of the best possible idea.


And yet, it's the best we currently have. I donate to them. We can come up with demands for how it should be managed, but that should not prevent us from helping them.


If you poke around at what US government agencies are doing, and what European countries and non-profits are doing, or even do a deep dive into what your local library offers, you may find the Internet Archive no longer leads the pack.

They didn't even ask for donations until they accidentally set fire to their building annex. People offered to help (SF was apparently booming that year) and of course they promptly cranked out the necessary PHP to accept donations.

Now it's become part of the mythology. But throwing petty cash at a plane in a death spiral doesn't change gravity. They need to rehabilitate their reputation and partner with organizations who can help them achieve their mission over the long term. I personally think they need to focus on archival and legal long-term preservation before sticking their neck out any further. If this means no more Frogger in the browser, so be it.

I certainly don't begrudge anyone who donates, but asking for $17 on the same page as copyrighted game ROMs and glitchy scans of comic books isn't a long-term strategy.


There should not be any physical centralization. Use a series of redundant IPFS pins and/or torrents or some decentralized database of some kind.


They've tried for years and nobody steps up. And as it turns out they couldn't even maintain torrent files at scale. Broken for years, and still no strategy for versioning them when metadata or files change.

Also until recently their whole model was storing physical material (on an active fault line next to an oil refinery) then allowing digital access to it. Courts ruled that illegal for modern works.



