The new archive.org

aw3c2 · on Oct 29, 2014

Awww why... This looks incredibly cluttered. Infinite scrolling is a terrible idea in an _archive_. If you use the list view instead, you get a very hard to scan "3 narrow lines for 1" design.

The site is very broken if you don't have Javascript enabled. I am scared how CPU intense it would be on my mobile or cheap netbook. The details of collections don't even display any items.

Where is the list of files inside an item? Previously there was a nice table. Now everything seems to focus on images instead. IA hosts a lot of things that are not visual. Music, texts, data. Those seem like second row citizens now. The cover of an album tells me .. nothing about the music itself.

For an archive, I think this is a rather bad interface. The technical implementation seems very un-archivey and more suited to a "dumb user" discovery interface built upon an existing well-presented archive. :(

PS: The categories and tags on the side are a nice addition.

If someone from IA reads this, I think at least https://archive.org/advancedsearch.php is not using output buffering which might make user's performance much better.

Overall the site is very very slow.

edit: Some comparison images.

Old: http://i.imgur.com/gJXgJhI.png

New default: http://i.imgur.com/JOEoAiu.png

New list: http://i.imgur.com/m2d1Gf4.png

Old: http://i.imgur.com/X7e2s5T.png

New: http://i.imgur.com/9HsrQO1.png

vanderZwan · on Oct 29, 2014

> For an archive, I think this is a rather bad interface. The technical implementation seems very un-archivey and more suited to a "dumb user" discovery interface built upon an existing well-presented archive. :(

I think this hits the nail on the head. I don't think the new interface is bad per se, it's just optimized for a different goal than the one of most current visitors of archive.org. It would probably work best as a page dedicated to "dumb user discovery" - which might very well be an user-base worth pursuing.

This is one of those cases where creating user personas and trying to come up with different interfaces optimised for each of them would probably be the best approach.

aw3c2 · on Oct 29, 2014

I wholeheartedly agree. Interfaces like that can be nice for initial browsing but I am a user who knows what she wants and I want efficiency and lots of metadata.

textfiles · on Oct 30, 2014

The overarching plan, as I expect it, will be to have a number of ways to interact with the system. So there'll be this Pinterest-like visual browsing, more "lists" oriented, and maybe other versions, like research-oriented and so on. For now, there's visual and lists.

pbhjpbhj · on Oct 29, 2014

How does the lazy-load infinite scrolling stop you from getting to what you want? [I'm not saying it doesn't, I'm asking how]

stedaniels · on Oct 29, 2014

It might not stop you scrolling there, but:

Linking to page x of result set y so you can send it to a colleague...

Or even say I've scrolled dozens of pages down, then navigate to a link, when I use my browsers back button I'm not back at the top of a list with no way to easily get back to where I was.

ijk · on Oct 29, 2014

It also makes it hard to skip further down the page. It's an issue I keep running into with Google+ communities: there's something that I know was from early in the timeline's history, but I don't remember the exact dates. But the only way to access it is to scroll slowly through the entire archive. (Google+ has the additional problems of an ineffective search that can only be sorted by "best result" or "most recent".)

dchuk · on Oct 29, 2014

Both of these problems can be designed around from a UX perspective:

1) As a user is scrolling, you can use PushState to update the URL to provide actual linkable pages. It does make re-clicking that link in the future odd though because that page's results are now at the top of the screen so scrolling up doesn't really make sense, but it at least provides the ability to deep link.

2) Sites like Pinterest solve this by opening an item's details page on click in an overlay above the infinite grid. That way when you close it (or click back in your browser) the modal just gets dismissed and you're right where you left off in the infinite scroll.

userbinator · on Oct 29, 2014

"designed around" - great way to put a positive spin on "solving the problems that you created".

Sorry, but I strongly believe infinite scrolling creates more problems than it solves - and I'm not entirely sure it solves much of a problem in the first place; the "consume, consume, consume" mentality that it's oriented towards is great for things like news agencies trying to push as many stories as it can to its readers, but this is an archive of content to be browsed in a more... thoughtful manner.

dchuk · on Oct 29, 2014

1) I have nothing to do with Archive.org, I'm just speaking from a UX designer's perspective. "Designing around" constraints is something everyone involved in a site build does, from backend engineers to designers to front guys to DBAs. You're presented with a set of constraints, and you come up with solutions that fit those constraints. Nothing evil.

2) There's nothing inherently wrong with "consume, consume, consume". I don't think anyone in the world would argue that Pinterest is evil or bad. It's a site for consuming en masse, plain and simple, and an infinite scroll enhances the experience. Clicking "next" 50 times would significantly ruin the user experience on that (and similar) sites.

e7620 · on Oct 29, 2014

If you need to advance exactly 50 pages, the worst thing there is from a usability perspective is to press PageDown 50 times, instead of clicking 5 times with progressive pagination or inputing the number in other traditional designs.

pinterest, facebook or twitter are whole-content streams, that's why they benefit from infinite scroll.

db48x · on Oct 29, 2014

All very true, but if you're serious about looking at every item that results from some search (because you're researching, not merely browsing), then you will use the advanced search interface to get a CSV file (or JSON, XML, HTML, or RSS, according to preference) which you can then work with entirely locally. You'll be able then to keep track of everything you do with every item in that list.

pbhjpbhj · on Oct 29, 2014

>I strongly believe infinite scrolling creates more problems than it solves //

It sure does when it's used inappropriately. But can you imagine Pinterest, say, without it?

Sure it sacrifices findability or reproducibility of a view in exchange for discoverability and presentation of new content but that sacrifice works - and IMO it can work for an archive.

Provided items of content can be found, can be tagged by users (bookmarks anyone?) then if you're targetting the presentation of continuously updated content sources infinite scroll is fine IMO.

Do you have a problem with HN because one can't link to a specific view of a list of stories? Or with Google because SERPs change? Or with your favourite blog because the front-page updates with each new post?

aw3c2 · on Oct 29, 2014

HN, Google SERP or blog frontpages are not archives.

xanderstrike · on Oct 29, 2014

> The site is very broken if you don't have Javascript enabled.

I agree with all of your other concerns, but it is not the responsibility of the web designer to accommodate people who selectively disable parts of their website. If you turn off CSS, or the color blue, your experience will degrade, but that's 100% your fault. The same is true with javascript.

It runs quite well on my 1.8ghz Atom netbook and my Nexus 5. The JS they're using doesn't seem that expensive.

charonn0 · on Oct 29, 2014

True, but no one wins if the user leaves your website frustrated. Gracefully degrading in absence of a full-fledged scriptable web browser should be considered best practice.

textfiles · on Oct 29, 2014

Graceful degredation is definitely a plan.

aw3c2 · on Oct 29, 2014

Now look at it this way: The Wayback Machine is probably the most prominent part of the Internet Archive. Does it handle websites that rely on Javascript?

gohrt · on Oct 29, 2014

Accessibility and USA law: http://warc.calpoly.edu/building/page_design/javascript.html

pbhjpbhj · on Oct 29, 2014

>Infinite scrolling is a terrible idea in an _archive_. //

I disagree, it's about discoverability, scroll to discover random new things. If you're there for a specific thing you won't be wildly scrolling; if you're not then they are doing what you want they're displaying possible areas of interest.

I'm not commenting on the implementation though first look of a few seconds and it seems slow but otherwise quite apposite and cleanly designed.

Don't personally like the accordion-like drop-down of the page to show content above the tabs, but it's OK.

bronson · on Oct 29, 2014

> it's about discoverability, scroll to discover random new things

No. Are we talking about a library or a social site? It's about results, not randomly killing time.

Also, please explain "wildly scrolling?" Who does that? If I'm interested in the 12th search result, is that wild?

pbhjpbhj · on Oct 29, 2014

>If I'm interested in the 12th search result, is that wild? //

How do you know you're interested in the 12th result. Either you did a search and all content displayed is in unknown positions or you already know what you're looking for in which case the scroll is entirely irrelevant, no?

Libraries have 2 main use cases I'd warrant:

1. You go to the library find just to find something to read or some media to consume.

2. You go to the library with a specific piece of media you are looking for (whether that's a title or a content requirement, doesn't matter for this analysis).

Now my local library addresses both of these (physically and in virtual presence). They have shelves with things indexed and categorised for those in use case 2. They have categories of type, displays or new acquisitions, monthly subjects of interest and such for those in use case 1.

It's similar again to visiting a museum, you can go to visit a specific exhibit[ion] (use case 2) or you can go to see what they have (use case 1) - personally I do both.

Now if the museum instead of showing lots of exhibits and having teaser displays and such just had all the exhibits locked in rooms and you had to ask for the key to the specific thing you wanted to look at, then it would make for a far from rich experience except for the very few who already know the content of the museum in depth.

As it is with Archive.org, if you know the content breadth, the sections, the actual content in detail then you don't need to use the scroll, you'll be going direct to the content you know you want. If not then they have chance for you to browse and view content you may not have known was there.

bane · on Oct 29, 2014

I personally love to browse my local libraries (and archive.org) to discover new things.

aw3c2 · on Oct 29, 2014

I disagree. An archive should primarily be about finding, not discovering (ugh, words, and I am no native speaker). What I mean is that randomly browsing is nice but not what an archive is mostly about.

sedachv · on Oct 29, 2014

FWIW the old page still loads in lynx and w3m.

sirn · on Oct 29, 2014

My first impression: wow, looks minimal. This is a nice change. I like it! Then I read the comment here and realized the page hasn't fully loaded yet. The fact that above the fold part loads almost instantly (including top menu which worked right away) is very nice, but I'm not quite sure about the rest. It's slow, heavy and really hogs up CPU when it's loaded.

Mahn · on Oct 29, 2014

Oh, wow, had you not mentioned it I would have guessed that there was a part below the fold.

acdha · on Oct 29, 2014

Wow, thanks for noting that – I'm now watching the 4.7MB and counting HTML trickle in.

Throwaway12830 · on Oct 29, 2014

I went back after seeing your comment. This looks like Pinterest if it existed in the 90s.

Mithaldu · on Oct 29, 2014

The new archive.org is over 8MB of HTML, quite impressive.

Snark aside, i'd like to read a post about what they changed and why. For example i see they almost entirely removed the previously prominent links to the forums. Why is that?

aselzer · on Oct 29, 2014

Wow, this is due to them including more than 50 or so base64 encoded inline images.

They should really change this. The page took more than 8s to load (with cached images).

Mahn · on Oct 29, 2014

Huh. Why would anyone think that was a good idea.

snsr · on Oct 29, 2014

It results in fewer HTTP requests.

nknighthb · on Oct 29, 2014

It also results in a 33% byte overhead on top of the additional CPU and memory usage involved in actually pulling out and decoding base64 on the client side, and it screws up caching.

What should be a quick reload of this page results in the transfer of that 8MB all over again, and takes very nearly as long as the original load. There is no universe in which this was wise.

snsr · on Oct 29, 2014

All good points; I wasn't advocating, just observing :)

Turing_Machine · on Oct 30, 2014

Jiminy Christmas, that's slow.

Disclaimer: I'm on an unreliable hotel internet connection.

On the other hand, most of the world is on bad internet connections all the time.

Loading the images on mouseover (or whatever) would avoid having to wait for that massive slug of data.

I sure don't want to be loading that much data every time I go to the site, especially not on expensive mobile bandwidth, and especially since (as others noted) the images appear to be uncacheable.

So: nice job on the look, not so great on the bandwidth. :-)

maaarghk · on Oct 29, 2014

It does say on the top right that this is a beta site, so it's probably a weeee bit early to be making comments about "the new direction" and all that.

obviouslygreen · on Oct 29, 2014

I'd say it's exactly the opposite: Early on is exactly when comments about "the new direction" can most reasonably be expected to have an effect, assuming anyone is listening/reading.

pbhjpbhj · on Oct 29, 2014

Well beta means feature complete and considered ready to ship excepting as yet unspotted last minute bugs from testers ... so it's way too late to have input to the new direction, the new direction is at beta stage set in stone unless the update is scrapped.

vertex-four · on Oct 29, 2014

> Well beta means feature complete and considered ready to ship excepting as yet unspotted last minute bugs from testers

No, that's a release candidate. A beta is considered feature complete, but potentially not, and almost definitely somewhat buggy and/or non-performant. It's also often not undergone any major form of usability testing, and will often need to be modified to incorporate the results of that.

nknighthb · on Oct 29, 2014

On top of what the others said, it requires pointing out that you go to www.archive.org and get the new page. It's shipped. It is the public face now. You don't get to hide behind any sort of pre-release label when you've already made it the default in place of a prior working model.

textfiles · on Oct 29, 2014

If you go to the site cold, you do not get the new site. You get the old site. If, however, you've chosen to look at the new site, a cookie is set so you are looking at the new site. If you click on "exit beta" in the corner, it will go back to the old site and the old site after that.

So it's still an option.

jacquesm · on Oct 30, 2014

Please, please, pretty please keep the old site around in some form. It would be the irony of ironies if the archive would no longer be available in its old form.

nknighthb · on Oct 30, 2014

OK, well, that's... Actually worse.

Websites should not make changes to a user's settings based simply on a GET against an URL, because the user can click the URL from a random stranger not knowing it would have that effect -- as I did.

Aside from being a poor user experience, what has happened here is someone has deliberately coded a security vulnerability. Granted the impact here is basically trivial, but it's still a pretty classic no-no, and should have immediately set off flashing lights and klaxons in any security-conscious developer's mind.

textfiles · on Oct 30, 2014

I think the dev team would be surprised to hear that getting to see the new site is a security vulnerability!

nknighthb · on Oct 30, 2014

While I, sadly, am not at all surprised to hear that yet another group of developers treats security and user experience flippantly. I'm rather starting to regret defending you and lending resources to the twitpic grab.

asaddhamani · on Oct 29, 2014

I agree with the general sentiment of this thread. This redesign was not needed in the first place. The new website is incredibly slow to load, and the infinite scrolling thing absolutely sucks. While I can see all my uploads with the old(better) website, the new one doesn't seem to work. Just says "Fetching more results", and then nothing happens. With the tabs, if I try to list my uploads by, say, text, it reloads the page and switches the tab to collections, and then I have to click the Uploads tab again to see the filtered results. Same when I remove a filter.

Even on my workstation computer with a nice overclocked CPU, I can see the CPU usage jump to the top whenever I load the site. The website takes 19.90s to load with cache disabled, with ~200 requests and 6.5MB of data transferred. The older website takes 4.21s to load, with 18 requests and 280KB of data transferred, in contrast.

Meanwhile, features like the ability to playback WARC files that are uploaded by users don't seem to be getting any attention, but a feature like that would make so much sense for a site like the Internet Archive. I can see they provide a player for media files, why not provide something for WARC files too, then?

As a heavy user of the site, the redesign(at least at the current state) will only ever hinder my experience, I can't see it being helpful in any way.

db48x · on Oct 29, 2014

A WARC might have been put into the Wayback Machine; do a search there for the url. Failing that, there are proxies you can run locally that let you access the content of a WARC as if you were browsing the original site. https://github.com/internetarchive/warcprox is even by IA themselves, so clearly they're working on it.

A viewer that let you browse the contents of a WARC the same way you can do for a zip would be really nice, but it's probably a separate project from redesigning the site. In fact, browsing zip files (and a bunch of other similar file types, of course) was added only a few months ago.

asaddhamani · on Oct 30, 2014

True, the WARC file might have been included in the Wayback Machine, but I meant something different. I'm specifically interested in the ability to specify which WARC file I want to play back. I think an example would be better than me trying to explain it. Have a look at https://webrecorder.io, it lets you play back any WARC or ARC file, you just have to provide the URL.

I know this is outside the realm of a redesign, but if IA could add something like this, it would be a big UX improvement.

ANTSANTS · on Oct 29, 2014

This looks fancy and Web 2.0 and pinteresty and whatnot (and has the page size and CPU requirements to prove it), but it still has the fundamental problem of the old archive.org: it thinks portals are still relevant to the internet. They're not. One person or organization can't meaningfully organize all of the world's information, it's not worth trying. It's like saying "we're bringing the Dewey Decimal System into the 21st century." No, rigid hierarchical classification just doesn't cut it any more. Just focus on archiving the information, improving the search mechanism[1], and staying alive, and let the community handle curation and discovery.

[1] Merchanisms? Sorry to weeb out, but they should really look at Danbooru (NSFW) sometime. Tag-based classification and search works extremely well when (1) the users submitting the content aren't the people that made the content, so there's no conflict of interest encouraging them to try to game the system by spamming tags and whatnot, (2) there are strong guidelines for what is and isn't acceptable, what is and isn't subjective, and (3) the users are as dedicated and passionate as anime fans and archivists are. Let the users contribute objective tags, add support for subjective/personal tags ("pools" in booru lingo) that don't show up in search results but provide a way for users to curate, and for the love of god let them "fave" things and see their friend's favorites, and participation on archive.org would explode overnight.

domas · on Oct 29, 2014

You can provide feedback/comments directly to them by clicking "exit beta" in the top right.

textfiles · on Oct 29, 2014

Jason Scott, here. Disclaimer: I work for the Internet Archive (although I don't speak for the entire Archive) and I'm vaguely gung-ho on the place: https://twitter.com/textfiles/status/527549181175427072

I am also, as part of my job there, one of the largest individual uploaders of data to archive.org - I've added hundreds of thousands of individual items (texts, movies, music, websites) since I started working there in 2011.

So, moving on.

Welcome to the new user interface beta. I'm glad to see people toying with it, and the commentary and complaints are very, very welcome. As a smallish organization with a lot going on, the responses from people really digging down in the beta are very appreciated.

First, I'll say that the Beta interface is a "true" beta - it's the result of a lot of internal work, arguments, and discussions, but nothing is 100% set in stone. This isn't a beta like Gmail or a FPS trying to determine the rate of firing of the chaingun weapon: this is a lot of best-approach attempts at a whole range of goals. There's bound to be lots of responses from a lot of camps that are now coming forward. (For example, the new site has accessibility issues that need to be addressed.) If the term "beta" has been wrecked, stick with "prototype".

The internal name was V2, so I tend to keep calling it that.

V2 is the first major redesign of the main archive.org site in over a decade. And part of the conditions of this project (done by a handful of people) were to keep the old site (retroactively called V1) running, and mostly unchanged. That was a whole bucket of headache that isn't even obvious when you come into the site. (Anyone who has done this knows how it can be). With over 20 petabytes of data on the site, and millions of items and objects, spanning the whole environment without downtime is a feat in itself. So there's a whole range of philosophies being approached, but just getting the backend into a shape where it could sustain a new interface to it was a lot of non-obvious work.

Moving to the site as it is now.

Definitely slow. Definitely a shock. Definitely some great choices, some which might seem like head-scratchers. There is a designer, with a vision (his name is David) and there's been approaches to all the intended known shortcomings of V1 Internet Archive in this prototype.

One of the issues with Archive.org that's been an issue is non-responsiveness for different platforms - you got one site and that was it. Another was a lack of visual interface as an option. Now there is one.

The tagging and metadata efforts were spotty before now, because you were not really rewarded for doing so. The V2 site uses these tags and metadata extensively, and will continue to. This has been a nightmare for me, frankly - I've had to add logos to the 1,200 collections of items I've been uploading, and I'm doing descriptions as well as tags. But under the new system, the chance for finding things has increased exponentially.

There are definitely cases where I have to swap back to V1 to get kinds of "work done", because as an intense power-user, I do all sorts of grandiose work. But then again, 99% of my interaction with maintaining and adding content to the Internet Archive, I do through the API, and specifically through a python command-line interface we've had a developer working on for over a year:

https://pypi.python.org/pypi/internetarchive

I've uploaded many thousands of items, analyzed and upgraded their metadata, and done search-and-modify runs by the hundreds with this tool. It's being constantly updated.

In the future, I expect us to see multiple improvements to the interface - one which is much more bandwidth and processor friendly, a version of the "view" (we have image and list right now) that is optimum for researchers, and so on. But I'll stress again:

- This is a prototype which was done with a pretty small team who had to keep the old site running as smoothly as possible, while doing essentially a decade of upgrade in one swoop; - Now that it's "proven" that it works, refinement by the truckload needs to happen - Your comments are not just welcome but encouraged - Increased interest in the archive and the materials, and working together to find ways to access the petabytes of data in a meaningful way is not just a nice side benefit, but a vital core of the Archive's mission

Thanks for reading.

bane · on Oct 30, 2014

As an avid archive.org fan, I want to express my thanks to you and the team that runs it. It's a priceless jewel on the Internet and whenever I think about what it is, I'm usually awestruck.

I'm sure you're going to get lots and lots of complaints and criticisms and other feedback about the new site. Thanks for having an open mind about the feedback you're going to get. It's also fantastic that you guys are also users, impacted by these changes.

But my questions is, other than just providing feedback, what can we do to help? I've always been a consumer of the archive, but never a contributor, is there someway I can help, even if it's just cleaning metadata an hour or two a week?

laichzeit0 · on Oct 30, 2014

> But then again, 99% of my interaction with maintaining and adding content to the Internet Archive, I do through the API, and specifically through a python command-line interface we've had a developer working on for over a year

Although completely understandable, this goes against the principle of "eating your own dog food". If more people used the stuff they created, especially the power-user type of features, I suspect software would be orders of magnitude better in general.

textfiles · on Oct 30, 2014

I perhaps wasn't clear. There's two different major projects that I've been interacting with - the front UI design, and the python client for interacting with the Internet Archive's APIs.

I've been definitely "eating the dogfood" as regards the Python client for over a year, with dozens of changes, fixes and improvements being done.

I've been "eating the dogfood" in terms of the V2 UI by using it, and making additions and changes to my uploads to work better in the new interface, as well as submitting dozens of requests to the dev team about things I've encountered.

So I'm pretty full of dog food here.

walterbell · on Oct 29, 2014

Thanks for the additional detail.

> https://pypi.python.org/pypi/internetarchive

Can anyone use the API key, e.g. does it require auth for both upload and download? For upload, is an archive.org userid sufficient, or is a separate API key needed? Will the new metadata be available via the API?

Onsite searches are usually less successful than a Google search with site:archive.org. Within the archive, it has been near impossible to create a URL-based search query that will find all editions of a given title/work. Will the new site/tagging help?

Thanks to the entire archive team for a precious resource.

textfiles · on Oct 29, 2014

Anyone can generate an S3-like/API key. They have the same rights and restrictions as someone using other methodologies. So, for example, you can upload into the general Audio or Texts collections, but you can't upload, say, right into the Grateful Dead archive or the CD-ROM collections we have.

In the future, we hope to have it that accounts will be assigned and de-assigned by some credential different than the current somewhat-binary approach we have now, but that functionality doesn't exist yet.

So basically, upload is constricted like before. Download, however, is as unconstricted like before.

Mithrandir · on Oct 29, 2014

You don't need a key for downloading. For uploading, you use an access/secret key pair from https://archive.org/account/s3.php You then add that to the ia tool with "ia configure".

Curmudgel · on Oct 29, 2014

I got the old website when I visited it, but based on the image that another user posted:

http://i.imgur.com/m2d1Gf4.png

the text color of the description for search results is around #808080 on a #FFFFFF, which is a contrast ratio of 3.95 to 1 according to http://webaim.org/resources/contrastchecker/

This fails both WCAG AA and WCAG AAA recommendations. Please increase the contrast, preferably by leaving the text the way it was (#000000)

textfiles · on Oct 29, 2014

The team will see this thread. I'll leave it to them to take in the note.

danbee · on Oct 29, 2014

This is practically unusable in Safari on my MacBook Pro. It's a shame because once it actually loads it looks quite nice!

TomGullen · on Oct 29, 2014

I love archive.org. Has my first website ever on it, no way would this exist without archive.org any more!

http://web.archive.org/web/20020726224013/http://www.gamezon...

bgutj · on Oct 29, 2014

The only thing I've tested with the new beta interface is the ability to search all books for a particular word or phrase at the top level. This has not been implemented. If I am looking to write a biography about a particular person, for example, who is mentioned in passing in n books in the archive and I search for this person, I will find a very small amount of these books, perhaps even zero.

If I go to a particular book, I can search -inside- that book for words/phrases. I would like to do with the raw text from all books.

bane · on Oct 29, 2014

I with there was a word stronger than love for what I feel about archive.org. It's one of the amazing promises of the Internet come true.

If I have one criticism of archive.org it's that things are impossible to find, even if you know they have them - this redesign doesn't solve this problem.

I think the principle problem is that what should be a meta-layer on the organization, the provenance of a collection of stuff, is often used just as often as an organizational scheme as media and subject type.

And example. I'm looking to see if they have "A calendar of dinners, with 615 recipes" by Marion Harris Neil. Where would you suppose this book would be?

If I go to "eBooks and Texts" I'm simply met with a wall of collections, none of which are subject area organized, is it under Microfilm, or Canadian Libraries? Boston Library Consortium? Who knows? I'll never find it by browsing and the way books are collected is pretty much useless. Unless I know there's a copy under "Canadian Libraries" I'll probably not find it.

Sure I can search for it, "A calendar of dinners" gives me 3 results! Turns out it's buried under the following Archives:

"Toronto Public Library", "The Library of Congress", "Cornell University Library". Notice that none of these are the crumbtrail I used to find it the first time on accident (Canadian Libraries)!

How about Omni Magazine? Is it a "Text"? I'm not sure, even today. I do know if I go to texts and search all texts for "Omni" I get it back. But it's part of "The Magazine Rack" and "Additional Collections" which I still have not figured out how to just navigate to.

These are just texts, video, audio and other media types are similarly hard to navigate and find stuff. There's little pleasure in browsing archive because if you find something, it'll be by accident, not because you navigated to some pocket of cool stuff.

Good luck seeing what SF books they have and browsing it. That's actually a collection I'd care about.

I also like old radio shows, and those are scatter shot all over the site. Unless somebody basically just uploaded an entire series at once, good luck piecing it together.

Right now, about the only way I know something is on archive.org is because the person who uploaded the item mentions it on a podcast or something.

I'm almost tempted to just start a meta-website of some sort to start organizing stuff I care about to that other people like me can find it.

It's kind of a mess, and it's given me a lot more respect for what librarians have largely solved in the physical world.

db48x · on Oct 29, 2014

Interestingly enough, at the open house last night they talked about how they want to shift focus a bit and make it easier for people to do exactly that. They've gotten really good at storing things, and at digitizing them, and now they need to be better at letting people curate and organize collections.

Of course, as it stands anyone can make a website of their own that links to and/or embeds anything stored in the archive, but I gather that hasn't happened often.

bane · on Oct 29, 2014

> Of course, as it stands anyone can make a website of their own that links to and/or embeds anything stored in the archive, but I gather that hasn't happened often.

Yeah, and I've even debated it enough to think about doing it myself. For me at least, it feels kind of wrong to just put up a site that curates and organizes somebody else's collection. I'd also be worried about going through the effort and then having archive.org change the link-to urls or rules or whatever (even though they're a benevolent organization) and then have to go through all the effort again.

It really is a lot of work to find stuff on archive, even when you know it's there.

And again I attribute that to too much effort to keep track and give credit for the provenance of an item rather than organizing it in a reasonable way. As much as I'm glad that the Universal Library (Million Books Project) donated their work, it doesn't do anything for me as a user when I'm trying to find the Collected Stories of William Faulkner.

I think a better way would be to categorize the archive like any other library, and then for each individual work, provide alternative scans/recordings/transcriptions, etc. and a link to the donating organization that goes to a page that then gives you links to everything they donated.

But honestly, it's a fairly mild inconvenience. Most of the stuff they host is fairly long-dwell. Once I find a book or whatever, I'll be tied up with it for quite a few days and don't need to be bouncing all over their site several times an hour.

db48x · on Oct 29, 2014

Yes, probably they need to simply talk about this possibility more, or more prominently.

If you already know what you're looking for, then feel free to simply search for it; you don't have to browse through the categories it might be in, or the subjects it ought to be listed in, or any of that. Likely the only reason why it isn't in the places you looked is just that nobody has gotten around to applying the right metadata to it. This is simply a fact of life; there are millions (or billions, if you squint a little) of items in the archive, and most of them don't have all the metadata that they ought to have.

I wouldn't worry about stepping on their toes by organizing things better. It's not actually a collection until someone organizes and curates it; until that happens it's just a pile of stuff. IA has always operated on the assumption that it's ok to just have a pile of stuff if the alternative is to have nothing at all. Given the size of the pile there's no way they could ever organize everything themselves, even assuming that there's one obvious right way to organize things.

bane · on Oct 30, 2014

> IA has always operated on the assumption that it's ok to just have a pile of stuff if the alternative is to have nothing at all.

Absolutely. This definitely helps keep things in perspective. And their pile is priceless and staggering.

It's amazing to me that there are more things on IA to keep me entertained, for free, than I could ever possibly experience in this lifetime.

72deluxe · on Oct 29, 2014

Isn't infinite scrolling a memory hog?

acdha · on Oct 29, 2014

It's a risk but implementers can take the sting out of it. Browsers aren't currently smart enough to do things like unload decoded <img> memory for things which aren't visible but you can avoid the worst of it if you use a CSS background-image (which browsers do unload) and a visibility test on scroll to avoid loading things which aren't visible or soon to be visible. This works as far back as IE8 so it might be worth the hassle.

72deluxe · on Oct 30, 2014

Thank you - very informative. I would not have known the behaviour of browsers with regard to that - is there an established standard or accepted behaviour documented somewhere? I'm guessing that the CSS spec only dictates how things should be shown and not what browsers can do with off-screen items.

acdha · on Oct 30, 2014

Correct: the only way to know for sure is to test it. I think a few minor extensions could make that a lot easier – e.g. a CSS :visible selector and support for background-image: attr(…)

72deluxe · on Oct 31, 2014

Thank you. The massive amount of testing is surely what hampers web development? Years ago in the dark ages people dreamed of cross-platform apps; the three main players in this market (desktop only) now are Windows, Mac OSX and Linux (to a tiny extent). Writing a cross-platform native app is no longer a massive exercise in frustration (see wxWidgets or Qt).

Someone thought Java would solve all our problems, but it seems to have fallen out of fashion, and everyone pins their hopes and dreams on cross-platform apps with websites, but surely the effort is far larger: testing on 3+ browsers per OS at least!

I don't see how everyone copes.

taejo · on Oct 30, 2014

There's an escaping bug: at https://archive.org/details/movies I see "Arts & Music", etc.

The download links for movies (and I guess other files) should set their disposition to download rather than playing in the browser.

glomek · on Oct 29, 2014

Yikes! That's different!

But it still doesn't have the one feature that I've been wishing they would implement forever. In a collection of audio files, I wish they would provide a podcast feed. It would be so nice to be able to listen to Old Time Radio shows as podcasts.

aw3c2 · on Oct 29, 2014

That's a great idea, be sure to mail them about it at info@archive.org!

wj · on Oct 30, 2014

I like it. I was looking at it for thirty seconds and came across two collections that I didn't know existed. Great to discover new stuff and I can still search if I know what I'm looking for.

butwhy · on Oct 29, 2014

I've been waiting for something like this to happen, as the old design just looks old and doesn't encourage me to use it.

But.. I'm sure haters gonna hate.

userbinator · on Oct 29, 2014

If you're looking at the Internet Archive, then you should very well expect to see a lot of other "old" things besides the site itself.

davea37 · on Oct 29, 2014

Is the old one available anywhere? ;)

aw3c2 · on Oct 29, 2014

Just click "exit beta" in the top right corner and please leave feedback.

TheLoneWolfling · on Oct 29, 2014

How do I leave feedback?

aw3c2 · on Oct 29, 2014

If you click the exit beta button there is a form.

TheLoneWolfling · on Oct 29, 2014

There wasn't when I clicked it...?

Exit beta takes me straight to the old version without any form or anything showing.

textfiles · on Oct 29, 2014

If you aren't getting the form, you can write info@archive.org with any thoughts or comments you have and they'll be forwarded to the dev team, who are gleefully swimming in mail as we speak.

calinet6 · on Oct 29, 2014

Increase.. spacing... between top icons... and labels.. twitch

Sorry, designeritis.

droithomme · on Oct 29, 2014

The page here has 13.5MB in assets for the initial load, including a 7.9MB top html file. Takes 29.5 seconds to load, assuming all extensions and plugins like adblock are disabled, otherwise it seems to never complete loading. In both cases, it pins one CPU at 100% while it loads.

frik · on Oct 29, 2014

Direct link to the WayBack Machine:

https://archive.org/web/

ooooak · on Oct 29, 2014

> server: nginx/1.1.19 > Powered By: PHP/5.3.10-1ubuntu3.2

why 5.3 ? and ubuntu3.2 !!!

db48x · on Oct 30, 2014

Eh, that's not the version of Ubuntu that they are running; the whole thing is the php version number. There is a general convention among Linux distributions to backport security fixes to the older versions of software that come with their older releases.

In this case, Ubuntu 12.04 (Precise Pangolin) was released with PHP 5.3.10 plus some security patches, available in the Ubuntu package repository under the name php5 with the composite version number 5.3.10-1ubuntu3.14. Their website doesn't list a newer version of this package (http://packages.ubuntu.com/precise-updates/php5), so possibly they're ahead of the official Ubuntu releases.

The reasoning for this is that while it might be nice to upgrade in order to get new features, new bug fixes, and new performance enhancements, these potential benefits are often outweighed by the very real cost of testing everything to make sure the upgrade doesn't cause regressions. Backporting the security fixes makes sticking with a base version possible. I imagine that upgrading is pretty low on their list of things to do; it would have to get them some nice benefits, and nothing about php is ever nice.

I'm a software engineer myself, and I upgrade individual libraries far more often than I upgrade the actual programming-language runtime (or compiler), simply because that's where you get the most benefit (usually a fix for a specific bug, but sometimes a new feature will be tempting) for the least risk.