Surprisingly, nobody has mentioned this here yet. I think the key to this is SEO, SERPs, and newspapers wanting Google to find and index their content.
This is my best guess. I've put some real thought into it, and it's the most logical assumption I've arrived at. I used to build crawlers with Selenium years ago, but that burned me out a bit and I moved on.
To test my hypothesis, find any article on Google that you know is probably paywalled. Click the result Google shows you, navigate into the site, and: bam, paywall.
If it has a paywall for me, then how did Google crawl and index all the metadata for the SERP?
I have a long-running theory that Archive.is knows how to exploit the same SEO trick Google uses to get at the content. Websites like the Baltimore Sun don't want humans to view their content for free, but they do want Googlebot to see it for free.
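If that theory is right, the simplest version of the trick is just User-Agent cloaking, and you can probe for it yourself. Here's a minimal sketch (stdlib only) that fetches a page twice, once as a normal browser and once claiming to be Googlebot, so you can compare the responses; the URL and the assumption that the site checks only the User-Agent header (rather than verifying Googlebot's IP ranges) are mine, not anything confirmed about Archive.is or any specific paper:

```python
import urllib.request

# Standard Googlebot User-Agent string, as published by Google.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")
# A generic desktop browser User-Agent for the control request.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

def fetch_as(url: str, user_agent: str) -> bytes:
    """Fetch url with the given User-Agent header and return the body."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

# Hypothetical usage against some paywalled article URL (not run here):
# normal = fetch_as("https://example.com/some-article", BROWSER_UA)
# as_bot = fetch_as("https://example.com/some-article", GOOGLEBOT_UA)
# A large difference in body length suggests crawler-only cloaking.
# print(len(normal), len(as_bot))
```

In practice many publishers do verify that "Googlebot" requests come from Google's published IP ranges via reverse DNS, so a naive header swap won't always work; the sketch only tests the weakest form of the trick.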