Many (most?) "big content" sites let the Google and Bing spiders scrape the contents of articles, so when people search for terms in an article they'll find a hit and then get referred to the paywall.
Google doesn't want everyone to know what a Google indexing request looks like for fear the CEO mafia will institute shenanigans. And the content providers (NYT, WaPo, etc.) don't want people to know 'cause they don't want people evading their paywall.
Or maybe they're okay with letting the archive index their content...
Just FYI, Google and Bing publish the user agent strings[1][2] for their crawlers. At least in my experience, most of the typical ad-infested and paywalled news sites won't display the paywall if you change your user agent to that of a crawler they prefer.
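To make "change the user agent" concrete, here's a rough sketch using only Python's standard library. The two UA strings are the commonly published Googlebot and Bingbot ones as I know them (check [1][2] for the current values); the function name, the dict, and the example.com URL are made up for illustration. It only builds the request and doesn't fetch anything.

```python
from urllib.request import Request

# Commonly published crawler UA strings (assumption: verify against [1][2],
# since the vendors can change them).
CRAWLER_USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
}


def build_crawler_request(url: str, crawler: str = "googlebot") -> Request:
    """Return a urllib Request whose User-Agent header mimics the given crawler.

    Hypothetical helper for illustration; it does not send the request.
    """
    user_agent = CRAWLER_USER_AGENTS[crawler]
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_crawler_request("https://example.com/some-article")
    # urllib normalizes header names to "User-agent" internally.
    print(req.get_header("User-agent"))
```

You'd pass the result to urllib.request.urlopen() to actually fetch the page. No guarantee it works on any given site: some check more than the header (e.g. a reverse DNS lookup on the requesting IP), in which case a spoofed UA alone won't get you past anything.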