> The theory I've heard is related to 'crawl budget'. Google is only going to devote a finite amount of time to indexing your site.
Once a site has been indexed, should it really be crawled again? Perhaps Google should look for RSS/Atom feeds on sites and poll those regularly for updates: that way they wouldn't waste time scraping the same site multiple times.
Old(er) articles, once crawled, don't really have to be babysat. If Google wants to double-check that an already-crawled site hasn't changed too much, they can do a statistical sampling of random links on it using ETag / If-Modified-Since / whatever.
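For what it's worth, here's a rough sketch of what that sampling could look like using conditional requests. The URLs, the stored ETag/Last-Modified values, and the use of the `requests` library are all illustrative assumptions, not anything Google actually does:

```python
import random
import requests  # assumed to be available; any HTTP client would do

# Hypothetical record of already-crawled pages: URL -> (ETag, Last-Modified)
crawled = {
    "https://example.com/post/1": ('"abc123"', "Tue, 01 Aug 2023 10:00:00 GMT"),
    "https://example.com/post/2": ('"def456"', "Wed, 02 Aug 2023 12:30:00 GMT"),
}

def sampled_change_rate(crawled, sample_size=2):
    """Re-check a random sample of crawled URLs with conditional GETs."""
    sample = random.sample(list(crawled.items()), min(sample_size, len(crawled)))
    changed = 0
    for url, (etag, last_modified) in sample:
        resp = requests.get(
            url,
            headers={"If-None-Match": etag, "If-Modified-Since": last_modified},
            timeout=10,
        )
        # 304 Not Modified = the page hasn't changed since the last crawl,
        # so the crawler didn't have to download the body again.
        if resp.status_code != 304:
            changed += 1
    return changed / len(sample)

print(f"Fraction of sampled pages changed: {sampled_change_rate(crawled):.0%}")
```

If the sampled change rate comes back low, the crawler could plausibly lower that site's recrawl frequency without re-fetching everything.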
The Sitemap protocol, which was introduced by Google specifically to give information to crawlers, already includes last-updated info.
No need to invent a new system based on RSS/Atom; there is already an existing, widely used system based on Sitemaps.
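To make that concrete, here's a minimal sketch (made-up URLs and dates) of reading the `<lastmod>` element from a sitemap in the standard sitemaps.org format:

```python
import xml.etree.ElementTree as ET

# A minimal sitemap in the sitemaps.org format; <lastmod> carries the
# last-updated info a crawler can use to decide whether to re-fetch.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/post/1</loc>
    <lastmod>2023-08-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/post/2</loc>
    <lastmod>2024-02-15</lastmod>
  </url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def last_modified_map(sitemap_xml):
    """Return {url: lastmod} for every <url> entry in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        url.findtext("sm:loc", namespaces=NS): url.findtext("sm:lastmod", namespaces=NS)
        for url in root.findall("sm:url", NS)
    }

print(last_modified_map(SITEMAP_XML))
```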
So, what you suggest is already happening -- or at least, the system is already there for it to happen. It's possible Google doesn't trust the last-modified info provided by site owners enough, or has other reasons not to use your suggested approach; I can't say.
I can imagine a malicious actor swapping an SEO-friendly page for something spammy. Since the ETag and Last-Modified values are supplied by the server, they can be manipulated to hide the change.