
TL;DR: I strongly suspect that relatively small, personally-curated lists will be both more appropriate and more effective. These might be augmented with specific classifications, but probably not on a widespread basis.

Though the proposed solution borrows heavily from concepts long used against email and Usenet spam, there are a few critical distinctions in SEO SERP[1] spam which make a widely-crowdsourced listing both less applicable and less necessary.

In the case of email, your inbox is an unlimited resource to the spammers --- there's effectively no limit to how much spam they can throw at it. As there is also an effectively limitless set of source addresses (by either domain name or IPv6 address), and because email/Usenet spam is itself a quantity/numbers game with rapidly shifting origins, collectively-sourced and curated blocklists have value.[2]

A SERP is itself a finite resource --- the default is to display ten results, and not making it into the top ten offers little reward. Moreover, a high search ranking takes effort and time to achieve; it's not like email, where a new server can spin up and immediately start deluging targets.

My experience with annoyances of this sort (stream-based social media is one example) is that blocking a relatively small number of high-profile offenders hugely improves the signal/noise ratio. I think that will be the case with SERPs as well. There are a half-dozen or so sites which tend to dominate results in most cases, and those can be individually blocklisted (if the capability exists). If more appear, they can similarly be removed.
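To make the "small personal list" idea concrete, here's a minimal TypeScript sketch of the kind of filtering a browser extension could do; the domain names are hypothetical placeholders, not recommendations:

    // Minimal sketch: hide SERP results whose host matches a short,
    // hand-curated personal blocklist. Domains below are placeholders.
    const personalBlocklist = new Set<string>([
      "content-farm.example",
      "scraper-mirror.example",
      "seo-listicle.example",
    ]);

    function isBlocked(resultUrl: string): boolean {
      const host = new URL(resultUrl).hostname;
      // Match the listed domain itself or any subdomain of it.
      return [...personalBlocklist].some(
        (d) => host === d || host.endsWith("." + d)
      );
    }

    // Example: filter a page of result URLs down to the unblocked ones.
    const results = [
      "https://blog.example.org/useful-post",
      "https://seo-listicle.example/top-10-anything",
    ];
    console.log(results.filter((u) => !isBlocked(u)));

A plain Set of a few dozen entries is more than enough here; the point is that the list stays small and under the individual user's control.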

The other factor is that quite a few sites which some people find exceedingly annoying and spammy, others find appealing. Coming to agreement on what to block, and on how to classify such domains / sites, is likely to be difficult and/or contentious. There may be exceptions in specific instances (hence: specific classifications of unwanted results), but less so in the general case.

I might be wrong. The case of DNS-based adblocking, with Pi-hole as the classic example, shows that very large lists can be compiled and used. My own Web adblock / malware-block configurations have typically had from ~10k to ~100k entries. That said, the really heavy lifting is typically done by a much smaller fraction of the total. Power laws and Zipf distributions work to your advantage here.
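As a toy illustration of that last point (assuming an idealized Zipf distribution with exponent 1 --- an assumption, not measured data), the top sliver of a large list accounts for a disproportionate share of hits:

    // Toy calculation: under a Zipf(s = 1) distribution over n list entries,
    // what share of total hits do the top k entries account for?
    function zipfShare(n: number, k: number, s = 1): number {
      const harmonic = (m: number) =>
        Array.from({ length: m }, (_, i) => 1 / Math.pow(i + 1, s))
          .reduce((a, b) => a + b, 0);
      return harmonic(k) / harmonic(n);
    }

    // With a 100,000-entry list, the top 100 entries (0.1% of the list)
    // already cover roughly 40% of hits under this assumption.
    console.log(zipfShare(100_000, 100).toFixed(2)); // ≈ 0.43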

________________________________

Notes:

1. Search engine results page, that is, what you see in response to a query.

2. Even in the case of email spam, the principal value comes largely from curated lists, usually maintained by experts, e.g., Spamhaus.



In that case, we'd still need a repo bringing together all these individual lists. I couldn't find anything like this.

I suggested a single list to prevent repetition and to reduce the number of lists one needs to import to one.


So, my larger point is that no, that repo doesn't seem to be called for.

For malware and the like, repurposing extant DNS-based blocklists, as used by uBlock Origin / uMatrix, should be viable, and would not require an additional curation effort.
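As a rough sketch of what that repurposing could look like (the URL is a placeholder, and a real extension would cache and update the list rather than fetch it on every load), the common hosts-file format parses in a few lines:

    // Sketch: parse a hosts-format blocklist (the format Pi-hole and many
    // uBlock Origin-compatible lists use) into a Set for O(1) domain lookups.
    async function loadHostsList(url: string): Promise<Set<string>> {
      const text = await (await fetch(url)).text();
      const domains = new Set<string>();
      for (const raw of text.split("\n")) {
        const line = raw.trim();
        if (!line || line.startsWith("#")) continue; // skip comments and blanks
        // Typical entry: "0.0.0.0 ads.example.com"
        const parts = line.split(/\s+/);
        const domain = parts.length > 1 ? parts[1] : parts[0];
        if (domain && domain !== "localhost") domains.add(domain);
      }
      return domains;
    }

    // Usage, e.g. in an extension's background script:
    //   const blocked = await loadHostsList("https://example.com/hosts.txt");
    //   blocked.has("ads.example.com");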

Note also that we're looking at a browser extension, and as such, very large lists and the memory load they carry would probably have significant negative impacts.


I've been building something similar to this with https://fantastic.link. Would love to get your feedback!

I think empowering individuals to curate the web would create stronger social and financial incentives to improve online indexing (e.g., Shopify vs Amazon). Twenty years ago we could approximate quality from backlinks from credible sites; in the age of social media, that signal seems to have shifted toward what creators, influencers, and online experts endorse.


I'm not seeing the relationship here.

ELI5?



