>Google processes 3.5 billion search queries per day, roughly ten times more than its nearest competitors.
The crazy thing to me is that this has already been examined through many of their obviously anti-competitive acquisitions, and yet they always win. Namely, they use their position as the search-engine leader to spot trends in search, purchases, and advertising, then buy companies in industries that will ultimately help the bottom line of their advertising-driven search business, to the detriment of alternative products.
In other words, the very people and companies who already buy Google ads must now pay more for them, because Google entered the market and is competing for the same ads, driving up prices. Thanks to its search engine, Google knows exactly where it can get away with this: it knows the maximum ad cost its competitors can bear, and at best it will squeeze that margin; at worst it can drive you out of business, leaving itself the sole operator in the space.
I mentioned it the other day: the one thing people could do is boycott Google Search. I know that sounds ridiculous, but how hard would it really be to organize a single-day Google Search boycott across the web, where people agreed to use Bing or DuckDuckGo for one day? Google shareholders would throw a fit and change would be made internally in an instant, and I'd go so far as to say political pressure would come down to seriously examine the company's business practices. Most importantly, people would wake up, understand their power, and realize that they own the web, not these dominant corporate entities.
Good luck creating a distributed search engine that produces results that come even close in quality to Google's. Search engines are hard. There is a reason Google dominates, and it's not just inertia.
Do you have actual evidence this is the case, or is it just supposition? I've been using DuckDuckGo for the last week, and whenever I was unable to find something relevant I'd try the same query in Google and come up short there too.
Even if Google is better overall, if you can use another search engine and get by, you are helping to loosen their monopoly's grip and giving other search engines a path to improve. If you can't build a competing search engine yourself, that is at least one way you can help.
I also use DDG as my main search engine. For me at least, the results are much worse than Google's, not only for the obvious cases where Google benefits from the additional information it has about me (localization, for example), but also for ordinary queries. Quite often stat-counter sites rank above the actual sites I want.
Don't take that advice as discouragement to start.
Google did have to engineer a giant distributed system to provide the search results we get. There are articles describing how they just had motherboards connected to hard drives lying on anti-static pads, filling rooms in their buildings. Standard racks and cabinets didn't make sense for them.
Today we take for granted software stacks like Lucene. Nothing like that existed when Google moved into the space.
So yeah, 15-20 years ago internet search was a really difficult problem. Now, the sheer volume of research available which specifically addresses the problem space of a trusted decentralized search, to me, means that a solution is imminent.
You could be the person that puts all the research blocks in the right order and launches a successful project.
A politically distributed system is a whole different beast.
People have built plenty of distributed systems where all endpoints are under the control of a single organization (or cooperating organizations). That's essentially a solved problem.
But a system which is not under single control is a whole different approach to being distributed. Not only does A not control B; B may not even intend to cooperate, and A does not necessarily have any reason to trust B, or vice versa. B may even be an endpoint that is outright hostile to the network itself (e.g., spam, DDoS).
There have been very few successful distributed systems that follow that model. The WWW is one. That's the good news: If you manage to find a viable model of decentralization, AND it succeeds in user adoption, you have literally changed the world forever.
That's how hard it is: People who achieve it go down in history.
I see it as more of an incremental process than a single achievement. The Gnutella2 protocol by Michael Stokes is the most successfully deployed emergent, distributed search service I am aware of. While G2 lacks any trust mechanisms, there are a few projects working in the problem space of trusted decentralized stacks, or "anarchitecture," that are trying to solve that for us.[1][2][3] All the building blocks seem to exist in some state, and people are experimenting by arranging them differently. The systems-trust example you gave seems solvable once there is a durable decentralized trust model that manages identities and can verify trust while keeping privacy in mind. User key management seems like it will always be a problem.
The other thing you mentioned is user adoption. Even if a decentralized Google were available, there is no marketing machine behind it, and so no obvious reason for people to adopt it. Patchwork is a good example of what organic growth looks like for a decentralized social app.
Developers are another facet of user adoption. Developers don't think about using these tools for new service development. They get paid to work on centralized architectures. The incentives aren't there yet.
It makes sense to me that this is the natural evolution of the way in which we build services. And yeah, it is super hard and we are all gonna be famous. ;)
> While G2 lacks any trust mechanisms there are a few projects working in the problem space of trusted decentralized stacks
Search engines need to solve the spam issue.
Without central control you lack any solid ground for eradicating spam, preventing fake accounts, and defending your protocols against exploits. Building trust is not easy. The only real progress we've made in the past decade is Bitcoin's proof-of-work. BitTorrent relies on either a central tracker or the DHT (which is not very secure).
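To make the proof-of-work idea concrete, here is a toy Hashcash-style puzzle in Python (the peer id and difficulty are made up for illustration; real systems tune difficulty and combine proof-of-work with other defenses):

```python
import hashlib

def proof_of_work(data: bytes, difficulty: int) -> int:
    """Find a nonce so sha256(data + nonce) starts with `difficulty`
    zero hex digits -- a toy Hashcash-style puzzle."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(data: bytes, nonce: int, difficulty: int) -> bool:
    # Checking costs one hash; finding the nonce cost ~16**difficulty hashes.
    digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Hypothetical peer identity; difficulty 4 finishes in a fraction of a second.
nonce = proof_of_work(b"peer-id:abc123", 4)
assert verify(b"peer-id:abc123", nonce, 4)
```

The asymmetry is the whole point: minting an identity is expensive, checking one is a single hash, which is what makes mass fake-account creation costly.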
For the past 12 years I've dedicated my academic career to building trust without any central control or central servers. It's hard.
<plug> We enhanced a BitTorrent client with a web-of-trust plus distributed torrent search, and replaced tit-for-tat with a ledger. We use an incremental PageRank-like trust model. https://github.com/Tribler/tribler/wiki#current-items-under-...
> Tribler is the first client which continuously tries to improve upon the basic BitTorrent implementation by addressing some of the flaws. It implements, amongst others, remote search, streaming, channels and reputation-management. All these features are implemented in a completely distributed manner, not relying on any centralized component. Still, Tribler manages to remain fully backwards compatible with BitTorrent.
> Lengthy documentation in the form of two master thesis documents is available. First is a general documentation of the tunnel and relay mechanism, Anonymous HD video streaming, .pdf 68 pages. Second is focused on encryption part, called Anonymous Internet: Anonymizing peer-to-peer traffic using applied cryptography, .pdf 85 pages. In addition, there are the specifications for the protocols for anonymous downloading and hidden seeding on this wiki.
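For readers who want the flavor of the PageRank-like trust idea, here is a generic power-iteration sketch (a textbook illustration with made-up peer names, not Tribler's actual implementation):

```python
def trust_scores(endorsements, damping=0.85, iters=50):
    """PageRank-style trust propagation over an endorsement graph:
    each peer splits its trust evenly among the peers it endorses.
    `endorsements` maps a peer id to the peers it vouches for."""
    peers = set(endorsements)
    for targets in endorsements.values():
        peers.update(targets)
    n = len(peers)
    score = {p: 1.0 / n for p in peers}
    for _ in range(iters):
        nxt = {p: (1 - damping) / n for p in peers}
        for src in peers:
            targets = endorsements.get(src, [])
            if targets:
                share = damping * score[src] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # A peer endorsing nobody spreads its trust evenly.
                for t in peers:
                    nxt[t] += damping * score[src] / n
        score = nxt
    return score

# Toy graph: a genuine cluster plus a spam clique that only endorses itself.
web = {
    "alice": ["bob"],
    "bob":   ["alice", "carol"],
    "carol": ["alice"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = trust_scores(web)
```

In a real deployment you would also personalize the teleport term toward your own identity and observed interactions, so that a spam clique's self-endorsements earn it little from your point of view.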
The link that you provided is going to take up a few weeks of my spare time. Thank you!
> a trusted decentralized search, to me, means that a solution is imminent.
Yes, something like Lucene didn't exist when Google Inc. started in 1998, so the community could theoretically build a new search engine on it (or on similar open-source text-indexing technology).
However, this overlooks the dynamic where Google will continue to innovate new algorithms for search results while the community is trying to "catch up" to Google with yesterday's technology (Lucene). Google isn't standing still.
This is similar to the dynamic where open-source Hadoop is not as good as the newer "mapreduce" technology Google uses internally.[1] Same situation with GIMP, which still doesn't have feature parity with Photoshop even though GIMP has been in development for 21 years.
That creates a feedback loop where users continue to use Google in 2027 because the open-source federated alternative is still "catching up" with Google circa 2017 and delivers worse results.
I think people underestimate what it takes to replicate Google's search intelligence. Something like Lucene only covers the text indexing. A major part of the software is the data-center operations and the coordination/processing pipeline. Maybe datacenter/cloud software like OpenStack could be leveraged to help coordinate a hundred thousand processing nodes, but that still falls far short of replicating what Google does.
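To be concrete about what "only the text indexing" covers: the core concept behind a Lucene-style engine is an inverted index. A toy sketch with made-up documents (real engines add tokenization, ranking, sharding, and much more):

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: term -> set of doc ids containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND semantics: return the doc ids containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Made-up documents just for illustration.
docs = {
    1: "distributed search engine design",
    2: "search engine result quality",
    3: "cat pictures",
}
index = build_index(docs)
```

This part is well understood; the hard part Google solved is doing it over billions of pages, continuously, with ranking that resists spam.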
>Even if a decentralized Google was available there is no marketing machine behind it. There's no reason for people to adopt it.
But Google Search didn't have marketing behind it either.
In 1999 they didn't run Super Bowl ads or spreads in Wired Magazine. Its fast adoption spread by word of mouth, because everybody saw right away that it returned better results than the junk from AltaVista, Excite, AOL, etc.
If the federated community search engine is truly better, people will use it.
All of your points are valid and a bit depressing. Google will always be ahead of the game in search. This will always be a problem for FOSS decentralized projects. How to overcome this will be a problem that is perhaps handed down to the next generation.
Maybe it's time to discuss how big a company can get and still be healthy for democracy. I think there is a case to be made that companies this big should be broken up into smaller ones. It's a tuning mechanism to keep competition alive and prevent power from becoming too concentrated. It would also address the too-big-to-fail problem we see in other domains.
Risks Posed by the Centralized Web
- Facebook and Google account for 81% of all incoming traffic to online news sources in the U.S.
- Google processes 3.5 billion search queries per day, roughly ten times more than its nearest competitors
Risk 1 - Top-Down, Direct Censorship
Risk 2 - Curatorial Bias / Indirect Censorship
Risk 3 - Abuse of Curatorial Power
Risk 4 - Exclusion