Lesson Learned from Queries over 1.3T Rows of Data

ameyv · on Aug 21, 2019

Any database experts here? My question is: should we use a relational database for such a use case? A Quora-like app, I mean.

cryptonector · on Aug 21, 2019

Always use a relational database. Make sure to use one that supports recursive queries, and that doesn't suck.

If you need to shard your database, do that.

For example, say you're building a ridesharing app. You'll have databases of users and drivers, but you can totally shard the driver DB by city, and even the user DB can be sharded this way (copying into a shard an entry from a main truth db). The DB of available cars and rides can be in-memory only -- if you lose it you can let the client-side apps drive recovery. You can use things like FDW to get a view that looks like a single DB, and then you can write queries purely in SQL. Look ma', no hand-coded joins and such.
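The shard-by-city idea with a single queryable view can be sketched roughly as follows. This is only a toy illustration, not a Postgres FDW setup: it uses Python's built-in `sqlite3` module, attaches one in-memory database per city as a stand-in for a remote shard, and defines a temp view that unions them. The city names, the `drivers` table, and the helpers `build_sharded_db` and `add_driver` are all hypothetical.

```python
import sqlite3

CITIES = ["nyc", "sf"]  # hypothetical shard keys, one shard per city


def build_sharded_db(cities):
    """Attach one in-memory database per city and expose a unified view."""
    conn = sqlite3.connect(":memory:")
    for city in cities:
        # Shard names are interpolated into SQL, so only accept plain identifiers.
        if not city.isidentifier():
            raise ValueError(f"bad shard name: {city!r}")
        # Each ATTACH of ':memory:' creates a separate in-memory database.
        conn.execute(f"ATTACH DATABASE ':memory:' AS shard_{city}")
        conn.execute(
            f"CREATE TABLE shard_{city}.drivers "
            "(id INTEGER PRIMARY KEY, name TEXT)"
        )
    # A TEMP view may reference tables in any attached database, so queries
    # against all_drivers read like queries against a single database.
    union = " UNION ALL ".join(
        f"SELECT '{city}' AS city, id, name FROM shard_{city}.drivers"
        for city in cities
    )
    conn.execute(f"CREATE TEMP VIEW all_drivers AS {union}")
    return conn


def add_driver(conn, city, name):
    """Write a driver row to the shard that owns the given city."""
    if city not in CITIES:
        raise ValueError(f"unknown city: {city!r}")
    conn.execute(f"INSERT INTO shard_{city}.drivers (name) VALUES (?)", (name,))


conn = build_sharded_db(CITIES)
add_driver(conn, "nyc", "Ana")
add_driver(conn, "sf", "Bo")
add_driver(conn, "sf", "Cy")

# Plain SQL over the unified view, no hand-coded fan-out in application code.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM all_drivers GROUP BY city ORDER BY city"
).fetchall()
print(rows)
```

In a real deployment the shards would be separate servers and the unified view would come from something like foreign tables rather than `ATTACH`; the point here is only that writes go to one shard while reads can stay in ordinary SQL.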

ameyv · on Aug 22, 2019

Hey, thanks. Do you know of any resources or git repos that demonstrate these kinds of implementations? (Even what to search for would be appreciated.) I'm a backend guy but average when it comes to databases.

Thanks mate!

ecnahc515 · on Aug 21, 2019

I mean, Stack Overflow is bigger than Quora and uses a relational database for storing most everything, if I recall. They also use either memcached or redis (too lazy to check) to cache things pretty heavily, though.

joncrane · on Aug 21, 2019

Their tech stack is incredible. They run an extremely lean operation. I'm an AWS and Linux fanboy and in fact have essentially staked 5-10 years of my career on AWS being the "go-to" technology. But if you have talent like Nick Craver and crew onboard, you have every reason to build and maintain your own hardware.

Not only that, it's all .NET/MSSQL. It's incredible technology and they are awesome at it.

https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...

mrits · on Aug 21, 2019

It's actually a typical stack.

BubRoss · on Aug 22, 2019

What is so incredible about it?

joncrane · on Aug 22, 2019

The fact that they can serve their entire user base with so little hardware.

BubRoss · on Aug 22, 2019

But what are they doing that enables that other than putting standard components together and not introducing crazy bottlenecks? What part is incredible technology? Maybe modern computers and software just enable far more throughput than most people realize.

dkhenry · on Aug 21, 2019

I think you would be surprised by the scalability of relational databases. Most of the large-scale applications you use rely on relational databases, even at the largest scale. Modern distributed SQL systems such as TiDB or the open source database Vitess are capable of serving any amount of traffic, and they are really easy to use and very well understood.

In fact, Quora uses a relational database.

joatmon-snoo · on Aug 21, 2019

...Vitess is sharded on top of MySQL. It literally only exists because MySQL doesn't scale.

I will agree, though, that for the vast majority of use cases, worrying about using a so-called "scalable" DB is a YAGNI-type concern, and the normal options should be more than sufficient.

gregwebs · on Aug 21, 2019

Why not? I can think of one reason: SQL is generally poor at modeling tree structures. This may show up in the application as comments without hierarchies (which is the case for StackOverflow and Quora), but that is a more minor part of the application. Similarly, if you wanted to have a graph of relations between knowledge, graph databases might be better. But QA sites tend to just add tags to the questions, which is easy enough to model.
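For what it's worth, the recursive queries mentioned earlier in the thread are the usual way to walk a tree stored relationally. A minimal sketch, assuming a hypothetical `comments` table with a self-referencing `parent_id` column, run through Python's built-in `sqlite3` module (SQLite supports `WITH RECURSIVE`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each comment points at its parent (NULL for the root).
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, "
    "parent_id INTEGER REFERENCES comments(id), "
    "body TEXT)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        (1, None, "root"),
        (2, 1, "reply to root"),
        (3, 2, "reply to reply"),
        (4, 1, "second reply to root"),
    ],
)

# Recursive CTE: start at one comment, then repeatedly join in its children,
# tracking how deep each row sits in the thread.
THREAD_SQL = """
WITH RECURSIVE thread(id, body, depth) AS (
    SELECT id, body, 0 FROM comments WHERE id = ?
    UNION ALL
    SELECT c.id, c.body, t.depth + 1
    FROM comments c JOIN thread t ON c.parent_id = t.id
)
SELECT id, depth FROM thread ORDER BY id
"""

thread = conn.execute(THREAD_SQL, (1,)).fetchall()
print(thread)
```

Each row comes back with its depth, which is enough to render an indented comment thread without issuing one query per level.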

kthejoker2 · on Aug 31, 2019

The tradeoff is that sharding and caching graph DBs comes with a big performance hit, because it's harder to isolate queries to a single partition/subgraph. In some ways, intersectionality is the whole point of graph vs. relational.

The overhead of a large scale distributed graph DB with redundancy and caching is significantly higher than a relational DB acting as a "poor man's graph DB"; and materializing your more expensive views and queries is, in the base case, a sufficient performance equalizer.
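A rough sketch of the "poor man's graph DB" plus materialization idea, assuming a hypothetical `edges` table and a precomputed `two_hop` table. SQLite (used here via Python's built-in `sqlite3`) has no native materialized views, so this emulates one by rebuilding a plain table; a database with real materialized views would replace `refresh_two_hop` with its own refresh command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency list: one row per directed edge.
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(1, 2), (2, 3), (2, 4), (3, 4)],
)


def refresh_two_hop(conn):
    """Rebuild the precomputed table of nodes reachable in exactly two hops."""
    conn.execute("DROP TABLE IF EXISTS two_hop")
    # The self-join is the expensive part; it runs once per refresh
    # instead of once per read.
    conn.execute(
        """
        CREATE TABLE two_hop AS
        SELECT DISTINCT a.src AS src, b.dst AS dst
        FROM edges a JOIN edges b ON a.dst = b.src
        """
    )
    conn.execute("CREATE INDEX two_hop_src ON two_hop (src)")


refresh_two_hop(conn)

# Reads hit the precomputed table with a simple indexed lookup.
reachable = conn.execute(
    "SELECT dst FROM two_hop WHERE src = ? ORDER BY dst", (1,)
).fetchall()
print(reachable)
```

The cost moves from read time to refresh time, so this only pays off when the expensive query is read far more often than the underlying edges change.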

carapace · on Aug 21, 2019

Blank page with JS disabled. Reader mode disabled.

jakeogh · on Aug 21, 2019

"Reader mode disabled."

That's a bug. Shouldn't be possible. Let the site require executing JS to get the text, but the web browser should have no code that attempts to limit the user... and yet they are _full_ of it.