
> Instead of multiple batches of promises, we fetch the user, items, and details in one shot. This brings up the meta-question about why use GraphQL in the first place, but that’s a larger conversation for a separate time.

At my current job we have a TON of methods like the one just above this paragraph, and they're terrible to work with. It's hard to write (correctly), it's hard to test (correctly), and it is simply tedious to make work. I could write four lines of declarative GQL (that fetches exactly what I ask for from the DB) or a real mess of a method.

In practice, it's rarely the case that you're returning 10k of anything. Looking at the author's benchmark, the difference between 20ms and 60ms (for 1000 returned items) is not a compelling argument against a far nicer and more maintainable dev experience.

Hell, if I look at our stack, the biggest cost is building parameterized queries with many hundreds of IDs to select, just to prevent N+1 queries. I've spent countless hours optimizing the slowest code (which uses the pattern the author suggests) to use JOINs instead; the round trip and the cost of shipping that list of IDs back to the DB consumed far more time than you might expect.
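
For concreteness, the two shapes being contrasted look roughly like this (node-postgres assumed, table/column names made up):

    import { Pool } from "pg";
    const pool = new Pool();

    // Pattern 1: avoid N+1 by shipping a (potentially huge) list of IDs back to the DB.
    async function detailsByItemIds(itemIds: number[]) {
      const { rows } = await pool.query(
        "SELECT * FROM item_details WHERE item_id = ANY($1)",
        [itemIds]
      );
      return rows;
    }

    // Pattern 2: a JOIN does it in one round trip, with no ID list to
    // serialize, ship, and re-parse on the DB side.
    async function detailsForUser(userId: number) {
      const { rows } = await pool.query(
        `SELECT d.*
           FROM items i
           JOIN item_details d ON d.item_id = i.id
          WHERE i.user_id = $1`,
        [userId]
      );
      return rows;
    }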

If you want to go down an even deeper rabbit hole, look at the cost of serializing your data to JSON and deserializing it on the client. When you're shipping tens of thousands of records to the client, that's likely hundreds of kilobytes or even megabytes of data. Serializing that is expensive!
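
If you want a rough feel for that cost on your own payloads, here is a quick sketch (record shape is made up; numbers will vary by runtime):

    // Build a fake payload of 50k records, then time serializing it.
    const records = Array.from({ length: 50_000 }, (_, i) => ({
      id: i,
      name: `item-${i}`,
      price: i * 1.5,
      tags: ["a", "b", "c"],
    }));

    console.time("stringify");
    const body = JSON.stringify(records);
    console.timeEnd("stringify");
    console.log(`payload: ${(body.length / 1024 / 1024).toFixed(1)} MB`);
    // ...and the client still has to JSON.parse() the same payload.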



Totally feel the pain. Ironically, most DBs have started supporting really good JSON aggregation functions.

Combined with lateral joins, it's even possible to push this ser/de into the DB directly, with minimal unpacking and repacking when sending that data back to the client!
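
A sketch of what that can look like in Postgres, using json_agg plus a lateral join (schema is hypothetical):

    import { Pool } from "pg";
    const pool = new Pool();

    // One row per author, each already carrying its articles as a JSON array,
    // so the app server mostly forwards the payload instead of reassembling it.
    const { rows } = await pool.query(`
      SELECT a.id,
             a.name,
             COALESCE(ar.articles, '[]'::json) AS articles
        FROM authors a
        LEFT JOIN LATERAL (
          SELECT json_agg(json_build_object('id', r.id, 'title', r.title)) AS articles
            FROM articles r
           WHERE r.author_id = a.id
        ) ar ON true
    `);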


To me those advanced json aggregation functions are to handle the fact that most DBs accept de-normalized JSON fields.

Direct DB ser/de looks like a bad idea to me: in most cases it's much easier to scale the application server layer than the DB layer.

And in most situations you want to "expand" the data as close as possible to the final location.

If you really care about having very low overhead ser/de you would probably go for protobuf/Cap'n Proto almost everywhere instead.


> in most cases it's much easier to scale the application server layer than the DB layer.

Hm… ignoring the specific design decision here on using json_agg - I think the open question is: is that sentiment still practically true in 2023? Is it worth scaling app servers, or is it better to use Postgres well and scale a managed PG offering?

Do we need to spend time writing boring code when we could just use DBs better, and instead spend our time where it matters in the app server?

> And in most situations you want to "expand" the data as close as possible to the final location.

What do you mean by “expand”?

From a perf pov: imagine you're fetching 100 authors and each has 1000 articles. Would you rather a) fetch the m×n Cartesian product, or b) ask the DB to send you a JSON aggregation where each author has 1000 articles? There's a significant difference in DB perf, intermediate-layer ser/de, and network latency, right?


> Imagine you're fetching 100 authors and each has 1000 articles. Would you rather a) fetch the m×n Cartesian product, or b) ask the DB to send you a JSON aggregation where each author has 1000 articles?

I would be surprised if the DB performed better with an aggregation than just returning the results. Assuming you have reasonable indices, it’s a trivial join.

Network latency, maybe. 100 authors × 1000 articles is 100,000 joined rows; assuming ~20 bytes of duplicated author name per row, with no compression that's an extra 2 MB.

You could write the query with a CONCAT as a subquery as a poor man’s aggregation, but then you’re very much at the mercy of the planner deciding to do all of that in one pass. I’m not sure that it would. On mobile, or else I’d check.
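
Presumably something along these lines, with string_agg standing in for the CONCAT idea (hypothetical schema); whether the planner does it in one pass is exactly the open question:

    // Poor man's aggregation: one correlated subquery per author row.
    const sql = `
      SELECT a.id,
             a.name,
             (SELECT string_agg(r.title, '|')
                FROM articles r
               WHERE r.author_id = a.id) AS article_titles
        FROM authors a
    `;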


Like you said, it's not the join cost that's significant :)

But the DB sending less data over the network also means the DB is doing less CPU work to process and ship that data.

Benchmarking should be straightforward for a particular use-case.


> Is it worth scaling app servers or is it better to use Postgres well and scale a managed PG offering?

That is a very good question indeed. It probably depends on whether compute or the DB offering is cheaper for you to scale; from my very anecdotal experience using both, the DB is usually much more expensive to scale up.

> What do you mean by “expand”?

Getting more data through the pipes, and getting closer to the limit of the network pipes.

That doesn't sound like a realistic scenario to me - pulling Y articles from X authors.

If you're not pulling directly from articles, and you're pulling from X authors, X>1, you most likely don't actually want the articles themselves; more likely you want some stat of each article, like score, subject, tags, or comment counts. Then I don't see either the JOIN expansion or the JSON ser/de being a serious issue at all compared to the DB retrieval costs.


Most applications don't even need to optimise for N+1 queries and the ones that do could get by with just optimising it in a few places. Measure first then optimise as you see the need.

GraphQL batching together what would be multiple waterfall REST requests is such a perf boost for most cases that N+1 is really only an issue if your DB is bottlenecked.
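
For illustration, the waterfall being replaced looks roughly like this (endpoints and field names made up):

    // REST: three dependent round trips from the client.
    const user = await fetch("/api/users/1").then(r => r.json());
    const items = await fetch(`/api/users/${user.id}/items`).then(r => r.json());
    const details = await Promise.all(
      items.map((item: { id: number }) =>
        fetch(`/api/items/${item.id}/details`).then(r => r.json())
      )
    );

    // GraphQL: one round trip asking for the whole shape at once.
    const query = /* GraphQL */ `
      query {
        user(id: 1) {
          items {
            details { price stock }
          }
        }
      }
    `;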


I'd normally agree, but this is dangerous advice. Why? Because by the time you notice, it's not slow, it's down. N+1 queries don't scale badly as a function of users; they scale badly as a function of cumulative user behavior. One unexpectedly active user can make you self-DoS your database. Even a DB proxy like pgbouncer won't really help, because parameterized queries are likely implemented with prepared statements, which require an exclusive DB connection, layering bottlenecks on bottlenecks.


In my experience, you typically don't just jump straight past "slow" with N+1s.

Sure, if you introduce an n+1 to a large, existing table it will slow down. However, most endpoints are going to start at zero.


I think there are 2 subtleties here on where you optimize perf:

1. Frontend makes n+1 requests to the api. Either use GraphQL or build a custom endpoint that shapes a perfect response.

2. Build an api server that fetches data from the db efficiently without n+1 hitting the DB.

The whole tension introduced by the resolver/data-loader way of building GraphQL is that while it makes 1 easy, it makes 2 harder compared to bespoke REST-ish endpoints.
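
A rough sketch of that tension with a DataLoader-style resolver (hypothetical schema): it makes 1 trivial, but the data access for 2 stays spread across per-field loaders rather than one purpose-built query.

    import DataLoader from "dataloader";
    import { Pool } from "pg";

    const pool = new Pool();

    // Batches all item lookups from one tick into a single query (so no N+1),
    // but every nested field still gets its own loader and its own query,
    // instead of the single hand-shaped query a bespoke endpoint would use.
    const itemsByUserId = new DataLoader(async (userIds: readonly number[]) => {
      const { rows } = await pool.query(
        "SELECT * FROM items WHERE user_id = ANY($1)",
        [userIds]
      );
      return userIds.map(id => rows.filter(r => r.user_id === id));
    });

    const resolvers = {
      User: {
        items: (user: { id: number }) => itemsByUserId.load(user.id),
      },
    };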


True, but ORMs also make 2 harder.



