
> Instead of multiple batches of promises, we fetch the user, items, and details in one shot. This brings up the meta-question about why use GraphQL in the first place, but that’s a larger conversation for a separate time.

At my current job we have a TON of methods like the one just above this paragraph, and they're terrible to work with. It's hard to write (correctly), it's hard to test (correctly), and it is simply tedious to make work. I could write four lines of declarative GQL (that fetches exactly what I ask for from the DB) or a real mess of a method.

In practice, it's rarely the case that you're returning 10k of anything. Looking at the author's benchmark, the difference between 20ms and 60ms (for 1000 returned items) is not a compelling argument against a far nicer and more maintainable dev experience.

Hell, if I look at our stack, the biggest cost is building parameterized queries with many hundreds of IDs to select, just to prevent N+1 queries. I've spent countless hours optimizing the slowest code (which uses the pattern the author suggests) to use JOINs instead; the round trip and the cost of shipping that list of IDs back to the DB consumed far more time than you might expect.
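
For concreteness, the two shapes being contrasted look roughly like this (node-postgres assumed, table/column names made up):

    import { Pool } from "pg";
    const pool = new Pool();

    // Pattern 1: avoid N+1 by shipping a (potentially huge) list of IDs back to the DB.
    async function detailsByItemIds(itemIds: number[]) {
      const { rows } = await pool.query(
        "SELECT * FROM item_details WHERE item_id = ANY($1)",
        [itemIds]
      );
      return rows;
    }

    // Pattern 2: a JOIN does it in one round trip, with no ID list to
    // serialize, ship, and re-parse on the DB side.
    async function detailsForUser(userId: number) {
      const { rows } = await pool.query(
        `SELECT d.*
           FROM items i
           JOIN item_details d ON d.item_id = i.id
          WHERE i.user_id = $1`,
        [userId]
      );
      return rows;
    }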

If you want to go down an even deeper rabbit hole, look at the cost of serializing your data to JSON and deserializing it on the client. When you're shipping tens of thousands of records to the client, that's likely hundreds of kilobytes or even megabytes of data. Serializing that is expensive!
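
If you want a rough feel for that cost on your own payloads, here is a quick sketch (record shape is made up; numbers will vary by runtime):

    // Build a fake payload of 50k records, then time serializing it.
    const records = Array.from({ length: 50_000 }, (_, i) => ({
      id: i,
      name: `item-${i}`,
      price: i * 1.5,
      tags: ["a", "b", "c"],
    }));

    console.time("stringify");
    const body = JSON.stringify(records);
    console.timeEnd("stringify");
    console.log(`payload: ${(body.length / 1024 / 1024).toFixed(1)} MB`);
    // ...and the client still has to JSON.parse() the same payload.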



Totally feel the pain. Ironically, most DBs have started supporting really good JSON aggregation functions.

Combined with lateral joins, it's even possible to push this ser/de into the DB directly, with minimal unpacking and repacking when sending that data back to the client!
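
A sketch of what that can look like in Postgres, using json_agg plus a lateral join (schema is hypothetical):

    import { Pool } from "pg";
    const pool = new Pool();

    // One row per author, each already carrying its articles as a JSON array,
    // so the app server mostly forwards the payload instead of reassembling it.
    const { rows } = await pool.query(`
      SELECT a.id,
             a.name,
             COALESCE(ar.articles, '[]'::json) AS articles
        FROM authors a
        LEFT JOIN LATERAL (
          SELECT json_agg(json_build_object('id', r.id, 'title', r.title)) AS articles
            FROM articles r
           WHERE r.author_id = a.id
        ) ar ON true
    `);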


To me those advanced json aggregation functions are to handle the fact that most DBs accept de-normalized JSON fields.

Direct DB ser/de looks like a bad idea to me: in most cases it's much easier to scale the application server layer than the DB layer.

And in most situations you want to "expand" the data as close as possible to the final location.

If you really care about having very low overhead ser/de you would probably go for protobuf/Cap'n Proto almost everywhere instead.


> in most cases it's much easier to scale the application server layer than the DB layer.

Hm… ignoring the specific design decision here on using json_agg - I think the open question is: is that sentiment still practically true in 2023? Is it worth scaling app servers, or is it better to use Postgres well and scale a managed PG offering?

Do we need to spend time writing boring code when we could just use DBs better, and instead spend our time where it matters in the app server?

> And in most situations you want to "expand" the data as close as possible to the final location.

What do you mean by “expand”?

From a perf pov: imagine you're fetching 100 authors and each has 1000 articles. Would you rather a) fetch the m×n Cartesian product, or b) ask the DB to send you a JSON aggregation where each author has 1000 articles? There's a significant difference in DB perf, intermediate-layer ser/de, and network latency, right?


> Imagine you're fetching 100 authors and each has 1000 articles. Would you rather a) fetch the m×n Cartesian product, or b) ask the DB to send you a JSON aggregation where each author has 1000 articles?

I would be surprised if the DB performed better with an aggregation than just returning the results. Assuming you have reasonable indices, it’s a trivial join.

Network latency, maybe. 100 authors × 1000 articles is 100,000 joined rows; assuming ~20 bytes of duplicated author name per row, with no compression that's an extra 2 MB.

You could write the query with a CONCAT as a subquery as a poor man’s aggregation, but then you’re very much at the mercy of the planner deciding to do all of that in one pass. I’m not sure that it would. On mobile, or else I’d check.
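
Presumably something along these lines, with string_agg standing in for the CONCAT idea (hypothetical schema); whether the planner does it in one pass is exactly the open question:

    // Poor man's aggregation: one correlated subquery per author row.
    const sql = `
      SELECT a.id,
             a.name,
             (SELECT string_agg(r.title, '|')
                FROM articles r
               WHERE r.author_id = a.id) AS article_titles
        FROM authors a
    `;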


Like you said, it's not the join cost that's significant :)

But the DB sending less data over the network also means the DB is doing less CPU work to process and ship that data.

Benchmarking should be straightforward for a particular use-case.


> Is it worth scaling app servers or is it better to use Postgres well and scale a managed PG offering?

That is a very good question indeed. It probably depends on whether compute or the DB offering is cheaper for you to scale; from my very anecdotal experience using both, the DB is usually much more expensive to scale up.

> What do you mean by “expand”?

Getting more data through the pipes, and getting closer to the limit of the network pipes.

That doesn't sound like a realistic scenario to me - pulling Y articles from X authors.

If you're not pulling directly from articles, and you're pulling from X authors, X>1, you most likely don't actually want the articles themselves; more likely you want some stat of each article, like score, subject, tags, or comment counts. Then I don't see either the JOIN expansion or the JSON ser/de being a serious issue at all compared to the DB retrieval costs.


Most applications don't even need to optimise for N+1 queries and the ones that do could get by with just optimising it in a few places. Measure first then optimise as you see the need.

GraphQL batching together what would be multiple waterfall REST requests is such a perf boost for most cases that N+1 is really only an issue if your DB is bottlenecked.
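
For illustration, the waterfall being replaced looks roughly like this (endpoints and field names made up):

    // REST: three dependent round trips from the client.
    const user = await fetch("/api/users/1").then(r => r.json());
    const items = await fetch(`/api/users/${user.id}/items`).then(r => r.json());
    const details = await Promise.all(
      items.map((item: { id: number }) =>
        fetch(`/api/items/${item.id}/details`).then(r => r.json())
      )
    );

    // GraphQL: one round trip asking for the whole shape at once.
    const query = /* GraphQL */ `
      query {
        user(id: 1) {
          items {
            details { price stock }
          }
        }
      }
    `;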


I'd normally agree, but this is dangerous advice. Why? Because by the time you notice, it's not slow, it's down. N+1 queries don't scale badly as a function of users; they scale badly as a function of cumulative user behavior. One unexpectedly active user can make you self-DoS your database. Even a DB proxy like pgbouncer won't really help, because parameterized queries are likely implemented with prepared statements, which require an exclusive DB connection, layering bottlenecks on bottlenecks.


In my experience, you typically don't just jump straight past "slow" with N+1s.

Sure, if you introduce an n+1 to a large, existing table it will slow down. However, most endpoints are going to start at zero.


I think there are 2 subtleties here on where you optimize perf:

1. Frontend makes n+1 requests to the api. Either use GraphQL or build a custom endpoint that shapes a perfect response.

2. Build an api server that fetches data from the db efficiently without n+1 hitting the DB.

The whole tension introduced by the resolver/data-loader way of building GraphQL is that while it makes 1 easy, it makes 2 harder compared to bespoke REST-ish endpoints.
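
A rough sketch of that tension with a DataLoader-style resolver (hypothetical schema): it makes 1 trivial, but the data access for 2 stays spread across per-field loaders rather than one purpose-built query.

    import DataLoader from "dataloader";
    import { Pool } from "pg";

    const pool = new Pool();

    // Batches all item lookups from one tick into a single query (so no N+1),
    // but every nested field still gets its own loader and its own query,
    // instead of the single hand-shaped query a bespoke endpoint would use.
    const itemsByUserId = new DataLoader(async (userIds: readonly number[]) => {
      const { rows } = await pool.query(
        "SELECT * FROM items WHERE user_id = ANY($1)",
        [userIds]
      );
      return userIds.map(id => rows.filter(r => r.user_id === id));
    });

    const resolvers = {
      User: {
        items: (user: { id: number }) => itemsByUserId.load(user.id),
      },
    };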


True, but ORMs also make 2 harder.



