The "AI being wrong" problem is probably not insurmountable.
Humans have meta-cognition that helps them judge whether they're doing something built on lots of assumptions vs something that's already blessed.
Humans decouple planning from execution, right? Not fully, but we choose when to separate the two and when not to.
If we had enough data on "here's a good plan given user context" and "here's a bad plan", it doesn't seem unreasonable to build a pretty reliable meta-cognition capability for judging the goodness of a plan.
You’re right in that it’s obviously not the only problem.
But without solving this, it seems like no matter how good the models get, it'll never be enough.
Or, yes, the biggest research breakthrough we need is reliable calibrated confidence. And that’ll allow existing models as they are to become spectacularly more useful.
The biggest breakthrough that we need is something resembling actual intelligence in AI (human or machine, I’ll let you decide where we need it more ;) )
Is it to build a copilot for a data analyst or to get business insight without going through an analyst?
If it's the latter - then imho no amount of text to sql sophistication will solve the problem, because it's impossible for a non-analyst to understand whether the SQL is correct or sufficient.
These don’t seem like text2sql problems:
> Why did we hit only 80% of our daily ecommerce transactions yesterday?
> Why is customer acquisition cost trending up?
> Why was the campaign in NYC worse than the same in SF?
Correct, but I would propose two things to add to your analysis:
1. Natural language text is a universal input to LLM systems
2. text2sql forms the foundation for retrieving the information that can help answer these higher-level questions
And so in my mind, the goals for text2sql might be a copilot (near-term), but the long-term is to have a good foundation for automating text2sql calls, comparing results, and pulling them into a larger workflow precisely to help answer the kinds of questions you're proposing.
There's clearly much work needed to achieve that goal.
To be fair, these don’t look like SQL problems either. SQL answers “what”, not “why” questions. The goal of text2sql is to free up analyst time to get through “what” much faster and - possibly- focus on “why” questions.
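For concreteness, here's a rough sketch of the kind of "what" query a text2sql layer might plausibly generate behind the 80%-of-transactions question above (the table and column names here are made up):

-- daily transaction counts for the last two weeks (hypothetical schema)
SELECT date_trunc('day', created_at) AS day,
       count(*) AS transactions
FROM ecommerce_transactions
WHERE created_at >= now() - interval '14 days'
GROUP BY 1
ORDER BY 1;

An analyst (or a larger workflow) still has to interpret the trend and chase down the "why".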
My observation is the latter, but I agree the results fall short of expectations. Business will often want last-minute changes in reporting, doesn't get what it wants at the right time because of a lack of analysts, and hopes that having "infinite speed" will solve the problem.
But ofc the real issue is that if your report metrics change at the last minute, you're unlikely to get a good report. That's a symptom of not thinking much about your metrics.
Also, reports / analysis generally take time because the underlying data are messy, lots of business knowledge is encoded "out of band", and the data infrastructure is poor. The smarter analytics leaders will use the AI push to invest in the foundations.
Any algo that a human would follow can be built and tested. If you have 10 analysts you have 10 different skill levels, with differing understanding of the database and business context. So automation gives you a platform to achieve a floor of skill and knowledge. The humans can now be “at least this good or better”. A new analyst instantly gets better, faster.
I assume a useful goal would be to guide development of the system in coordination with experts, test it, have the AI explain all trade-offs and potential bugs, sense-check it against expected results, etc.
Taste is hard to automate. Real insight is hard to automate. But a domain expert who isn’t an “analyst” can go extremely far with well designed automation and a sense of what rational results should look like. Obviously the state of the art isn’t perfect but you asked about goals, so those would be my goals.
The processes the people want the SQL for are likely filled with algos. An exec wants info in a known domain: set up a text to sql system with lots of context and testing to generate queries. If they think they have something good, get an expert to test and productionise it.
“Thank you for your request. Can you walk me through the steps you’d use to do this manually? What things would you watch out for? What kind of number ranges are reasonable? I can propose an algorithm and you tell me if that’s correct. The admins have set up guidelines on how to reason about customer and purchase data. Is the following consistent with your expectations?”
This is the same fallacy as low-code/no-code. If you have to check a precise algorithm, you’re effectively coding, and you need a language with the same precision as a programming language.
Only if you want a production-ready output. To get execs able to self-feed enough, this works fine. Look, you don’t see value until it’s perfect. Good, other people do. I see your fallacy and raise you a false dichotomy.
The problem I see is: how do you verify that the result of your text-to-sql is really what you were asking for, without understanding the SQL (or "the algorithm")? It boils down to the fact that you have to know what you are doing, and with the present state of the art in AI we can't have confidence in that.
I’m assuming exploratory work from the exec, not something they make decisions with or put into production. If you need something you can trust, you typically need a lot of checks, including multiple humans.
I play a weird part at work near AI. I use it all the time but I’m the first person to warn everyone that it’s absolutely not trustworthy. No matter your prompt, the data, the guidelines built into it, the output is fundamentally flaky. But I use it while knowing that and working around it. Making the process reliable is a big part of my focus, and usually that means minimising the part the LLM plays. Checks and balances live where things are predictable.
I totally agree with the sentiment here, but disagree with the conclusion.
1. The rest of the world doesn't have Facebook's data layer. Ergo, GraphQL is very hard for everyone who's not FB.
2. Turns out GraphQL is actually a really good data API. Whether it's a postgres/mongo/graph/REST data source. Because it's a great mid-point between something as flexible as SQL but as controlled as a REST API.
Here are 3 things that GraphQL is absolutely great at:
- Select 2 fields out of 100 attributes in a data model? GraphQL is great.
- Gradually deprecate an old field in a database and replace it with a new one? GraphQL is great.
- Want a flexible way to filter/join/paginate regardless of the underlying storage layer? GraphQL is great.
Other API formats suck at this.
Working with GraphQL ought to feel like you're working entirely with your database, sprinkling in the right bits of transform/validation/authz business logic, and then the API that other teams can use is "free".
GraphQL is dying as a replacement for a REST API layer. I think GraphQL will see its second wind as a data API. Microsoft and Google announced their GraphQL data APIs recently.
GraphQL will probably only make sense when it's as close as a 1:1 map to a DB beneath it. Whether relational or NoSQL or graph.
In all other cases, the GraphQL juice is not worth the squeeze.
I’m the founder of Hasura - sharing some notes from how I’ve seen GraphQL usage evolve over the last few years.
1. GraphQL was and remains insanely hard to build without an underlying data layer that does the heavy lifting. Without projection push-down, predicate push-down, and a data layer that can support heavy parallelism, it's untenable. Exactly the problems the OP highlights - needing to “hoist”…
2. GraphQL on REST is an anti-pattern. The hype takes you there, but the juice is not worth the squeeze. The problem was lack of types? Add OpenAPI. The problem was custom aggregation endpoints? Make it super easy / cheap to build aggregate endpoints in REST. Use an AI copilot to write an aggregate REST endpoint and an OpenAPI schema for it even. But maybe the last thing to do is to build and maintain another API layer with a completely different execution model.
It's not what it was meant for perhaps, but GraphQL's killer use-case is an API to access / operate on data. Especially when there are multiple consumers of data (services, apps) and they don't own the underlying data sources or speak the underlying language of the database. This turns out to be the underlying problem behind a lot of API and data modernization / migration / decomposition efforts.
The cost of this approach is implementing GraphQL by building a compiler/planner with a robust authz system baked into it. A resolver-based approach can never cut it, unless you resolve a query plan - aka build a compiler.
If you want to use graphql to just batch rest endpoints, it’s very likely going to become legacy tech as soon as the team that built it starts to move on.
Yates implements Postgres RLS along with Prisma, which we use as our ORM, and then our GraphQL schema is generated using TypeGraphQL (https://typegraphql.com/). Overall, it's a very nice setup and allows us to be versatile on the client side while still having strong authentication integrity.
While perhaps not as polished as Hasura, this stack does have the nice benefit of being free ;-)
I think there are 2 subtleties here on where you optimize perf:
1. Frontend makes n+1 requests to the api. Either use GraphQL or build a custom endpoint that shapes a perfect response.
2. Build an api server that fetches data from the db efficiently without n+1 hitting the DB.
The whole tension introduced with resolver/data-loader ways of building GraphQL is that while it makes (1) easy, it makes (2) harder, compared to bespoke REST-ish endpoints.
The problem is that it’s hard to get the articles resolver make one query to fetch all the recent articles. So you’ll end up with n+1 fetches from the db.
In the articles resolver, you have to:
SELECT * FROM articles WHERE user_id = X ORDER BY created DESC LIMIT 5;
And you’ll have to run this statement n times - once for each user.
Not sure how a data-fetching cache layer at the app server layer will help here.
@alex_lav: Your SQL query here is exactly what I meant!
In SQL, this is easy, both to write and to optimize perf.
Implementing this with GraphQL resolvers (how folks typically write GraphQL servers) is hard - there's a users function and an articles function that are both called, and it's hard to have those 2 functions contribute together towards a single query plan. This was my point.
In fact a REST endpoint that takes a list of users and runs this query with the `IN` part parameterized is easier to build here. But the GraphQL version is a pain.
Sorry, I feel like there must be something I'm not understanding about the limitation you're trying to convey.
WITH relevant_users AS (
SELECT id FROM users WHERE id IN (1)
)
SELECT users.name, mra.title
FROM users
INNER JOIN most_recent_articles mra
ON users.id = mra.user_id
INNER JOIN relevant_users
ON users.id = relevant_users.id;
This is a single query that can fetch users filtered by the `first:` argument (although I just used IN, you could add whatever filtering options you wanted in the relevant_users CTE) with the most recent articles view we've discussed previously.
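(The most_recent_articles view was defined earlier in the discussion and isn't shown in this excerpt; one plausible shape for it, purely as a hypothetical reconstruction, would use a window function to keep the latest 5 articles per user:)

-- hypothetical reconstruction, not the original definition from upthread
CREATE VIEW most_recent_articles AS
SELECT user_id, title, created
FROM (
    SELECT user_id, title, created,
           row_number() OVER (PARTITION BY user_id ORDER BY created DESC) AS rn
    FROM articles
) ranked
WHERE rn <= 5;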
Totally feel the pain. Ironically, most DBs have started supporting really good json aggregation functions.
Combined with lateral joins, it’s possible to even push this ser/de to the DB directly and have minimal unpacking and packing in sending that data back to the client!
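As a minimal sketch of that idea (hypothetical users/articles schema): one round trip that returns each user together with their 5 most recent articles as a json array, instead of n+1 per-user queries:

SELECT u.id,
       u.name,
       COALESCE(recent.articles, '[]'::json) AS recent_articles
FROM users u
LEFT JOIN LATERAL (
    -- per-user subquery; json_agg packs the rows so the app layer
    -- does no re-assembly
    SELECT json_agg(json_build_object('title', a.title, 'created', a.created)
                    ORDER BY a.created DESC) AS articles
    FROM (
        SELECT title, created
        FROM articles
        WHERE user_id = u.id
        ORDER BY created DESC
        LIMIT 5
    ) a
) recent ON true
WHERE u.id = ANY(ARRAY[1, 2, 3]);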
> it's often much easier to scale the application server layer than the DB layer in most cases.
Hm…ignoring the specific design decision here on using json agg - I think the open question is - is that specific sentiment practically still the truth in 2023? Is it worth scaling app servers or is it better to use Postgres well and scale a managed PG offering?
Do we need to spend time writing boring code when we could instead just use DBs better? And then instead we can spend time where it matters in the app server.
> And in most situations you want to "expand" the data as close as possible to the final location.
What do you mean by “expand”?
From a perf pov: imagine you're fetching 100 authors and each had 1000 articles. Would you rather a) fetch the m x n Cartesian product or b) ask the DB to send you a json aggregation where each author has their 1000 articles? There's a significant db perf, intermediate-layer ser/de, and network latency difference, right?
> Imagine you're fetching 100 authors and each had 1000 articles. Would you rather a) fetch the m x n Cartesian product or b) ask the DB to send you a json aggregation where each author has their 1000 articles?
I would be surprised if the DB performed better with an aggregation than just returning the results. Assuming you have reasonable indices, it’s a trivial join.
Network latency, maybe. Assuming 20 bytes per name, with no compression that’s an extra 2 MB.
You could write the query with a CONCAT as a subquery as a poor man's aggregation, but then you're very much at the mercy of the planner deciding to do all of that in one pass. I'm not sure that it would. I'm on mobile, or else I'd check.
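The "poor man's aggregation" being described would look roughly like this (hypothetical schema, and using string_agg rather than CONCAT); whether the planner collapses it into one pass is exactly the open question:

SELECT u.name,
       (SELECT string_agg(a.title, ', ' ORDER BY a.created DESC)
        FROM articles a
        WHERE a.user_id = u.id) AS article_titles  -- one string per author
FROM users u;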
> Is it worth scaling app servers or is it better to use Postgres well and scale a managed PG offering?
That is a very good question indeed. It probably depends on whether compute or the DB offering is cheaper to scale for you; from my very anecdotal experience using both, the DB is usually much more expensive to scale up.
> What do you mean by “expand”?
Getting more data thru the pipes, getting closer to the limit of the network pipes.
That doesn't sound like a realistic scenario to me - pulling Y articles from X authors.
If you're not pulling directly from articles, and you're pulling from X authors, X>1, you most likely don't actually want the articles; more likely you want some stat of each article like article score, subject, tags, or comment counts. Then I don't see either the JOIN expansion or the JSON ser/de being a serious issue at all compared to the DB retrieval costs.
IMHO: The way we often approach GraphQL execution is a little dated. It’s perhaps legacy thinking from times when we had data layers that couldn’t be expected to scale even for the majority of use cases. But that has completely changed over the last 5-10 years - DBs can scale pretty massively, for specific workloads with very little work required at the app server layer. So if we used DBs to actually model as close to our domain entities as possible (and yes, we’ll need multiple DBs sooner or later), then a huge part of our work is already technically done.
So - here’s another way to approach building GraphQL:
1. Build a “read graph” so that you get a highly composable API that can almost entirely be pushed down into a data layer (eg: Postgres). This would involve techniques similar to building a compiler. Which is hard, but the great news is that it’s easy to make reusable across people who share the same data layer (Eg: Postgres).
There are only 2 business logic concerns at this layer, both of which can be pushed down, which helps simplify the problem.
1) modelling/transformation: which can be done in the language of the data layer and pushed down (like views)
2) authz: which can also be done in the language of the data layer and pushed down (like RLS) - see the sketch after this list
2. The “command graph”: a set of controllers written in whatever language you please that makes changes to the data layer (or delegates to an orchestration system) and returns a reference to the read graph after having made changes.
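A minimal sketch of what that push-down looks like in Postgres (table, view, column, and setting names here are all made up):

-- 1) modelling/transformation pushed down as a view
CREATE VIEW published_articles AS
SELECT id, user_id AS author_id, title, created
FROM articles
WHERE status = 'published';

-- 2) authz pushed down as row-level security
ALTER TABLE articles ENABLE ROW LEVEL SECURITY;

CREATE POLICY authors_read_own ON articles
    FOR SELECT
    USING (user_id = current_setting('app.current_user_id')::int);

The GraphQL layer then only has to compile field selections and filters into queries against objects like these.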
This is kind of how we approach the problem at Hasura - a connector SDK that makes it easy to build reusable query compilers for different DBs or API services.
And it's been great to see similar ideas in the ecosystem - let's make it easy to execute a tree of functions that create a query plan, instead of executing a tree of data-fetch functions (like the OP). Grafast by the PostGraphile folks comes to mind.