It's totally possible to build entire software products in the fraction of the time it took before.
But, reading the comments here, behavior from one point version to the next (not even major versions, mind you) seems very divergent.
It feels like we are now able to manage incredibly smart engineers for a month at the price of a good sushi dinner.
But it also feels like you have to be diligent about adopting new models (even same family and just point version updates) because they operate totally differently regardless of your prompt and agent files.
Imagine managing a team of software developers where every month it was an entirely new team with radically different personalities, career experiences and guiding principles. It would be chaos.
I suspect that older models will be deprecated quickly and unexpectedly, or, worse yet, will be swapped out with subtly different behavioral characteristics without notice. It'll be quicksand.
I had an interesting experience recently where I ran Opus 4.6 against a problem that o4-mini had previously convinced me wasn't tractable... and Opus 4.6 found me a great solution. https://github.com/simonw/sqlite-chronicle/issues/20
This inspired me to point the latest models at a bunch of my older projects, resulting in a flurry of fixes and unblocks.
I have a codebase (personal project) and every time there is a new Claude Opus model I get it to do a full code review. Never had any breakages in the last couple of model updates. Worried one day it'll just generate a binary and delete all the code.
I was being facetious. I mean one day models might skip the middleman of code and compilation and take your specs and produce an ultra efficient binary.
Musk was saying that recently but I don't see it being efficient or worthwhile to do this. I could be proven brutally wrong, but code is language; executables aren't. There's also no real reason to bother with this when we have quick-compiling languages.
More realistically, I could see particular languages and frameworks proving to be better designed and more apt for AI code creation. For instance, I was always too lazy to use a strongly-typed language, preferring Ruby for the joy of writing in it (obsessing about types is for a particular kind of nerd that I've never wanted to be). But now with AI, everything's better with strong types in the loop, since reasoning about the code is arguably easier and the compiler provides stronger guarantees about what's happening. Similarly, we could see other linguistic constructs come to the forefront because of what they allow when the cost of implementation drops to zero.
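A minimal sketch of the kind of guarantee meant here, using Python type hints (the `Duration` wrapper and `add_latency` function are invented for illustration, not from any real codebase):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Duration:
    """Milliseconds, wrapped so a type checker can tell it apart from a bare int."""
    ms: int

def add_latency(base: Duration, extra: Duration) -> Duration:
    # Both operands are declared Duration, so an AI assistant (or reviewer)
    # never has to guess about units or argument order here.
    return Duration(base.ms + extra.ms)

total = add_latency(Duration(120), Duration(30))
print(total.ms)  # 150
# A static checker such as mypy would reject add_latency(120, 30) before the
# code ever runs; that pre-execution feedback is the loop that lets a model
# correct its own mistakes cheaply.
```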
You can map tokens to CPU instructions and train a model on that, that's what they do for input images I think.
I think the main limitation of current models here isn't the instruction format itself (assembly can be tokenized like anything else); it's that they're causal: the model would have to generate a binary from start to finish, strictly sequentially.
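The token mapping described above could be as simple as treating each instruction byte as a vocabulary entry, roughly the way vision models turn image patches into tokens. This is a toy sketch, not any real training pipeline:

```python
# Toy byte-level "tokenizer" for machine code: each byte of the encoded
# instruction stream becomes one token id out of a 256-entry vocabulary,
# analogous to patch tokens for images. Real pipelines are far more involved.
code = bytes([0x55, 0x48, 0x89, 0xE5, 0xC3])  # x86-64: push rbp; mov rbp, rsp; ret
tokens = list(code)
print(tokens)  # [85, 72, 137, 229, 195]

detok = bytes(tokens)
assert detok == code  # the mapping is trivially lossless in both directions
```

The causality problem is visible even in this sketch: the model would have to emit byte 0x55 before knowing what the rest of the function looks like, with no compiler pass to fix things up afterwards.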
If we learned anything over the last 50 years of programming, it's that this is hard, and it's why we invented programming languages. Why would it be simpler to generate the machine code directly? Sure, maybe an LLM-to-application path can exist, but my money is on there being a whole toolchain in the middle, and it will probably be the same old toolchain we use today, including an OS, probably Linux.
Isn't it more common that stuff builds on the existing infra instead of a super duper revolution that doesn't use the previous tech stack? It's much easier to add onto rather than start from scratch.
Those CPU instructions still need to be making calls out to things, though. Hallucinated source code will reveal its flaws through linters, compiler errors, test suites. A hallucinated binary will not reveal its flaws until it segfaults.
Programs that pass linters, compilers and test suites can still segfault. A good test harness that tests the binary comprehensively can limit this. The model could be trained on patterns of efficient assembly to use rather than source code.
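A toy illustration of the point above: a plausible-looking generated function that lints and runs cleanly, where only a small test harness surfaces the flaw (the buggy `clamp` is invented for illustration):

```python
def clamp(value, low, high):
    # Plausible-looking generated code: no syntax errors, no imports,
    # nothing for a linter to object to -- but the lower branch is wrong.
    if value > high:
        return high
    if value < low:
        return high  # bug: should return low
    return value

def test_clamp():
    # (input, low, high, expected) cases; returns the cases that fail.
    cases = [(5, 0, 10, 5), (15, 0, 10, 10), (-3, 0, 10, 0)]
    return [(v, lo, hi) for v, lo, hi, want in cases if clamp(v, lo, hi) != want]

print(test_clamp())  # [(-3, 0, 10)] -- the harness catches what the linter can't
```

The same discipline applied at the binary level is much harder, which is the asymmetry the parent comment is pointing at.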
I’ve thought an interesting outcome might be that it’s not even that there’s a binary generated. It’s just user input -> machine code LLM -> CPU. Like the only binary would be the LLM itself and it’s essentially mimicking software live. The paper “Diffusion as a Model of Environment Dream” (DIAMOND) is close to what I’m thinking, where they have a diffusion model generate frames of a game, updating with user input, but there’s no actual “game” code it’s just the model.
Like you’d have a machine code LLM that behaves like software, but instead of a static binary being executed it’s just the LLM itself “executing” on inputs and previous state. I’m horrible at communicating this idea but hopefully the gist is there.
You're going to need to spend crazy compute just compiling code to obtain the training data. And until it's one-shotting absolutely everything, you're going to be asking it what it's doing, and then it'll be "decompiling" its own output. I can't see this being more efficient than compiling the other way around.
I suspect the actual benefit would be more in virtualised interfaces such as Genie 3, skipping the binary step altogether: it's just manipulating pixels, and the pixels change based on the underlying statistical model's output rather than old-school computation.
This may seem obvious, but many people overlook it. The effect is especially clear when using an AI music model. For example, in Suno AI you can remaster an older AI generated track with a newer model. I do this with all my songs whenever a new model is released. It makes it super easy to see the improvements that were made to the models over time.
I keep giving the top Anthropic, Google and OpenAI models problems.
They come up with passable solutions and are good for getting juices flowing and giving you a start on a codebase, but they are far from building "entire software products" unless you really don't care about quality and attention to detail.
Yeah, I keep maintaining a specific app I built with gpt 5.1 codex max using that exact model, because it continues to work for the requests I send it, while attempts with other models, even 5.2 or 5.3 codex, seemed to have odd results. If I were superstitious I would say it's almost like the model that wrote the code likes to work on the code better. Perhaps there's something about the structure it created that it finds easier to understand, though.
I have long suspected that a large part of people's distaste for given models comes from their comfort with their daily driver.
Which I guess feeds back to prompting still being critical for getting the most out of a model (outside of subjective stylistic traits the models have in their outputs).