I think the other question is how far away this is from a "working" browser. It isn't impossible to render a meaningful subset of HTML (especially when you use external libraries to handle a lot of this). The real difficulty is doing this (a) quickly, (b) correctly and (c) securely. All of those are very hard problems, and also quite tricky to verify.
I think this kind of approach is interesting, but it's a bit sad that Cursor didn't discuss how they close the feedback loop: testing/verification. As generating code becomes cheaper, I think effort will shift to how we can more cheaply and reliably determine whether an arbitrary piece of code meets a desired specification. For example, did they use https://web-platform-tests.org/, fuzz testing (e.g. feed in random webpages and inform the LLM when the fuzzer finds crashes), etc.? I would imagine truly scaling long-running autonomous coding would put a heavy emphasis on this.
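To make the fuzzing idea concrete, here's a minimal sketch of a crash-hunting loop. Nothing here is from Cursor's setup: `render_to_bitmap` is a hypothetical stand-in for the engine's entry point, and a serious harness would use cargo-fuzz/libFuzzer with mutation of real pages and WPT cases rather than naive random generation.

```rust
use std::panic;

// Hypothetical stand-in for the engine's entry point; not a real FastRender API.
fn render_to_bitmap(_html: &str) { /* parse, style, layout, paint ... */ }

// Deterministic pseudo-random "tag soup"; a real fuzzer would mutate real-world
// pages and WPT cases instead of generating markup from scratch.
fn random_html(seed: u64) -> String {
    let tags = ["div", "span", "table", "p", "b", "svg"];
    let mut html = String::from("<!doctype html><body>");
    let mut x = seed;
    for _ in 0..64 {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        let tag = tags[(x >> 33) as usize % tags.len()];
        html.push_str(&format!("<{tag} style=\"width:{}px\">x</{tag}>", x % 10_000));
    }
    html.push_str("</body>");
    html
}

fn main() {
    for seed in 0..10_000u64 {
        let page = random_html(seed);
        // Any panic is a reproducible bug report that can be fed back to the agent.
        if panic::catch_unwind(|| render_to_bitmap(&page)).is_err() {
            eprintln!("renderer crashed on seed {seed}");
        }
    }
}
```

The value is the feedback signal: every crash comes with a seed the agent can replay until the bug is fixed.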
Of course Cursor may well have done this, but it wasn't super deeply discussed in their blog post.
I really enjoy reading your blog and it would be super cool to see you look at approaches people have to ensuring that LLM-produced code is reliable/correct.
I think the current approach will simply never scale to a working browser.
To leverage AI to build a working browser you would imo need the following:
- A team of humans with some good ideas on how to improve on existing web engines.
- A clear architectural story written not by agents but by humans. Architecture does not mean high-level diagrams only. At each level of abstraction, you need humans to decide what makes sense and only use the agent to bang out slight variations.
- A modular and human-overseen agentic loop approach: one agent can keep running to try to fix a specific CSS feature (like grid), with a human expert reviewing the work at some interval (not sure how fine-grained it should be). This is actually very similar to running an open-source project: you have code owners and a modular review process, not just an army of contributors committing whatever they want. And a "judge agent" is not the same thing as a human code owner acting as reviewer.
This rendering loop architecture makes zero sense, and it does not implement web standards.
> in the HTML Standard, requestAnimationFrame is part of the frame rendering steps (“update the rendering”), which occur after running a task and performing a microtask checkpoint
> requestAnimationFrame callbacks run on the frame schedule, not as normal tasks.
Following the spec doesn't mean you cannot optimize rendering tasks in some way vs other tasks in your implementation, but the above is not that, it's classic AI bs.
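For reference, here's a heavily simplified sketch of the ordering the quoted spec text describes. This is not FastRender's loop, just an illustration of where requestAnimationFrame callbacks belong relative to tasks and microtasks:

```rust
use std::collections::VecDeque;

type Job = Box<dyn FnMut()>;

#[derive(Default)]
struct EventLoop {
    tasks: VecDeque<Job>,      // timers, input events, network callbacks, ...
    microtasks: VecDeque<Job>, // promise reactions, queueMicrotask, ...
    raf_callbacks: Vec<Job>,   // registered via requestAnimationFrame
}

impl EventLoop {
    fn run_one_iteration(&mut self, rendering_opportunity: bool) {
        // 1. Run a single task from a task queue.
        if let Some(mut task) = self.tasks.pop_front() {
            task();
        }
        // 2. Microtask checkpoint: drain the microtask queue.
        while let Some(mut microtask) = self.microtasks.pop_front() {
            microtask();
        }
        // 3. "Update the rendering" only when there is a rendering opportunity
        //    (roughly once per frame). requestAnimationFrame callbacks run here,
        //    on the frame schedule, never as ordinary tasks in step 1.
        if rendering_opportunity {
            for mut callback in self.raf_callbacks.drain(..) {
                callback();
            }
            // ...style recalculation, layout, and paint would follow here.
        }
    }
}

fn main() {
    let mut el = EventLoop::default();
    el.tasks.push_back(Box::new(|| println!("task")));
    el.microtasks.push_back(Box::new(|| println!("microtask")));
    el.raf_callbacks.push(Box::new(|| println!("rAF callback")));
    el.run_one_iteration(true); // prints: task, microtask, rAF callback
}
```

The caller decides when a frame boundary (a rendering opportunity) occurs and passes that in; rAF callbacks never get interleaved with normal task execution.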
Understanding Web standards and translating them into an implementation requires human judgement.
Don't use an agent to draft your architecture; an expert in web standards with an interest in agentic coding is what is required.
Message to Cursor CEO: next time, instead of lighting up those millions on fire, reach out to me first: https://github.com/gterzian
How much effort would it take for GenAI to write a browser/engine from scratch for GenAI to consume (and generate) all the web artifacts generated by humans and GenAI? (This only needs to work in headless CI.)
How much effort would it take for a group of humans to do it?
I'm not sure what you mean by your first sentence in terms of product.
But in general, my guess at an answer (supported by the results of the experiment discussed in this thread) is that:
- GenAI left unsupervised cannot write a browser/engine, or any other complex software. What you end up with is just chaos.
- A group of humans using GenAI and supervising its output could write such an engine (or any other complex software), and in theory be more productive than a group of humans not using GenAI: the humans could focus on the conceptual bottlenecks, and the AI could bang out the features that require only the translation of already established architectural patterns.
When I write conceptual bottlenecks I don't mean standing in front of a whiteboard full of diagrams. What I mean is any work that gives proper meaning and functionality to the code: it can be at the level of an individual function, or the project as a whole. It can also be outside of the code itself, such as when you describe the desired behavior of (some part of) a program in TLA+.
“This is a clear indication that while the AI can write the code, it cannot design software”
To clarify what I mean by a product. If we want to design a browser system (engine + chrome) from scratch to optimize the human computer symbiosis (Licklider), what would be the best approach? Who should take the roles of making design decisions, implementation decisions, engineering decisions and supervision?
We can imagine a whole system with humans out of the loop; that would be a huge unit test and integration test with no real application.
Then humans can study it and learn from it.
Or the other way around: we have already made a huge mess of engineering beasts, and machines will learn to fix our mess or make it worse by an order of magnitude.
I don’t have an answer.
I used to be a big fan of TDD and now I am not; the testing system is a big mess by itself.
I was gratified to learn that the project used my own AccessKit for accessibility (or at least attempted to; I haven't verified if it actually works at all; I doubt it)... then horrified to learn that it used a version that's over 2 years old.
For me, the biggest open question is currently "How autonomous is 'autonomous'?" because the commits make it clear there were multiple actors involved in contributing to the repository, and the timing/merges make it seem like a human might have been involved with choosing what to merge (but hard to know 100%) and also making smaller commits of their own. I'm really curious to understand what exactly "It ran uninterrupted for one week" means, which was one of Cursor's claims.
I've reached out to the engineer who seemed to have run the experiment, who hopefully can shed some more light on it and (hopefully) my update to https://news.ycombinator.com/item?id=46646777 will include the replies and more investigations.
Why attempt something that has an abundant number of libraries to pick and choose from? To me, however impressive it is, 'browser built from scratch' simply overstates it. Why not attempt something like a 3D game, where it's hard to find open source code to use?
There are a lot of examples out there. Funny that you mention this. I literally just last night started a "play" project having Claude Code build a 3D WebAssembly/WebGL game using no frameworks. It did it, but it isn't fun yet.
I think the current models are at a capability level that could create a decent 3D game. The challenges are creating graphic assets and debugging/QA. The debugging problem is that you need to figure out a good harness to let the model understand when something is working, or how it is failing.
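As a rough sketch of what such a harness could look like (everything here is hypothetical: `step` and `framebuffer_checksum` stand in for whatever hooks the game actually exposes), the idea is to run headless, check a few machine-verifiable invariants, and print a verdict the model can parse:

```rust
// Hypothetical harness: run the game headlessly for N frames and report
// machine-readable pass/fail signals the agent can act on.
struct Game {
    player_y: f32,
    frame: u64,
}

impl Game {
    fn new() -> Self {
        Game { player_y: 0.0, frame: 0 }
    }
    // Stand-in for the real fixed-timestep update; the real game would also
    // expose a way to hash or screenshot the rendered frame.
    fn step(&mut self, dt: f32) {
        self.frame += 1;
        self.player_y += 9.8 * dt; // toy "physics"
    }
    fn framebuffer_checksum(&self) -> u64 {
        // Stand-in: a hash of the rendered pixels in a real implementation.
        self.player_y.to_bits() as u64 ^ self.frame
    }
}

fn main() {
    let mut game = Game::new();
    let mut failures = Vec::new();
    let mut last_checksum = game.framebuffer_checksum();

    for _ in 0..600 {
        game.step(1.0 / 60.0);
        // Invariant 1: nothing in the simulation goes NaN/infinite.
        if !game.player_y.is_finite() {
            failures.push(format!("non-finite state at frame {}", game.frame));
            break;
        }
        // Invariant 2: the picture actually changes over time (no frozen renderer).
        let checksum = game.framebuffer_checksum();
        if checksum == last_checksum {
            failures.push(format!("frame {} identical to previous frame", game.frame));
        }
        last_checksum = checksum;
    }

    // A terse, structured verdict is easier for a model to consume than a log dump.
    if failures.is_empty() {
        println!("HARNESS: PASS ({} frames)", game.frame);
    } else {
        println!("HARNESS: FAIL\n{}", failures.join("\n"));
    }
}
```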
This is definitely correct. I had a dream about a new video game the other day, woke up and Gemini one-shotted the game, but the characters are janky as hell because it has made them from whole cloth.
What it should have been willing to do is go off and look for free external assets on the Web that it could download and integrate.
Also, graphics acceleration makes it hard to do from scratch rather than using the 3D APIs, but I guess you could in principle go bare iron on hardware that has published specs, such as AMD's, or just do software-only rendering.
Any views on the nature of "maintainability" shifting now? If a fleet of agents demonstrated the ability to bootstrap a project like that, would that be enough indication to you that orchestration would be able to carry the code base forward? I've seen fully llm'd codebases hit a certain critical weight where agents struggled to maintain coherent feature development, keeping patterns aligned, as well as spiralling into quick fixes.
Almost no idea at all. Coding agents are messing with all 25+ years of my existing intuitions about what features cost to build and maintain.
Features that I'd normally never have considered building because they weren't worth the added time and complexity are now just a few well-structured prompts away.
But how much will it cost to maintain those features in the future? So far the answer appears to be a whole lot less than I would previously budget for, but I don't have any code more than a few months old that was built ~100% by coding agents, so it's way too early to judge how maintenance is going to work over a longer time period.
Most of the big ones are things like skia, harfbuzz, wgpu - all totally reasonable IMO.
The two that stand out for me as more notable are html5ever for parsing HTML and taffy for handling CSS grids and flexbox - that's vendored with an explanation of some minor changes here: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...
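For a sense of what the engine delegates to taffy, here's a minimal sketch of the kind of call a layout pass makes into it. This follows the taffy 0.3-era API from its docs (build nodes with styles, compute the layout, read back geometry); names may well differ in the version vendored here.

```rust
use taffy::prelude::*;

fn main() {
    // The engine builds one taffy node per styled box in the DOM tree...
    let mut taffy = Taffy::new();

    let child = taffy
        .new_leaf(Style {
            size: Size {
                width: Dimension::Points(100.0),
                height: Dimension::Points(100.0),
            },
            ..Default::default()
        })
        .unwrap();

    let root = taffy
        .new_with_children(
            Style {
                display: Display::Flex,
                flex_direction: FlexDirection::Row,
                ..Default::default()
            },
            &[child],
        )
        .unwrap();

    // ...asks taffy to resolve the flexbox (or grid) layout...
    taffy.compute_layout(root, Size::MAX_CONTENT).unwrap();

    // ...and reads back positions/sizes to drive painting.
    let layout = taffy.layout(child).unwrap();
    println!("child at {:?}, size {:?}", layout.location, layout.size);
}
```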
Taffy is a solid library choice, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a "from scratch" rendering engine.
I don't think it detracts much if at all from FastRender as an example of what an army of coding agents can help a single engineer achieve in a few weeks of work.