I think the other question is how far away this is from a "working" browser. It isn't impossible to render a meaningful subset of HTML (especially when you use external libraries to handle a lot of this). The real difficulty is doing this (a) quickly, (b) correctly and (c) securely. All of those are very hard problems, and also quite tricky to verify.
I think this kind of approach is interesting, but it's a bit sad that Cursor didn't discuss how they close the feedback loop: testing/verification. As generating code becomes cheaper, I think effort will shift to how we can more cheaply and reliably determine whether an arbitrary piece of code meets a desired specification. For example, did they use https://web-platform-tests.org/, fuzz testing (e.g. feed in random webpages and inform the LLM when the fuzzer finds crashes), etc.? I would imagine truly scaling long-running autonomous coding would put a heavy emphasis on this.
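To make the fuzzing idea concrete, here's a minimal sketch of a crash-hunting loop. Nothing here is from Cursor's setup: `render_to_bitmap` is a hypothetical stand-in for the engine's entry point, and a serious harness would use cargo-fuzz/libFuzzer with mutation of real pages and WPT cases rather than naive random generation.

```rust
use std::panic;

// Hypothetical stand-in for the engine's entry point; not a real FastRender API.
fn render_to_bitmap(_html: &str) { /* parse, style, layout, paint ... */ }

// Deterministic pseudo-random "tag soup"; a real fuzzer would mutate real-world
// pages and WPT cases instead of generating markup from scratch.
fn random_html(seed: u64) -> String {
    let tags = ["div", "span", "table", "p", "b", "svg"];
    let mut html = String::from("<!doctype html><body>");
    let mut x = seed;
    for _ in 0..64 {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        let tag = tags[(x >> 33) as usize % tags.len()];
        html.push_str(&format!("<{tag} style=\"width:{}px\">x</{tag}>", x % 10_000));
    }
    html.push_str("</body>");
    html
}

fn main() {
    for seed in 0..10_000u64 {
        let page = random_html(seed);
        // Any panic is a reproducible bug report that can be fed back to the agent.
        if panic::catch_unwind(|| render_to_bitmap(&page)).is_err() {
            eprintln!("renderer crashed on seed {seed}");
        }
    }
}
```

The value is the feedback signal: every crash comes with a seed the agent can replay until the bug is fixed.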
Of course Cursor may well have done this, but it wasn't super deeply discussed in their blog post.
I really enjoy reading your blog and it would be super cool to see you look at approaches people have to ensuring that LLM-produced code is reliable/correct.
I think the current approach will simply never scale to a working browser.
To leverage AI to build a working browser you would imo need the following:
- A team of humans with some good ideas on how to improve on existing web engines.
- A clear architectural story written not by agents but by humans. Architecture does not mean high-level diagrams only. At each level of abstraction, you need humans to decide what makes sense and only use the agent to bang out slight variations.
- A modular and human-overseen agentic loop approach: one agent can keep running to try to fix a specific CSS feature (like grid), with a human expert reviewing the work at some interval (not sure how fine-grained it should be). This is actually very similar to running an open-source project: you have code owners and a modular review process, not just an army of contributors committing whatever they want. And a "judge agent" is not the same thing as a human code owner acting as reviewer.
This rendering loop architecture makes zero sense, and it does not implement web standards.
> in the HTML Standard, requestAnimationFrame is part of the frame rendering steps (“update the rendering”), which occur after running a task and performing a microtask checkpoint
> requestAnimationFrame callbacks run on the frame schedule, not as normal tasks.
Following the spec doesn't mean you cannot optimize rendering tasks in some way vs other tasks in your implementation, but the above is not that, it's classic AI bs.
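For reference, here's a heavily simplified sketch of the ordering the quoted spec text describes. This is not FastRender's loop, just an illustration of where requestAnimationFrame callbacks belong relative to tasks and microtasks:

```rust
use std::collections::VecDeque;

type Job = Box<dyn FnMut()>;

#[derive(Default)]
struct EventLoop {
    tasks: VecDeque<Job>,      // timers, input events, network callbacks, ...
    microtasks: VecDeque<Job>, // promise reactions, queueMicrotask, ...
    raf_callbacks: Vec<Job>,   // registered via requestAnimationFrame
}

impl EventLoop {
    fn run_one_iteration(&mut self, rendering_opportunity: bool) {
        // 1. Run a single task from a task queue.
        if let Some(mut task) = self.tasks.pop_front() {
            task();
        }
        // 2. Microtask checkpoint: drain the microtask queue.
        while let Some(mut microtask) = self.microtasks.pop_front() {
            microtask();
        }
        // 3. "Update the rendering" only when there is a rendering opportunity
        //    (roughly once per frame). requestAnimationFrame callbacks run here,
        //    on the frame schedule, never as ordinary tasks in step 1.
        if rendering_opportunity {
            for mut callback in self.raf_callbacks.drain(..) {
                callback();
            }
            // ...style recalculation, layout, and paint would follow here.
        }
    }
}

fn main() {
    let mut el = EventLoop::default();
    el.tasks.push_back(Box::new(|| println!("task")));
    el.microtasks.push_back(Box::new(|| println!("microtask")));
    el.raf_callbacks.push(Box::new(|| println!("rAF callback")));
    el.run_one_iteration(true); // prints: task, microtask, rAF callback
}
```

The caller decides when a frame boundary (a rendering opportunity) occurs and passes that in; rAF callbacks never get interleaved with normal task execution.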
Understanding Web standards and translating them into an implementation requires human judgement.
Don't use an agent to draft your architecture; an expert in web standards with an interest in agentic coding is what is required.
Message to Cursor CEO: next time, instead of lighting up those millions on fire, reach out to me first: https://github.com/gterzian
How much effort would it take for GenAI to write a browser/engine from scratch for GenAI to consume (and generate) all the web artifacts generated by humans and GenAI? (This only needs to work in headless CI.)
How much effort would it take for a group of humans to do it?
I'm not sure what you mean by your first sentence in terms of product.
But in general, my guess at an answer (supported by the results of the experiment discussed in this thread) is that:
- GenAI left unsupervised cannot write a browser/engine, or any other complex software. What you end up with is just chaos.
- A group of humans using GenAI and supervising its output could write such an engine (or any other complex software), and in theory be more productive than a group of humans not using GenAI: the humans could focus on the conceptual bottlenecks, and the AI could bang out the features that require only the translation of already established architectural patterns.
When I write conceptual bottlenecks I don't mean standing in front of a whiteboard full of diagrams. What I mean is any work that gives proper meaning and functionality to the code: it can be at the level of an individual function, or the project as a whole. It can also be outside of the code itself, such as when you describe the desired behavior of (some part of) a program in TLA+.
“This is a clear indication that while the AI can write the code, it cannot design software”
To clarify what I mean by a product. If we want to design a browser system (engine + chrome) from scratch to optimize the human computer symbiosis (Licklider), what would be the best approach? Who should take the roles of making design decisions, implementation decisions, engineering decisions and supervision?
We can imagine a whole system with humans out of the loop; that would be a huge unit test and integration test with no real application.
Then humans can study it and learn from it.
Or the other way around: we have already made a huge mess of engineering beasts, and machines will learn to fix our mess or make it worse by an order of magnitude.
I don’t have an answer.
I used to be a big fan of TDD and now I am not; the testing system is a big mess by itself.
I was gratified to learn that the project used my own AccessKit for accessibility (or at least attempted to; I haven't verified if it actually works at all; I doubt it)... then horrified to learn that it used a version that's over 2 years old.
For me, the biggest open question is currently "How autonomous is 'autonomous'?" because the commits make it clear there were multiple actors involved in contributing to the repository, and the timing/merges make it seem like a human might have been involved with choosing what to merge (but hard to know 100%) and also making smaller commits of their own. I'm really curious to understand what exactly "It ran uninterrupted for one week" means, which was one of Cursor's claims.
I've reached out to the engineer who seemed to have run the experiment, who hopefully can shed some more light on it and (hopefully) my update to https://news.ycombinator.com/item?id=46646777 will include the replies and more investigations.
Why attempt something that has an abundant number of libraries to pick and choose from? To me, however impressive it is, 'browser built from scratch' simply overstates it. Why not attempt something like a 3D game, where it's hard to find open source code to use?
There are a lot of examples out there. Funny that you mention this. I literally just last night started a "play" project having Claude Code build a 3D WebAssembly/WebGL game using no frameworks. It did it, but it isn't fun yet.
I think the current models are at a capability level that could create a decent 3D game. The challenges are creating graphic assets and debugging/QA. The debugging problem is that you need to figure out a good harness to let the model understand when something is working, or how it is failing.
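As a rough sketch of what such a harness could look like (everything here is hypothetical: `step` and `framebuffer_checksum` stand in for whatever hooks the game actually exposes), the idea is to run headless, check a few machine-verifiable invariants, and print a verdict the model can parse:

```rust
// Hypothetical harness: run the game headlessly for N frames and report
// machine-readable pass/fail signals the agent can act on.
struct Game {
    player_y: f32,
    frame: u64,
}

impl Game {
    fn new() -> Self {
        Game { player_y: 0.0, frame: 0 }
    }
    // Stand-in for the real fixed-timestep update; the real game would also
    // expose a way to hash or screenshot the rendered frame.
    fn step(&mut self, dt: f32) {
        self.frame += 1;
        self.player_y += 9.8 * dt; // toy "physics"
    }
    fn framebuffer_checksum(&self) -> u64 {
        // Stand-in: a hash of the rendered pixels in a real implementation.
        self.player_y.to_bits() as u64 ^ self.frame
    }
}

fn main() {
    let mut game = Game::new();
    let mut failures = Vec::new();
    let mut last_checksum = game.framebuffer_checksum();

    for _ in 0..600 {
        game.step(1.0 / 60.0);
        // Invariant 1: nothing in the simulation goes NaN/infinite.
        if !game.player_y.is_finite() {
            failures.push(format!("non-finite state at frame {}", game.frame));
            break;
        }
        // Invariant 2: the picture actually changes over time (no frozen renderer).
        let checksum = game.framebuffer_checksum();
        if checksum == last_checksum {
            failures.push(format!("frame {} identical to previous frame", game.frame));
        }
        last_checksum = checksum;
    }

    // A terse, structured verdict is easier for a model to consume than a log dump.
    if failures.is_empty() {
        println!("HARNESS: PASS ({} frames)", game.frame);
    } else {
        println!("HARNESS: FAIL\n{}", failures.join("\n"));
    }
}
```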
This is definitely correct. I had a dream about a new video game the other day, woke up and Gemini one-shotted the game, but the characters are janky as hell because it has made them from whole cloth.
What it should have been willing to do is go off and look for free external assets on the Web that it could download and integrate.
Also, graphics acceleration makes it hard to do from scratch rather than using the 3D APIs, but I guess you could in principle go bare iron on hardware that has published specs, such as AMD's, or just do software-only rendering.
Any views on the nature of "maintainability" shifting now? If a fleet of agents demonstrated the ability to bootstrap a project like that, would that be enough indication to you that orchestration would be able to carry the code base forward? I've seen fully llm'd codebases hit a certain critical weight where agents struggled to maintain coherent feature development, keeping patterns aligned, as well as spiralling into quick fixes.
Almost no idea at all. Coding agents are messing with all 25+ years of my existing intuitions about what features cost to build and maintain.
Features that I'd normally never have considered building because they weren't worth the added time and complexity are now just a few well-structured prompts away.
But how much will it cost to maintain those features in the future? So far the answer appears to be a whole lot less than I would previously budget for, but I don't have any code more than a few months old that was built ~100% by coding agents, so it's way too early to judge how maintenance is going to work over a longer time period.
Most of the big ones are things like skia, harfbuzz, wgpu - all totally reasonable IMO.
The two that stand out for me as more notable are html5ever for parsing HTML and taffy for handling CSS grids and flexbox - that's vendored with an explanation of some minor changes here: https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...
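For a sense of what the engine delegates to taffy, here's a minimal sketch of the kind of call a layout pass makes into it. This follows the taffy 0.3-era API from its docs (build nodes with styles, compute the layout, read back geometry); names may well differ in the version vendored here.

```rust
use taffy::prelude::*;

fn main() {
    // The engine builds one taffy node per styled box in the DOM tree...
    let mut taffy = Taffy::new();

    let child = taffy
        .new_leaf(Style {
            size: Size {
                width: Dimension::Points(100.0),
                height: Dimension::Points(100.0),
            },
            ..Default::default()
        })
        .unwrap();

    let root = taffy
        .new_with_children(
            Style {
                display: Display::Flex,
                flex_direction: FlexDirection::Row,
                ..Default::default()
            },
            &[child],
        )
        .unwrap();

    // ...asks taffy to resolve the flexbox (or grid) layout...
    taffy.compute_layout(root, Size::MAX_CONTENT).unwrap();

    // ...and reads back positions/sizes to drive painting.
    let layout = taffy.layout(child).unwrap();
    println!("child at {:?}, size {:?}", layout.location, layout.size);
}
```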
Taffy is a solid library choice, but it's probably the most robust ammunition for anyone who wants to argue that this shouldn't count as a "from scratch" rendering engine.
I don't think it detracts much if at all from FastRender as an example of what an army of coding agents can help a single engineer achieve in a few weeks of work.