
I've responded to this claim in more detail at [0], with additional context at [1].

Briefly, the project implemented substantial components, including a JS VM, DOM, CSS cascade, inline/block/table layout, paint systems, text pipeline, and chrome, and is not merely a Servo wrapper.

[0] https://news.ycombinator.com/item?id=46650998

[1] https://news.ycombinator.com/item?id=46655608


Just for context, this was the original claim by Cursor's CEO on Twitter:

> We built a browser with GPT-5.2 in Cursor. It ran uninterrupted for one week.

> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.

> It kind of works! It still has issues and is of course very far from Webkit/Chromium parity, but we were astonished that simple websites render quickly and largely correctly.

https://xcancel.com/mntruell/status/2011562190286045552#m


Could you make clear somewhere exactly how much of the code was "autonomously" built vs how much was steered by humans? Because at this point it's clear that it wasn't 100% autonomous as originally claimed, but right now it's not clear whether this was just the work of an engineer running Cursor or "autonomously organised a fleet of agents".

I cannot make these two statements true at the same time in my head:

> Briefly, the project implemented substantial components, including a JS VM

and from the linked reply:

> vendor/ecma-rs as part of the browser, which is a copy of my personal JS parser project vendored to make it easier to commit to.

If it's using a copy of your personal JS parser that you decided it should use, then it didn't implement it "autonomously". The references you're linking don't summarize to the brief you've provided.

What the fuck is going on?


It's funny how their whole grift hinges on people not reading clearly.

Did you actually review these implementations and compare them to Servo (and WebKit)? Can you point to a specific part or component that was fully created by the LLM but doesn't clearly resemble anything in existing browser engines?

You're claiming that the JS VM was implemented. Is it actually running? Because this screenshot shows the Acid3 test requesting that you enable JavaScript (https://imgur.com/fqGLjSA). Why don't you upload a video of you loading this page?

Your slop is worthless except to convince gullible investors to give you more money.


Does any of it actually work? Can you build that JS VM separately and run serious JS on it? That would be an accomplishment.

Looking at the comments and claims (I've not got the time to review a large code base just to check this claim), I get the impression _something_ was created, but none of it actually builds and no one knows what the actual plan is.

Did your process not involve recursive planning stages? These ALWAYS have big architectural errors and gotchas in my experience, unless you're doing a small toy project or something the AI has seen thousands of times already.

I find agents do pretty well once you have a human correct their bad assumptions and architectural errors. But this assumes the human has an absolute understanding of what is being done, down to the tiniest component. There will be errors that agents left to their own devices will only discover at the very end, after spending dozens of millions of tokens; then they will try the next idea they hallucinated, spend another few dozen million tokens, and so on. Perhaps after 10 iterations like this they may arrive at something fine, or more likely they will descend into hallucination hell.

This is what happens when the complexity, the size, or the novelty of the task (often a mix of all 3) exceeds the capability of the agents.

The true way to success is a human-AI hybrid, but you absolutely need a human who knows their stuff.

Let me give you a small example from the systems field. The other day I wanted to design an AI observability system with the following spec:

- use existing OSS components, with none or as little custom code as possible
- ideally runs on stateless pods on an air-gapped k3s cluster (preferably uses one of the existing DBs, but ClickHouse is acceptable)
- able to proxy OpenAI, Anthropic (both the API and Claude Max), Google (vercel+gemini), DeepInfra, and OpenRouter, including client auth (so it is completely transparent to the client)
- reconstructs streaming responses and recognises tool calls and reasoning content; nice to have: the ability to define your own session/conversation recognition rules

I used Gemini 3 and Opus 4.5 for the initial planning/comparison of OSS projects that could be useful. Both converged on Helicone as supposedly being the best. Then, towards the very end of implementation, it turned out Helicone has pretty much zero docs for properly setting up the self-hosted platform and tries redirecting to their web page for auth, and the agents immediately went into rewriting parts of the source, attempting to write their own auth and fixing imaginary bugs that were really misconfiguration.

Then another product was recommended (I forget which); there, upon very detailed questioning and requests to re-confirm the actual configs for multiple features that were supposedly supported, it turned out it didn't pass through auth for Claude Max.

Eventually I chose LiteLLM + Langfuse (which had been turned down initially in favour of Helicone), and I needed to make a few small code changes so that Claude Max auth could be read, additional headers could be passed through, and, within a single endpoint, it could send Claude telemetry as a pure pass-through and real LLM API calls through its "models" engine (so it recognised tool calls and so on).
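
For anyone curious what the final hookup roughly looks like, the Langfuse logging side of LiteLLM is sketched below (shown via the Python SDK; the proxy server takes the same callback in its config). The model id, metadata, and env var values are illustrative, and this doesn't include the custom Claude Max auth/header pass-through changes I mentioned:

    # Minimal sketch: route a call through LiteLLM and log it to Langfuse.
    # Requires the langfuse package, and assumes LANGFUSE_PUBLIC_KEY /
    # LANGFUSE_SECRET_KEY / LANGFUSE_HOST are set, with LANGFUSE_HOST
    # pointing at the self-hosted instance.
    import litellm
    from litellm import completion

    # Ship successful and failed calls to Langfuse for tracing.
    litellm.success_callback = ["langfuse"]
    litellm.failure_callback = ["langfuse"]

    response = completion(
        model="anthropic/claude-3-5-sonnet-20241022",  # illustrative model id
        messages=[{"role": "user", "content": "ping"}],
        metadata={"session_id": "demo-session"},  # session grouping in Langfuse
    )
    print(response.choices[0].message.content)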


This spec reads like what we've been building at toran.sh: a transparent proxy for AI API calls with observability.

The core idea: you create a "toran" (read-only inspection endpoint) bound to a single upstream. Point your client at the toran URL instead of the API directly. No SDK changes, no code changes - just swap the base URL. It shows exactly what went over the wire in real time.

For the multi-provider setup you're describing (OpenAI, Anthropic, Google, etc.), you'd create separate torans for each upstream. Auth passthrough works because the toran is transparent - it just forwards headers.
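
Concretely, the "just swap the base URL" part looks like this with the OpenAI Python SDK (the toran URL below is made up for illustration; you'd use the endpoint created for your upstream):

    # Sketch: point the client at a toran instead of api.openai.com.
    # The request, headers, and API key pass through unchanged to the upstream.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://my-openai-toran.toran.sh/v1",  # hypothetical toran endpoint
        # api_key comes from OPENAI_API_KEY as usual and is forwarded as-is
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)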

We're still early (focused on the "see what's happening" problem before tackling rate limiting/policy), but if the visibility piece would help with your setup, happy to give you access and hear how it compares to litellm+langfuse for your use case.


Thanks for the feedback. I've addressed similar feedback at [0] and provided some more context at [1].

I do want to briefly note that the JS VM is custom and not QuickJS. The agents also implemented subsystems like the DOM, CSS cascade, inline/block/table layouts, paint systems, text pipeline, and chrome, and I'd push back against the assertion that the project merely calls out to external code. I addressed these points in more detail at [0].

[0] https://news.ycombinator.com/item?id=46650998

[1] https://news.ycombinator.com/item?id=46655608


> I do want to briefly note that the JS VM is custom and not QuickJS

It's hard to verify because your project didn't actually compile. But now that you've fixed the compilation manually, can you demonstrate the javascript actually executing? Some of the people who got the slop compiling claimed credibly that it isn't executing any JavaScript.

You merely have to compile your code, run the binary and open this page - http://acid3.acidtests.org. Feel free to post a video of yourself doing this. Try to avoid the embellishment that has characterised this effort so far.


This is from the "official" build - https://imgur.com/fqGLjSA

The "in progress" build has a slightly different rendering but the same result


Yeah, it's not executing any JavaScript. Hey Mr. Wilson! You've spent millions creating this worthless slop. How about making sure that the code is actually being executed? Or is that not necessary to raise millions more in VC funding?

Hey, Wilson here, author of the blog post and the engineer working on this project. I've been reading the responses here and appreciate the feedback. I've posted some follow-up context on Twitter/X [0], which I'll also write here:

The repo is a live incubator for the harness. We are actively researching the behavior of collaborative long-running agents, and may in the future make the browser and other products this research produces more consumable by end users and developers, but that's not the goal for now. We made it public as we were excited by the early results and wanted to share; while far from feature parity with the most popular production browsers today, we think it has made impressive progress in the last <1 week of wall time.

Given the interest in trying out the current state of the project, I've merged a more up-to-date snapshot of the system's progress that resolves issues with builds and CI. The experimental harness can occasionally leave the repo in an incomplete state (which was the case at the time of the post), but it does converge.

I'm here to answer any further questions you have.

[0] https://x.com/wilsonzlin/status/2012398625394221537?s=20


That doesn’t really address much of the criticism in this thread. No one is shocked that it’s not as good as production web browsers. It’s that it was billed as “from scratch” but upon deeper inspection it looks like it’s just gluing together Servo and some other dependencies, so it’s not really as impressive or interesting because the “agents” didn’t really create a browser engine.

Upon deeper inspection? Someone checked the Cargo.toml and proclaimed it was just Servo and QuickJS glued together, without actually bothering to look at whether these dependencies are even being used.

In reality, while the project does indeed have Servo in its dependencies, it only uses it for HTML tokenization, CSS selector matching, and some low-level structures. JavaScript parsing and execution, the DOM implementation, and the layout engine were written from scratch, with only one exception: Flexbox and Grid layouts are implemented using Taffy, a Rust layout library.

So while “from scratch” is debatable, it is still immensely impressive to me that AI was able to produce something that even just “kinda works” at this scale.


> So while “from scratch” is debatable, it is still immensely impressive to me that AI was able to produce something that even just “kinda works” at this scale.

“From scratch” is inarguably wrong given how much third-party code it depends on. There’s a reasonable debate about how much original content there is but if I was a principal at a company whose valuation hinges on the ability to actually deliver “from scratch” for real, I would be worried about an investor suing for material misrepresentation of the product if they bought now and the value went down in the future.


Thanks for the feedback. I agree that for some parts that use dependencies, the agent could have implemented them itself. I've begun the process of removing many of these and developing them within the project alongside the browser. A reasonable goal for "from scratch" may be "if other major browsers use a dependency, it's fine to do so too". For example: OpenSSL, libpng, HarfBuzz, Skia.

I'd push back on the idea that all the agents did was glue dependencies together — the JS VM, DOM, CSS cascade, inline/block/table layouts, paint systems, text pipeline, chrome, and more are all being developed by agents as part of this project. There are real complex systems being engineered towards the goal of a browser engine, even if not fully there yet.


Hi there. Two questions about this repo [0].

Can you show us what you did after people failed to compile that project [1]?

There are also questions about the attribution of these commits [2]. Can you share some information?

[0] https://github.com/wilsonzlin/fastrender

[1] https://github.com/wilsonzlin/fastrender/issues/98

[2] https://gist.github.com/embedding-shapes/d09225180ea3236f180...


Make it port Firefox's engine to iOS, that's something people would actually use (in countries where Apple is forced to allow other browser engines).

Thanks for the feedback. There were some build errors, which have now been resolved; the CI test that was failing was not a standard CI check, and it's now been updated. Let me know if you have any further issues.

> On twitter their CEO explicitly stated that it uses a "custom js vm" which seemed particularly misleading / untrue to me.

The JS engine used a custom JS VM being developed in vendor/ecma-rs as part of the browser, which is a copy of my personal JS parser project vendored to make it easier to commit to.

I agree that for some core engine components, it should not be simply pulling in dependencies. I've begun the process of removing many of these and co-developing them within the repo alongside the browser. A reasonable goal for "from scratch" may be "if other major browsers use a dependency, it's fine to do so too". For example: OpenSSL, libpng, HarfBuzz, Skia. The current project can be moved further in this direction, although I think using libraries for general infra that most software uses (e.g. windowing) can be compatible with that goal.

I'd push back on the idea that all the agents did was wire up dependencies — the JS VM, DOM, paint systems, chrome, text pipeline, are all being developed as part of this project, and there are real complex systems being engineered towards the goal of a browser engine, even if not there yet.


> there are real complex systems being engineered towards the goal of a browser engine, even if not there yet.

In various comments in https://news.ycombinator.com/item?id=46624541 I have explained at length why your fleet of autonomous agents failed miserably at building something that could be seen as a valid POC.

One example: your rendering loop does not follow the web specs and makes no sense.

https://github.com/wilsonzlin/fastrender/blob/19bf1036105d4e...

The above design document is simply nonsense; typical AI hallucinated BS. Detailed critique at https://news.ycombinator.com/item?id=46705625

The actual code is worse; I can only describe it as a tangle of spaghetti. As a browser expert I can't make much, if anything, out of it. In comparison, when I look at code in Ladybird, a project I am not involved in, I can instantly find my way around the code because I know the web specs.

So I agree this isn't just wiring up of dependencies, and neither is it copied from existing implementations: it's a uniquely bad design that could never support anything resembling a real-world web engine.

Now don't get me wrong, I do think AI could be leveraged to build a web engine, but not by unleashing autonomous agents. You need humans in the loop at all levels of abstraction; the agents should only be used to bang out features, re-using patterns established or vetted by human experts.

If you want to do this the right way, get in touch: https://github.com/gterzian


When you say "have now been resolved" - did the AI agent resolve it autonomously, did you direct it to, or did a human do it?

Looks like Cursor Agent was at least somewhat involved: https://github.com/wilsonzlin/fastrender/commit/4cc2cb3cf0bd...

Looks like a bunch of different users have been contributing to the codebase (Google's Jules even made one commit), and the recent "fixes" include switching between various git users. https://gist.github.com/embedding-shapes/d09225180ea3236f180...

This to me seems to raise more questions than it answers.


The ones at *.ec2.internal generally mean that the git config was never set up and it defaults to $(id -un)@$(hostname)

Indeed. Extra-observant people will notice that the "Ubuntu" username was used only twice, though, compared to "root", which was used 3700+ times. And observant people who've dealt with infrastructure before might recognize that username as the default for interactive EC2 instances :)

I'm not sure what the exact limits were, but I definitely recall running into server errors with S3 and the OCI equivalent service — not technically 429s, but enough to essentially limit throughput. SQS had 429s, I believe due to the number of requests and not the number of messages, but it only supports batching at most 10 messages per request.
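
For context, the batching ceiling I mean is SendMessageBatch's 10-entry cap; a rough boto3 sketch (queue URL and payloads are placeholders):

    # Sketch: SQS caps SendMessageBatch at 10 entries per call, so a
    # high-throughput producer still ends up making many requests.
    import json
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/crawl-queue"  # placeholder

    def send_all(messages):
        # Chunk into groups of 10, the hard per-request limit.
        for i in range(0, len(messages), 10):
            chunk = messages[i:i + 10]
            sqs.send_message_batch(
                QueueUrl=QUEUE_URL,
                Entries=[
                    {"Id": str(n), "MessageBody": json.dumps(m)}
                    for n, m in enumerate(chunk)
                ],
            )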

I definitely wanted these to "just work" out of the box (and maybe I could've worked more with AWS/OCI given more time), as I wanted to focus on the actual search.


Those are reasonable expectations. I’m very impressed with how it all worked out.



Repopack with Claude projects has been a game changer for me on repository-wide refactors.


Seems like repopack only packs the repo. How do you apply the refactors back to the project? Is it something that Claude projects does automatically somehow?


for me too


Thanks for the heads up, just fixed this.


Thank you! And thanks for raising that issue. I've pushed a fix that should hopefully mitigate this for you: it's possible to unselect, card images are hidden on mobile, and the invisible results area around a card (caused by the tallest card stretching the results area) should no longer intercept map touches. Let me know if it helps!


Thanks for the great pointers! I didn't get the time to look into hierarchical clustering unfortunately, but it's on my TODO list. Your comment about making the map clearer is great, and I think there are a lot of low-hanging approaches for improving it. Another thing for the TODO list :)


Thanks! Yeah I'd like to dive deeper into the sentiment aspect. As you say it'd be interesting to see some overview, instead of specific queries.

The negative sentiment stood out to me mostly because I was expecting a more "clear-cut" sentiment graph: largely neutral-positive, with spikes in the positive direction around positive posts and negative around negative posts. However, for almost all my queries, the sentiment was almost always negative. Even positive posts apparently attracted a lot of negativity (according to the model and my approach, both of which could be wrong). It's something I'd like to dive deeper into, perhaps in a future blog post.
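
For anyone wondering what "the model and my approach" means mechanically, it's roughly scoring individual comments with an off-the-shelf classifier and aggregating per query, along the lines of the sketch below (the checkpoint here is a common default and is only meant as an illustration):

    # Sketch: score comments with a stock sentiment model, then average.
    from transformers import pipeline

    sentiment = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative
    )

    comments = [
        "This is a really thoughtful write-up, thanks for sharing.",
        "This doesn't work at all and the benchmarks look misleading.",
    ]

    # Map to a signed score (positive -> +p, negative -> -p) and average.
    scores = []
    for result in sentiment(comments, truncation=True):
        signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
        scores.append(signed)

    print(sum(scores) / len(scores))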


The sentiment issue is a curious one to me. For example, a lot of the humans I interact with who are not devs take my direct questioning or critical responses to be "negative" when there is no negative intent at all. Pointing out that something doesn't work, or anything else the dev community encounters on a daily basis, isn't inherently negative sentiment; it's just pointing out the issues. Is the meme-like helicopter parent constantly doling out praise the standard for "positive", so that anything different reads as negative? Not every piece of art needs to be hung on the fridge door, and providing constructive criticism for improvement is oh so often framed as negative. That does the world no favors.

Essentially, I'm not familiar with HuggingFace or any models in this regard. But if they are trained from the socials, then it seems skewed from the start to me.

Also, fully aware that this comment will probably be viewed as negative based on stated assumptions.

edit: reading further down the comments, clearly I'm not the first with these sentiments.


Speaking from experience, debate is easily misread as negative arguing by outsiders, even though all involved parties are enjoying challenging each other's ideas.


You may be right; a more tailored classifier for HN comments specifically may be more accurate. It'd be interesting to consider the classes: would it still be simply positive/negative? Perhaps constructive/unconstructive? Usefulness? Something more along the lines of the HN guidelines?
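
One cheap way to experiment with those classes before training anything tailored would be zero-shot classification; a rough sketch, with the model and label set being just one possible starting point:

    # Sketch: try HN-specific labels via zero-shot classification rather than
    # a generic positive/negative head. Model and labels are illustrative.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    labels = ["constructive criticism", "unconstructive negativity", "praise", "neutral"]

    comment = "The benchmark setup seems flawed; try pinning CPU frequency before measuring."
    result = classifier(comment, candidate_labels=labels)

    # Labels come back sorted by score, highest first.
    print(list(zip(result["labels"], result["scores"])))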


Just one point of note: people are FAR more likely to respond and take to writing about something negative than something positive. I don't know the exact numbers, but negativity just engages people more. People just don't pick up the pen to write about how good something is as much.


Every helicopter gets a trophy


wait, the parents get a trophy?


I did something related for my ChillTranslator project, which translates spicy HN comments into calm variations. It has a GGUF model that runs easily and quickly, but it's early days. I did it with a much smaller set of data: I used LLMs to make calm variations and an algorithm to pick the closest, least spicy one as the synthetic training data, then fine-tuned Phi-2. I used Detoxify, and since OpenAI's sentiment analysis is free, I use that to verify Detoxify has correctly identified spicy comments, then generate a calm pair. I do worry that HN could implode/degrade if it can't keep a good balance of the comments and posts that people come here for. Maybe I can use your sentiment data to mine faster and generate more pairs. I've only done an initial end-to-end test so far (which works!). The model so far is not as high quality as I'd like, but I've not used Phi-3 on it yet and I've only used a very small fine-tune dataset. The file is here though: https://huggingface.co/lukestanley/ChillTranslator I've had no feedback from anyone on it, though I did have a 404 in my Show HN post!
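
For reference, the spicy-comment detection step is roughly this (the 0.5 threshold is arbitrary and just for illustration):

    # Sketch: flag "spicy" comments with Detoxify before generating calm rewrites.
    from detoxify import Detoxify

    model = Detoxify("original")

    comments = [
        "What a well reasoned post, thank you.",
        "Only an idiot would ship something this broken.",
    ]

    for text in comments:
        scores = model.predict(text)  # dict of toxicity, insult, threat, ...
        if scores["toxicity"] > 0.5:
            print("spicy:", text)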


Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.

Posts written in sweet syrupy tones wouldn’t do well here, and jokes are in short supply or outright banned. Most people here also seem to be men. There’s always someone shooting you down. And after a while, you start to shoot back.


(Without wanting to sound negative or cynical) I don’t think it is, but maybe I haven’t been here long enough to notice. It skews towards technical and science and technology-minded people, which makes it automatically a bit ‘cynical’, but I feel like 95% of commenters are doing so at least in good faith. The same cannot be said of many comparable discussion forums or social media websites.

Jokes are also not banned; I see plenty on here. Low-effort ones and chains of unfunny wordplay or banter seem to be frowned upon though. And that makes it cleaner.


I've been here a hot minute and I agree with you. Lots of good faith. Lots of personal anecdotes presumably anchored in experience. Some jokes are really funny, just not Reddit-style. Similarly, generally no Slashdot quips, such as "first post" or "I, for one, welcome our new HN sentiment-mapping robot overlords." Sometimes things get downvoted that shouldn't be, but most of the flags I see are well deserved, and I vouch for ones that I think are not flag-worthy.


I wonder how much of a person's impression of this is formed by their browsing habits.

As a parent comment mentions, big threads can be a bit of a mess, but usually only for the first couple of hours. Comments made in the spirit of HN tend to bubble up, while off-topic or rude comments and bad jokes tend to percolate down over the course of hours. Also, a number of threads that tend to spiral get manually detached, which takes time to clean up.

Someone who isn't at least somewhat familiar with how HN works and is consistently early to stories that attract a lot of comments is reading an almost entirely different site from someone who just catches up at the end of the day.


Some of the more negative threads will get flagged and detached, and by the end of the day a casual browse through the comments isn't even going to come across them. E.g., something about the situation in the Middle East is going to attract a lot of attention.


I think it's the engineering mindset. You're always trying to figure out what's wrong with an idea, because you might be the poor bastard that ends up having to build it. Less costly all round if you can identify the flaw now, not halfway through sprint 7. After a while it bleeds into everything you do.


> Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.

Sure, sometimes. But usually it's

Truth seeking > group thinking

There's a fine line between critical and cynical. Sometimes that line gets crossed. Sometimes the ambiguity of text-only comms clouds the water.


> Anecdotally, I think anyone who reads HN for a while will realize it to be a negative, cynical place.

I don't think this is particularly unique to HN. Anonymous forums tend to attract contrarian assholes. Perhaps this place is more, erm, poorly socially-adapted to the general population, but I don't see it as very far outside the norm outside of the average wealth of the posters.


Really? Mmm, I think HN is a place with, on average, people of above-average intelligence; people who understand that their opinion is not the only one. I rarely have issues with people here. Might also be because we are all in the same bubble here.


It's so interesting that in Likert-scale surveys I tend to see huge positivity/agreement bias, but comments tend to be critical/negative. I think there is something about the format of feedback that skews the graph in general.

On HN, my theory is that positivity is the upvotes, and negativity/criticality is the discussion.

Personally, my contribution to your effort is that I would love to see a tool that could do this analysis for me over a dataset/corpus of my choosing. The code is nice, but it is a bit beyond me to follow in your footsteps.


Great work! Would you consider adding support for search-via-url, e.g. https://hn.wilsonl.in/?q=sentiment+analysis. It would enable sharing and bookmarks of stable queries.


Thanks for the suggestion, I've just added the feature:

https://hn.wilsonl.in/s/sentiment%20analysis


It will be a deep dive into the most essential of HN staples, the nitpick


[flagged]


Lol, what a typical comment for today's HN. Condescending ("just plain wrong") with a jab ("this isn't a hugbox") thrown in just to remind you that not only are you perceived to be wrong but you've provoked anger. No evidence to back up the jab, no feedback to help fix what you perceive as wrong sentiment analysis. Just thoughtless condescension and anger. Why is the sentiment wrong? Is this a data analysis trap the OP fell into? Nah, let's insult the OP instead.

In my experience, having run a bunch of different sentiment models on HN comments, HN comments tend to land around neutral to slightly negative as a whole, even when I perceive the thread to be okay. However, I've noticed a huge bump in negative sentiment on large HN threads. I generally find that absolute sentiment doesn't work in most corpora, because the model reflects its training set's sentiment labels; relative sentiment is a lot more useful. I have yet to do a temporal sentiment analysis on HN, but I have a suspicion that it's gotten more negative over time. I agree with another poster that HN needs to be careful not to become so negative that it just becomes an anger echo chamber.

Relative sentiment between topics on this site is something I've done, and it shows the obvious results: crypto threads are by and large negative, and most political and news-related threads are also highly negative.


Cynicism is perceived as more intelligent [0]. I personally find the HN brand of discussion to be difficult to bs my way into. But no matter your level of competency you can always find something to criticize and feel you've contributed. I wonder if academia or even "more intelligent" discussion in general would be counted as more negative.

https://journals.sagepub.com/doi/pdf/10.1177/014616721878319...


As someone who is not an academic myself, but likes to listen to podcasts where academics discuss issues with each other, I often find that the conversations feel contentious, and sometimes they are, but the vast majority of the time the academics themselves feel like they're having a perfectly cordial and productive conversation. So I do think there is something to the idea that academic discussion comes across as being negative.


HN definitely has a negative valence.

Sure, there's the 20% of comments that are outright rude, or tie everything back to their pet grievance (job satisfaction, government surveillance, the existence of JS).

But beyond that, the technical conversation has a negative, critical edge. A lot of comments come from the angle "You did something wrong by...", or only reply to correct.

There are still golden comments, and most personal anecdotes are treated respectfully, but it makes for an intimidating environment.


Whoosh, I was making a point by styling my comment in a way that would be perceived as negative by sentiment analysis.

Good job doing a whole psychoanalysis based on what's basically a joke, though.


Heh did I miss the joke? That was a whoops indeed! Sentiment is hard on the internet ;)

> Good job doing a whole psychoanalysis based on what's basically a joke, though.

Guess there's still some work to be done on that positive sentiment replying eh? :)


That one was intentional ;)

