You're choosing to focus on specific hype posts (which were actually just misunderstandings of the original confusingly-worded Twitter post).
While ignoring the many, many cases of well-known and talented developers who give more context and say that agentic coding does give them a significant speedup (like Antirez (creator of Redis), DHH (creator of RoR), Linus (creator of Linux), Steve Yegge, Simon Willison).
Why not, in that case, provide an example to rebut and contribute, as opposed to knocking someone else's example, even if it was against the use of agentic coding?
Serious question - what kind of example would help at this point?
Here is a sample of (IMO) extremely talented and well-known developers who have expressed that agentic coding helps them: Antirez (creator of Redis), DHH (creator of RoR), Linus (creator of Linux), Steve Yegge, Simon Willison. This is just randomly off the top of my head; you can find many more. None of them claim that agentic coding does a year's worth of work for them in an hour, of course.
In addition, pretty much every developer I know has used some form of GenAI or agentic coding over the last year, and they all say it gives them some form of speed up, most of them significant. The "AI doesn't help me" crowd is, as far as I can tell, an online-only phenomenon. In real life, everyone has used it to at least some degree and finds it very valuable.
Those are some high profile (celebrity) developers.
I wonder if they have measured their results? I believe that the perceived speedup of AI coding is often different from reality. The following paper backs this idea: https://arxiv.org/abs/2507.09089 . Can you provide data that contradicts this view, based on these (celebrity) developers or otherwise?
Almost off-topic, but got me curious: How can I measure this myself? Say I want to put concrete numbers to this, and actually measure, how should I approach it?
My naive approach would be to just implement it twice, once together with an LLM and once without, but that has obvious flaws, most obviously that the order in which you do the two attempts impacts the results too much.
So how would I actually go about and be able to provide data for this?
> My naive approach would be to just implement it twice, once together with an LLM and once without, but that has obvious flaws, most obviously that the order in which you do the two attempts impacts the results too much.
You'd get a set of 10-15 projects, and a set of 10-15 developers. Then each developer would implement the solution with LLM assistance and without such assistance. You'd ensure that half the developers did LLM first, and the others traditional first.
You'd only be able to detect large statistical effects, but that would be a good start.
If it's just you then generate a list of potential projects and then flip a coin as to whether or not to use the LLM and record how long it takes along with a bunch of other metrics that make sense to you.
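A rough sketch of what that could look like if you want to make it a habit (the file name, the prompts, and the "defects" metric below are placeholders, not a prescription):

    # Coin-flip self-experiment: randomly assign each task to "llm" or
    # "no_llm", time it, and append the result to a CSV for later analysis.
    import csv, datetime, random, time

    def run_trial(task_name):
        condition = random.choice(["llm", "no_llm"])  # the coin flip
        print(f"{task_name}: work {'WITH' if condition == 'llm' else 'WITHOUT'} the LLM")
        input("Press Enter when you start... ")
        start = time.time()
        input("Press Enter when the task is done... ")
        minutes = round((time.time() - start) / 60, 1)
        defects = input("Defects found in self-review (rough count): ")
        with open("trials.csv", "a", newline="") as f:
            csv.writer(f).writerow(
                [datetime.date.today(), task_name, condition, minutes, defects])

    if __name__ == "__main__":
        run_trial(input("Task name: "))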
Which seems to indicate that there would be a suitable way for a single individual to be able to measure this by themselves, which is why I asked.
What you're talking about is a study and beyond the scope of a single person, and also doesn't give me the information I'd need about myself.
> If it's just you then generate a list of potential projects and then flip a coin as to whether or not to use the LLM and record how long it takes along with a bunch of other metrics that make sense to you.
That sounds like I can just go by "yeah, feels like I'm faster", which I thought was exactly what the parent wanted to avoid...
> That sounds like I can just go by "yeah, feels like I'm faster", which I thought was exactly what the parent wanted to avoid...
No it doesn't, but perhaps I assumed too much context. Like, you probably want to look up the Quantified Self movement, as they do lots of social-science-like research on themselves.
> Which seems to indicate that there would be a suitable way for a single individual to be able to measure this by themselves, which is why I asked.
I honestly think pick a metric you care about and then flip a coin to use an LLM or not is the best you're gonna get within the constraints.
> Like, you probably want to look up the Quantified Self movement, as they do lots of social-science-like research on themselves.
I guess I was looking for something a bit more concrete, that one could apply themselves, which would answer the "if they have measured their results? [...] Can you provide data that contradicts this view" part of the parent's comment.
> then flip a coin to use an LLM or not is the best you're gonna get within the constraints.
Do you think trashb, who asked the initial question above, would take the results of such an evaluation and say "Yeah, that's good enough and answers my question"?
> I guess I was looking for something a bit more concrete, that one could apply themselves, which would answer the "if they have measured their results? [...] Can you provide data that contradicts this view" part of the parent's comment.
This stuff is really, really hard. Social science is very difficult as there's a lot of variance in human ability/responses. Added to that is the variance surrounding setup and tool usage (claude code vs aider vs gemini vs codex etc).
Like, there's a good reason why social scientists try to use larger samples from a population, and get very nerdy with stratification et al. This stuff is difficult otherwise.
The gold standard (rather like the METR study) is multiple people with random assignment to tasks with a large enough sample of people/tasks that lots of the random variance gets averaged out.
On a 1 person sample level, it's almost impossible to get results as good as this. You can eliminate the person level variance (because it's just one person), but I think you'd need maybe 100 trials/tasks to get a good estimate.
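For a rough sense of where a number like 100 comes from, here is a back-of-the-envelope two-sample power calculation; the Cohen's d values are illustrative assumptions, not measured effect sizes:

    # Approximate tasks needed per condition for a two-sample t-test,
    # alpha = 0.05 (two-sided), power = 0.8, for a few assumed effect sizes.
    from scipy.stats import norm

    def n_per_group(d, alpha=0.05, power=0.8):
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(power)
        return 2 * (z_a + z_b) ** 2 / d ** 2

    for d in (0.3, 0.5, 0.8):
        print(f"d={d}: ~{n_per_group(d):.0f} tasks per condition")
    # d=0.3 -> ~174, d=0.5 -> ~63, d=0.8 -> ~25 tasks per condition, i.e.
    # easily 100+ total trials unless the speedup is large relative to
    # the task-to-task variance.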
Personally, that sounds really implausible, and even if you did accomplish this, I'd be sceptical of the results as one would expect a learning effect (getting better at both using LLM tools and side projects in general).
The simple answer here (to your original question) is no, you probably can't measure this yourself as you won't have enough data or enough controls around the collection of this data to make accurate estimates.
To get anywhere near a good estimate you'd need multiple developers and multiple tasks (and a set of people to rate the tasks such that the average difficulty remains constant).
Actually, I take that back. If you work somewhere with lots and lots of non-leetcode interview questions (take homes etc) you could probably do the study I suggested internally. If you were really interested in how this works for professional development, then you could randomise at the level of interviewee and track those that made it through and compare to output/reviews approx 1 year later.
But no, there's no quick and easy way to do this because the variance is way too high.
> Do you think trashb, who asked the initial question above, would take the results of such an evaluation and say "Yeah, that's good enough and answers my question"?
I actually think trashb would have been OK with my original study, but obviously that's just my opinion.
To wrap this up, what I was trying to say is that the feeling of being faster may not align with reality. Even for people who have a good understanding of the matter, it may be difficult to estimate. So I would say be skeptical of claims like this and try to somehow quantify it in a way that matters for the tasks you do. This is something managers of software projects have been trying to tackle for a while now.
There is no exact measurement in this case, but you could get an idea by testing certain types of implementations: for example, whether you are finishing similar tasks on average 25% faster over a longer testing period with and without AI. Just the act of timing yourself doing tasks with or without AI may already give a crude indication of the difference.
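A rough sketch of what that crude comparison could look like, assuming you have been logging each task's condition and duration to a CSV (the file name and column layout are made up for illustration):

    # Summarise a log of (date, task, condition, minutes, notes) rows:
    # average speedup plus a Welch t-test as a sanity check on the noise.
    import csv
    from statistics import mean
    from scipy.stats import ttest_ind

    times = {"llm": [], "no_llm": []}
    with open("trials.csv") as f:
        for date, task, condition, minutes, notes in csv.reader(f):
            times[condition].append(float(minutes))

    speedup = 1 - mean(times["llm"]) / mean(times["no_llm"])
    t_stat, p_value = ttest_ind(times["llm"], times["no_llm"], equal_var=False)
    print(f"Average speedup with LLM: {speedup:.0%} (Welch t-test p={p_value:.3f})")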
You could also run a trial implementing coding tasks like leetcode problems; however, you will introduce some kind of bias due to having done them previously, and additionally the tasks may not align with your daily activities.
A trial with multiple developers working on the same task pool with or without AI could lead to more substantial results, but you won't be able to do that by yourself.
So there seems to be a shared understanding of how difficult "measure your results" would be in this case, so could we also agree that asking someone:
> I wonder if they have measured their results? [...] Can you provide data that contradicts this view, based on these (celebrity) developers or otherwise?
isn't really fair? Because not even you or I really know how to do so in a fair and reasonable manner, unless we start to involve trials with multiple developers and so on.
> isn't really fair? Because not even you or I really know how to do so in a fair and reasonable manner, unless we start to involve trials with multiple developers and so on.
I think in a small conversation like this, it's probably not entirely fair.
However, we're hearing similar things from much larger organisations who definitely have the resources to do studies like this, and yet there's very little decent work available.
In fact, lots of the time they are deliberately misleading people ("25% of our code is generated by AI", where the AI is Copilot/other autocomplete). Like, that 25% stat was probably true historically with JetBrains products and any form of code generation (for protobufs and the like), so it's wildly deceptive.
This is a notoriously difficult thing to measure in a study. More relevantly though, IMO, it's not a small effect that might be difficult to notice - it's a huge, huge speedup.
How many developers have measured whether they are faster when programming in Python vs assembly? I doubt many have. And I doubt many have chosen Python over assembly because of any study that backs it up. But it's also not exactly a subtle difference - I'm fairly sure 99% of people will say that, in practice, it's obvious that Python is faster for programming than assembly.
I talked literally yesterday to a colleague who's a great senior dev, and he made a demo in an hour and a half that he says would've taken him two weeks to do without AI. This isn't a subtle, hard to measure difference. Of course this is in an area where AI coding shines (a new codebase for demo purposes) - but can we at least agree that in some things AI is clearly an order of magnitude speedup?
A lot of comments read like a knee-jerk reaction to the Twitter crowd claiming they vibe-code apps making $1M in 2 weeks.
As a designer I'm having a lot of success vibe coding small use cases, like an alternative to Lovable that lets me prototype in my design system and share prototypes easily.
All the devs I work with use Cursor; one of them (a frontend dev) told me most of the code is written by AI. In the real world, agentic coding is used massively.
I think it is a mix of ego and fear - basically "I'm too smart to be replaced by a machine" and "what am I gonna do if I'm replaced?".
The second part is something I think a lot about now after playing around with Claude Code, OpenCode, Antigravity and extrapolating where this is all going.
I agree it's about the ego... as for the other part, I am also trying to project a few scenarios in my head.
Wild guess nr.1: a large majority of software jobs will be complemented (mostly replaced) by AI agents, reducing the need for as many people doing the same job.
Wild guess nr.2: demand for creating software will increase but the demand for software engineers creating that software will not follow the same multiplier.
Wild guess nr.3: we will have the smallest teams ever, with only a few people on board, perhaps leading to more companies being founded than ever before.
Wild guess nr.4: in the near future, the pool of software engineers as we know them today will be drastically downsized, and only the ones who can demonstrate that they bring substantial value over just using the AI models will remain relevant.
Wild guess nr.5: getting the job in software engineering will be harder than ever.
That is hilarious.... and to prove the point of this whole comment thread, I created reddit-kv for us. It seems to work against a mock; I did not test it against Reddit itself, as I think that would violate the ToS. My prompts are in the repo.
You haven't provided a sample either... But sure, let's dig in.
> Antirez
When I first read his recent article, https://antirez.com/news/158 (don't buy into the anti-AI hype), I found the whole thing uncompelling. But I gave it a second chance and re-read it. I'm gonna have to resist going line by line, because I find some of it outright objectionable.
> Whatever you believe about what the Right Thing should be, you can't control it by refusing what is happening right now. Skipping AI is not going to help you or your career.
Setting aside the rhetorical/argumentative deficiencies, and the fact that this is just FUD (he next suggests that if you disagree, you should just keep trying it every few months, which suggests to me that even he knows it's BS): he writes that in the context of the ethical and moral objections he himself raises. So he's suggesting that the best way to advance your career is to ignore the social and ethical concerns and just get on board?
Gross.
Individual careers aside, I'm not impressed by the correctness of the code emitted by AI and committed by most AI users. I'm unconvinced that AI will improve the industry, and its reputation as a whole.
But the topic is supposed to be specific examples of code, so let's do that. He mentions adding UTF-8 support to his toy terminal input project -> https://github.com/antirez/linenoise/commit/c12b66d25508bd70... It's a very useful feature to add, without a doubt! His library is better than it was before. But parsing UTF-8 is something that's very easy to implement carelessly or incompletely, i.e. very easy to trip over, while the implementation specifics are fairly described as a solved problem. It's been done so many times that, if you're willing to re-implement from another existing source, it wouldn't take very long to do this without AI. (And if you're not willing, why are you using AI? I'm ethically opposed to the laundered provenance of source material.) Meanwhile, it absolutely takes more time to verify that the code is correct than if you had written it by hand: the thing everyone keeps telling me is that I have to ensure the AI hasn't made a mistake, so either I trust the vibes, or I'm still spending that time. Even Simon Willison agrees with me[1].
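(To make the "easy to trip over" point concrete, here is the core decoding rule in a few lines of Python. This is not the linenoise code, just an illustrative sketch; the checks at the end are exactly the ones careless implementations tend to skip.)

    # Sketch of the core UTF-8 rule: the leading byte gives the sequence
    # length, every continuation byte must look like 10xxxxxx, and
    # overlong/surrogate/out-of-range code points must be rejected.
    def decode_utf8_char(buf, i):
        # decode one code point starting at buf[i]; return (codepoint, next_i)
        b0 = buf[i]
        if b0 < 0x80:                      # ASCII fast path
            return b0, i + 1
        elif b0 >> 5 == 0b110:
            length, cp, minimum = 2, b0 & 0x1F, 0x80
        elif b0 >> 4 == 0b1110:
            length, cp, minimum = 3, b0 & 0x0F, 0x800
        elif b0 >> 3 == 0b11110:
            length, cp, minimum = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError("invalid leading byte")
        if len(buf) < i + length:
            raise ValueError("truncated sequence")
        for b in buf[i + 1:i + length]:
            if b >> 6 != 0b10:
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError("overlong, surrogate, or out-of-range code point")
        return cp, i + length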
> Simon Willison
Is another one suggested, so he's perfect to go next. I normally would exclude someone who's clearly best known as an AI influencer, but he's without a doubt an engineer too, so fair game. Especially given he answered a similar question just recently: https://news.ycombinator.com/item?id=46582192 I've been searching for a counterpoint to my personal anti-AI hype, so I was eager to see what the experts are making... and it's all boilerplate. I don't mean to say there's nothing valuable or nothing useful there, only that the vast majority of the code in these repos is boilerplate that has no use out of context. The real value is just a few lines of code, something that I believe would only take 30 minutes to write without AI for a project you were already working on. It'd take a few hours to make any of this myself from scratch (assuming I'm even good enough to figure it out).
And I do admit, 10 minutes on BART vs 3-4 hours on a weekend is a very significant time delta. But also, I like writing code. So what was I really gonna do with that time? Make shareholder value go up, no doubt!
> Linus Torvalds
I can't find a single source where he's an advocate for AI. I've seen the commit, and while some of the GitHub comments are gold, I wasn't able to draw any meaningful conclusions from the commit in isolation. Especially not when, last I read about it, he used AI because he doesn't write Python. So I don't know what conclusions I can pull from this commit, other than that AI can emit code. I knew that.
I don't have enough context to comment on the opinions of Steve Yegge or his AI generated output. I simply don't know enough, and after a quick search nothing other than AI influencer jumped out at me.
And I try to care about who I give my time and attention to, and who I associate with, so this is the end of the list.
I contrast these examples with all the hype that's proven over and over to be a miscommunication if I'm being charitable, or an outright lie if I'm not. I also think it's important to consider the incentives leading to these "miscommunications" when deciding how much good faith to assign them.
On top of that, there are the countless examples of AI confidently lying to me about something. Explaining my fundamental, concrete objection to being lied to would take another hour I shouldn't spend on an HN comment.
What specific examples of impressive things/projects/commits/code am I missing? What output, makes all the downsides of AI a worthwhile trade off?
> In addition, pretty much every developer I know has used some form of GenAI or agentic coding over the last year, and they all say it gives them some form of speed up
I remember reading somewhere that, when tested, they're not actually faster. Any source on this other than vibes?
I would answer, but I'd rather not do so publicly, because of HN's privacy policy (nonexistent) and handling of personal data (abysmal), and also because it's impossible to delete your comments.
> The productivity studies on software engineers directly don't show much of a productivity gain certainly nowhere near the 10x the frontier labs would like to claim.
Which studies are you talking about? The last major study that I saw (that gained a lot of attention) was published half a year ago, and the study itself was conducted on developers using AI tools in 2024.
The technology has improved so rapidly that this study is now close-to-meaningless.
"The technology has improved so rapidly that this study is now close-to-meaningless."
You could have said that at any time in the last 3 years, but the data has never shown it to be true. Is there data to show that the current-gen models are so much better than the last-gen models that the existing productivity data should be ignored? I don't think the coding benchmarks show a step change in capabilities; it's generally dev vibes rather than a large change in measurements.
> There's just no way in hell ChatGPT at its current level is going to guide you flawlessly through all of that if you start with a simple "I want to build a raytracer" prompt!
This is the entire crux of your argument. If it's false, then everything else you wrote is wrong - because all that the consumer of the book cares about is the quality of the output.
I'd be pretty surprised if you couldn't get a tutorial exactly as good as you want, if you're willing to write a prompt that's a bit better than just "I want to build a ray tracer". I'd be even more surprised if LLMs won't be able to do this in 6 months. And that's not even considering the benefits of using an LLM (something unclear in the tutorial? Ask and it's answered).
Indeed. The top-level comment is pretty much wishful thinking. At this point, if you tell a frontier LLM to explain things bottom up, with motivation and background, you usually get something that’s better than 95% of the openly available material written by human domain experts.
Of course, if you just look at top posts on forums like this one, you might get the impression that humans are still way ahead, but that’s only because you’re looking at the best of the best of the best stuff, made by the very best humans. As far as teaching goes, the vast majority of humans are already obsolete.
> ...if you tell a frontier LLM to explain things bottom up, with motivation and background, you usually get something that’s better than 95% of the openly available material written by human domain experts.
That's an extraordinary claim. Are there examples of this?
What about letting customers actually try the products and figure out for themselves what it does and whether that's useful to them?
I don't understand this mindset that because someone stuck the label "AI" on it, consumers are suddenly unable to think for themselves. AI as a marketing label has been used for decades, yet only now is it taking off like crazy. The word hasn't changed - what it's actually capable of doing has.
But, to be fair, that wasn't the kind of critique it was talking about. If your critique of guns is moral, strategic, etc., then yes, you can make it without actually trying out guns. If your critique is that guns physically don't work, don't actually do the thing they are claimed to do, then some hands-on testing would quickly dispel that notion.
The article is talking about those kinds of critiques, ones of the "AI doesn't work" variety, not "AI is harmful".
I don't know any engineers, any reports, or any public community voices who claim GenAI is bad because "AI doesn't work because I tried ChatGPT in 2022 and it was dumb." So it's a critique of a fictional movement which doesn't exist vs. an attempt at critiquing an actual movement.