
As a fellow "staff engineer", I find LLMs are terrible at writing or teaching how to write idiomatic code, and they are actually causing me to spend more time reviewing than before due to the influx of junior-to-senior engineers trying to sneak in LLM garbage.

In my opinion, using LLMs to write code is a Faustian bargain in which you learn terrible practices and come to rely on code quantity, boilerplate, and nondeterministic outputs - all hallmarks of poor software craftsmanship. Until ML can actually go end to end from requirements to product and they fire all of us, you can't cut corners on building intuition as a human by forgoing reading and writing code yourself.

I do think there is a place for LLMs in generating ideas or exploring an untrusted knowledge base of information, but using code generated by an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch, as is relying on it for linting, for debugging, or as a source of truth.



I will probably get heavily crucified for this, but to those who are ideologically opposed to AI-generated code: executives, directors, and managerial staff think the opposite. Being very anti-LLM code instead of trying to understand how it can improve your speed might be detrimental to your career.

Personally, I’m on the fence. But having conversations with others, and getting requests from execs to implement different AI utils into our processes, makes me want to stay on the safer side of job security rather than dismiss it and be adamant against it.


> executives, directors and managerial staff think the opposite

Executives, directors, and managerial staff have had their heads up their own asses since the dawn of civilization. Riding the waves of terrible executive decisions is unfortunately part of professional life. Executives like the idea of LLMs because it means they can lay you off; they're not going to care about your opinion on it one way or another.

> Being very anti-LLM code instead of trying to understand how it can improve the speed might be detrimental for your career.

You're making the assumption that LLMs can improve your speed. That's the very assumption being questioned by GP. Heaps of low-quality code do not improve development speed.


I'm willing to stake my reputation on the idea that yes, LLMs can improve your speed. You have to learn how to use them effectively and responsibly but the productivity boosts they can give you once you figure that out are very real.


I'm with you on this one. If one's experience is just using it as a general chatbot, then yeah, I can see why people are reluctant and think it's useless. I have a feeling that a good chunk of people haven't tried using the latest models on a medium-small project from scratch, where you have to play around with their intricacies to build intuition.

It becomes some sort of muscle memory, where I can predict whether using an LLM will be faster or slower, or where it's more likely to give bad suggestions. Basically, I treat it like googling skills.


"It becomes some sort of muscle memory, where I can predict whether using LLM would be faster or slower"

Yeah, that intuition is so important. You have to use the models a whole bunch to develop it, but eventually you get a sort of sixth sense where you can predict if an LLM is going to be useful or harmful on a problem and be right about it 9/10 times.

My frustration is that intuition isn't something I can teach! I'd love to be able to explain why I can tell that problem X is a good fit and problem Y isn't, but often the answer is pretty much just "vibes" based on past experience.


Totally! My current biggest problems:

1) Being up to date with the latest capabilities - this one has slowed down a bit; my biggest self-learning push was in August/September, and most of that intuition still works. However, while I had the time to do that, it's hard to ask my team to give up 5-6 free weekends of their lives to get up to speed.

2) Transition period where not everyone is on the same page about LLM - this I think is much harder, because the expectations from the executives are much different than on the ground developers using LLMs.

A lot of people could benefit from an alignment of expectations, but once again, it's hard to explain what is and isn't possible when your statements will be nullified a month later by a new AI model/product/feature.


You're confusing intuition of what works or not with being too close to your problem to make a judgement on the general applicability of your techniques.


I don't understand what you mean.


They're saying it might work for you, but isn't generally applicable (because most people aren't going to develop this intuition, presumably).

Not sure I agree with that. I would say there are classes of problems where LLMs will generally help and a brief training course (1 week, say) would vastly improve the average (non-LLM-trained) engineer's ability to use it productively.


No, it is more like thinking that my prescribed way of doing stuff must be the way things work in general because it works for me. You give specific instructions about everything you did, but the person you give them to isn't your exact height or can't read your language, so you can easily assume they just don't get it. With these LLMs that bias is also hidden from you as you inch closer to the solution at every turn. The result seems "obvious", but the outcomes were never guaranteed and will most likely be different for someone else if one thing at any point is different.


My whole thing about LLMs is that using them isn't "obvious". I've been banging that drum for over a year now - the single biggest misconception about LLMs is that they are easy to use and you don't need to put a serious amount of effort into learning how to best apply them.


To me it's more that the effort you put in is not a net gain. You don't advance a way of working with them that way, for myriad reasons ranging from ownership of the models, to the fundamentals of the resulting total probabilistic space of the interaction, to simple randomness even at low temperatures. The "learning how to best apply them" is not definable, because who is learning what, to apply what, to what... The most succinct way I know to describe these issues is that, like startup success stories, many of the assumptions you make about the projects you present amount to "these are the lotto numbers that worked for me".

In real, traditional, deterministic systems where you explicitly design a feature, even that has difficulty staying coherent over time as usage grows. Think of tab stops on a typewriter evolving from an improvised template, to metal tabs installed above the keyboard, to someone cutting and pasting incorrectly and accidentally reflowing a 200-page document to 212 pages because of tab characters...

If you create a system with these models that writes the code to process a bunch of documents in some way, or does some kind of herculean automation, you haven't improved the situation when it comes to clarity or simplicity, even if the task at hand finishes sooner for you in this moment.

Every token generated has an equal potential to spiral out into new complexities and whack-a-mole issues that tie you to assumptions about the system design, while providing a veneer of control over the intersections of these issues; but as this situation grows you create an ever bigger problem space.

And I can definitely hear you say: this is the point where you use some sort of full-stack, interoceptive, holistic intuition to persuade the system towards a higher-order concept of the design, expand your ideas about how the problem could be solved, and let the model guide you... And that is precisely the mysticism I object to, because it isn't actually a kind of productiveness but a struggle, a constant guessing, and any insight from it can be taken away, changed accidentally, censored, or packaged as a front run against your control.

Additionally, the lack of separate in-band and out-of-band streams of data means that even with agents and reasoning and all the avenues of exploration and performance improvement, you still will not escape the fundamental question of: what is the total information contained in the entire probabilistic space? If you try to do out-of-band control in some way, like the latest thing I just read where they have a separate censoring layer, you either wind up using another LLM layer there, which still contains all of these issues, or you use some kind of non-transformer method like Bayesian filtering, and you get all of the issues outlined in the seminal spam.txt document...

So, given all of this, I think it is really neat the kinds of feats you demonstrate, but I object that these issues can be boiled down to "putting a serious amount of effort into learning how to best apply them" because I just don't think that's a coherent view of the total problem, and not actually something that is achievable like learning in other subjects like math or something. I know it isn't answerable but for me a guiding question remains why do I have to work at all, is the model too small to know what I want or mean without any effort? The pushback against prompt engineering and the rise of agentic stuff and reasoning all seems to essentially be saying that, but it too has hit diminishing returns.


Many bosses are willing to stake their subordinates’ reputations on it, too.


The problem is, I don't know you well enough for that to be worth much.

My experience has been that it's slightly improved code completion and helped with prototyping.


Ummm .... try reading maybe? simon willison dot net


I've read enough from Simon to know that he doesn't know how to build or maintain reliable real world systems.


Django?


Depends on your skills; and the more you use them, the less you learn and the more dependent you become.


> Executives like the idea of LLMs because it means they can lay you off; they're not going to care about your opinion on it one way or another.

Probably, but when the time comes for layoffs, the first to go will be those hiding under a rock, claiming there is no value to those LLMs even as they're being replaced.


I can see it going the other way; as LLMs improve, the need for prompters will decrease.

The need for real coding skills however, won't.


One's "real coding skills" don't get judged that much during a performance examination.


That time will come once enough people have LLM'ed their software to hell and back and nothing works anymore.


Just like the day the price of housing and bitcoin will crash. It’ll be here any day now I’m sure!


Guaranteed!


In my mind, this dilemma is actually very simple:

First, what LLMs/GenAI do is automated code generation, plain and simple. We've had code generation for a very long time; heck, even compiling is automated generation of code.

What is new is that LLM code generation is non-deterministic, unlike traditional code generation tools; like a box of chocolates, you never know what you're going to get.

So, as long as you have tools and mechanisms to make that non-determinism irrelevant, using LLMs to write code is not a problem at all. In fact, guess what? Hand-coding is also non-deterministic, so we already have plenty of such mechanisms in place: automated tests, code reviews, etc.
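
To make the point concrete: a test doesn't care where the implementation came from. A minimal sketch (Python/pytest assumed; "slugify" and its module are made-up names standing in for whatever the LLM generated):

  # test_slugify.py - deterministic checks over a possibly LLM-generated function.
  # "myapp.text.slugify" is a hypothetical example module, not a real library.
  from myapp.text import slugify

  def test_basic():
      assert slugify("Hello, World!") == "hello-world"

  def test_collapses_whitespace():
      assert slugify("  a   b ") == "a-b"

If those pass (and a human has reviewed the diff), it matters very little whether the lines were typed by hand or generated.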


I think I’m having the same experience as you. I’ve heard multiple times from execs in my company that “software” will have less value and that, in a few years, there won’t be as many developer jobs.

Don’t get me wrong—I’ve seen productivity gains both in LLMs explaining code/ideation and in actual implementation, and I use them regularly in my workflow now. I quite like it. But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display. They write a snake game one day using ChatGPT, and the next, they’re telling you that you might be too slow—despite a string of record-breaking quarters driven by successful product iterations.

I really don’t want to be a naysayer here, but it’s pretty demoralizing when these are the same people who decide your compensation and overall employment status.


>> But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display.

And this is the promise of AI, to eliminate jobs. If CEOs invest heavily in this, they won't back down because no one wants to be wrong.

I understand some people try to claim AI might make net more jobs (someday), but I just don't think that is what CEOs are going for.


> If CEOs invest heavily in this, they won't back down because no one wants to be wrong.

They might not have to. If the results are bad enough then their companies might straight-up fail. I'd be willing to bet that at least one company has already failed due to betting too heavily on LLMs.

That isn't to say that LLMs have no uses. But just that CEOs willing something to work isn't sufficient to make it work.


Yes, but don't forget that higher-ups also control, to a large extent, the narrative. Whether laying off developers to replace them with LLMs was good for the company is largely uncorrelated to whether the person in charge of the operation will get promoted for having successfully saved all this money for the company.

Pre-LLM, that's how Boeing destroyed itself. By creating value for the shareholders.


It makes sense—we all know how capitalism works. But the thing is, how can you not apply the law of diminishing returns here? The models are getting marginally better with a significant increase in investment, except for DeepSeek’s latest developments, which are impressive but mostly for cost reasons—not because we’ve achieved anything remotely close to AGI.

If your experienced employees, who are giving an honest try to all these tools, are telling you it’s not a silver bullet, maybe you should hold your horses a little and try to take advantage of reality—which is actually better—rather than forcing some pipe dream down your bottom line’s throat while negating any productivity gains by demotivating them with your bullshit or misdirecting their efforts into finding a problem for a given solution.


I think with LLMs we will actually see the demand for software developers who can understand code and know how to use the tools skyrocket. There will ultimately be way more money in total going towards software developers, but average pay will be well above the median pay.


> I’ve heard multiple times from execs in my company that “software” will have less value and that, in a few years, there won’t be as many developer jobs.

If LLMs make average devs 10x more productive, Jevons paradox[1] suggests we'll just make 10x more software rather than have 10x fewer devs. You can now implement that feature only one customer cares about, or test 10x more prototypes before building your product. And if you instead decide to decimate your engineering team, watch out, because your competitors might not.

https://en.wikipedia.org/wiki/Jevons_paradox


Just another way for the people on top to siphon money from everyone else. No individual contributor is going to be rewarded for any productivity increase beyond what is absolutely required to get them to fulfill the company’s goals, and the goalposts will be moving so fast that keeping up will be a full-time job.

As we see from the current job market, the supply of problems the commercial software market needs more coders to solve maybe isn’t quite as bountiful as we thought, and maybe we won’t need to perpetually ramp up the number of developers humanity has… maybe we even have too many already? If a company’s top developer can do the work of the 10 developers below them, their boss is going to happily fire the extra developers, not dream up all the incredible other things they could use those developers for.

A lot of developers assume that the one uber-productive developer left standing will be more valuable to the company than before, but now that developer is competing with 10 people who also know the code base and are willing to work for a lot cheaper. We get paid based on what the market will bear, not the amount of value we deliver, so that newfound profit goes to the top, and the rest goes toward reducing the price of the product to stay competitive with every other company doing the exact same thing.

Maybe I’m being overly cynical, but assuming this isn’t a race to the bottom and people will get rich being super productive ai-enhanced code monsters, to me, looks like a conceited white collar version of the hustle porn guys that think if they simultaneously work the right combo of gig apps at the right time of day in the right spots then they can work their way up to being wealthy entrepreneurs. Good luck.


Writing code isn’t the bottleneck though. What LLMs do is knock out the floor: you used to need a pretty significant baseline knowledge of a programming language to do anything, and now you don’t, because you can just spray and pray prompts with no programming knowledge. This actually works to a point, since most business code is repetitive CRUD. The problem comes from the implicit expectations that the higher-level system run with a certain uptime and level of quality, and conform to any number of common-sense assumptions that no one but a good programmer was thinking about, until someone uses the system and asks “why does it do this completely wrong thing”. There are huge classes of these problems that an LLM will not be capable of resolving, and if you’ve been blasting ahead with mountains of LLM slop code, even the best human programmers might not be able to save you. In other words, I think LLMs make it easy to paint yourself into a corner if you gut the technical core of your team.


But there’s no hard ceiling above the people on the bottom. It’s not a stratification — it’s a spectrum. The lower-end developers replaced easily by LLMs aren’t going to just give up and become task rabbits: they’re going to update their skills, trying to qualify for the (probably temporarily) less vulnerable jobs above them. They might never be good enough to solve the really hard problems, but they’ll put pressure on those just above them… which will echo up the entire industry. When everyone — regardless of the applicability of LLMs to their workflow — is suddenly facing competition from the developers just below them because of this upward pressure, the market gets a whole lot shittier. Damn near everybody I’ve spoken to thinks they’re the special one that surely can’t be heavily affected by LLMs because their job is uniquely difficult/quality-focused/etc. Even for the smallish percentage of people for whom that’s true, the value of their skill set as a whole is still going to take a huge hit.

What seems far more likely to me is that computer scientists will be doing math research and wrangling LLMs, a vanishingly small number of dedicated software engineers will work on most practical textual coding tasks with engineering methodologies, and low- or no-code tooling with the aid of LLMs will get good enough to make custom software something made mostly by less-technical people with domain knowledge, like spreadsheet scripting.

A lot of people in the LLM booster crowd think LLMs will replace specialists with generalists. I think that’s utterly ridiculous. LLMs easily have the shallow/broad knowledge generalists require, but struggle with the accuracy and trustworthiness for specialized work. They are much more likely to replace the generalists currently supporting people with domain-specific expertise too deep to trust to LLMs. The problem here is that most developers aren’t really specialists. They work across the spectrum of disciplines and domains but know how to use a very complex toolkit. The more accessible those tools are to other people, the more the skill dissolves into the expected professional skill set.


Yeah it seems pretty obvious where this is all going and yet a sizable proportion of the programming population cheers on every recent advancement that makes their skills more and more of a commodity.


"Yes, of course, I'm using AI at every single opportunity where I think it'll improve my output"

<<never uses AI>>


This simply doesn't work much of the time as an excuse - virtually all corporate AI tool subscriptions provide per-user stats on how much each staff member is using the AI assist. This shouldn't come as a surprise - software tool purveyors need to demonstrate ROI to their customers' management teams, and as always that shows up in reporting tools.

I've already seen several rounds of Slack messages: "why aren't you using <insert LLM coding assistant name>?" off the back of this reporting.

These assistants essentially spy on you working in many cases, if the subscription is coming from your employer and is not a personal account. For one service, I was able to see full logging of all the chats every employee ever had.


The very second someone starts monitoring me like that I'm out. Let them write their own software.


It's not necessarily just monitoring though. I actively ask that question when I see certain keys not being used, to inquire about their relevance. Basically taking feedback from some engineers and generalizing it. Obviously in my case we're doing it in good faith, assuming people will try to get their work done with whatever tools we give them access to. I see Anthropic keys get heavily used in the eng department, but I constantly get requests for OpenAI keys for Zapier connects etc. from business people.


This has been true for every heavily marketed development aid (beneficial or not) for as long as the industry has existed. Managing the politics and the expectations of non-technical management is part of career development.


Yeah, I totally agree, and you're 100% right. But the amount of integrations I've personally done and have instructed my team to do implies this one will be around for a while. At some point spending too much time on code that could be easily generated will be a negative point on your performance.

I've heard exactly the same stories from my friends in larger tech companies as well. At every all-hands there's a push for more AI integration, getting staff to use AI tools, etc., with the big expectation that development will get faster.


> At some point spending too much time on code that could be easily generated will be a negative point on your performance.

If we take the premise at face value, then this is a time management question, and that’s a part of pretty much every performance evaluation everywhere. You’re not rewarded for writing some throwaway internal tooling that’s needed ASAP in assembly or with a handcrafted native UI, even if it’s strictly better once done. Instead you bash it out in a day’s worth of Electron shitfuckery and keep the wheels moving, even if it makes you sick.

Hyperbole aside, hopefully the point is clear: better is a business decision as much as a technical one, and if an LLM can (one day) do the 80% of the Pareto distribution, then you’d better be working on the other 20% when management come knocking. If I run a cafe, I need my baristas making coffee when the orders are stacking up, not polishing the machine.

Caveats for critical code, maintenance, technical debt, etc. of course. Good engineers know when to push back, but also, crucially, when it doesn’t serve a purpose to do so.


I don't think AI is an exception. In organizations where there were top-down mandates for Agile, or OOP, or Test-Driven Development, or you-name-it, those who didn't take up the mandate with zeal were likely to find themselves out of favor.


It's not necessarily top down. I genuinely don't know a single person in my organization who doesn't use LLMs one way or another. Obviously with different degrees of applications, but literally everyone does. And we haven't had a real "EVERYONE MUST USE AI!", just people suggesting and asking for specific model usages, access to apps like Cursor and so on.

(I know because I'm in charge of maintaining all processes around LLM keys, their usage, Cursor stuff, etc.)


> Being very anti-LLM code instead of trying to understand how it can improve the speed might be detrimental for your career.

> I'm in charge of maintaining all processes around LLM keys

Does management look to you for insight on which staffers are appropriately committed to leveraging AI?


No, right now the only thing higher-ups ask from me is general usage percentages for the different types of models/software (Anthropic/OpenAI/Cursor, etc.), so we can reassess subscriptions to cut costs wherever needed. But to be fair, they have access to the same dashboards as I do, so if they want to, they can look for it.


> executives, directors and managerial staff think the opposite

The entire reason they hire us is to let them know if what they think makes sense. No one is ideologically opposed to AI generated code. It comes with lots of negatives and caveats that make relying on it costly in ways we can easily show to any executives, directors, etc. who care about the technical feasibility of their feelings.


> No one is ideologically opposed to AI generated code

Unfortunately, that hasn't been my experience. But I agree with your comment generally.


As a former "staff engineer", these executives can go and have their careers and leave those of us who want code we can understand and reason about, and who want to focus on quality software, well alone.


When IntelliJ was young, the autocomplete and automated refactoring were massive game changers. It felt like the dawn of a new age. But then, release after release, no new refactorings materialized. I don't know if they hit the Pareto limit or the people responsible moved on to new projects.

I think that's the sort of spot where better tools might be appropriate. I know what I want to do, but it's a mess to do it. I suspect that will be better at facilitating growth instead of stunting it.


Hmm… I wonder if there will be a category of LLM-assisted refactoring tools that combine mechanistic transformations with the more flexible capabilities of generative AI. E.g.: update the English text in comments automatically to reflect code structure changes.


Little tools that know how to pluralize nouns or convert adjectives to verbs (a function takes data and arranges it into a response that the adjective applies to) would help a lot with rename refactors.


What refactoring do you want IntelliJ to do that it cannot?


I've seen the exact opposite. Management at my company has been trying to shove AI into everything. They even said that this year we would be dropping all vendors that didn't have some form of AI in their workflow.


I just don't fully understand this position at this level. Personally, I know exactly what the next 5 lines need to be, and whether I write them or autocomplete or some AI writes them doesn't matter. I'll only accept what I had in mind, exactly. And with Copilot, for boilerplate and relatively trivial tasks, that happens pretty often. I feel I'm just saving time / old-age joint pain.


If the next 5 lines of code are so predictable, do they really need to be written down?

If you're truly saving time by having an LLM write boiler plate code, is there maybe an opportunity to abstract things away so that higher-level concepts, or more expressive code could be used instead?


Sure, but abstractions have a cost.

5 lines of code written with just the core language and standard library are often much easier to read and digest than a new abstraction or call to some library.

And it’s just an unfortunate fact of life that many of the common programming languages are not terribly ergonomic; it’s not uncommon for even basic operations to require a few lines of boilerplate. That isn’t always bad as languages are balancing many different goals (expressiveness, performance, simplicity and so on).


I have lately been writing a decent amount of Svelte. Svelte and frontend in general is relatively new to me, but since I’ve been programming for a while now I can usually articulate what I want to do in English. LLMs are totally a game changer for me in this scenario - they basically take me from someone who has to look everything up all the time to someone who only does so a couple times a day.

In a way LLMs are ushering in a kind of boilerplate renaissance IMO. When you can have an LLM refactor a massive amount of boilerplate in one fell swoop it starts to not matter much if you repeat yourself - actually, really logically dense code would probably be harder for LLMs to understand and modify (not dissimilar from us…) so it’s even more of a liability now than in the past. I would almost always rather have simple, easy-to-understand code than something elegant and compact and “expressive” - and our tools increasingly favor this too.

Also I really don’t give a shit about how to best center a div nor do I want to memorize a million different markup tags and their 25 years of baggage. I don’t find that kind of knowledge gratifying because it’s more trivia than anything insightful. I’m glad that with LLMs I can minimize the time I spend thinking about those things.


Some languages don't give that opportunity. E.g. the "if err != nil" blocks in Go are effectively required and obvious, but are mandated by the language.

Other things are complicated to abstract for the boilerplate they avoid. The kind of thing that avoids 100 lines of code but causes errors that take 20 minutes to understand because of heavy use of reflection/inferred types in generics/etc. The older I get, the more I think "clever" reflection is more of a sin than boring boilerplate.


What's your stack? I have the complete opposite experience. LLMs are amazing at writing idiomatic code, less so at dealing with esoteric use cases.

And very often, if the LLM produces a poopoo, asking it to fix it again works just well enough.


> asking it to fix it again works just well enough.

I've yet to encounter any LLM from chatGPT to cursor, that doesn't choke and start to repeat itself and say it changed code when it didn't, or get stuck changing something back and forth repeatedly inside of 10-20 minutes. Like just a handful of exchanges and it's worthless. Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?


One of the most important skills to develop when using LLMs is learning how to manage your context. If an LLM starts misbehaving or making repeated mistakes, start a fresh conversation and paste in just the working pieces that are needed to continue.

I estimate a sizable portion of my successful LLM coding sessions included at least a few resets of this nature.


> using LLMs is learning how to manage your context.

This is the most important thing in my opinion. This is why I switched to showing tokens in my chat app.

https://beta.gitsense.com/?chat=b8c4b221-55e5-4ed6-860e-12f0...

I treat tokens like the tachometer for a car's engine. The higher you go, the more gas you will consume, and the greater the chance you will blow up your engine. Different LLMs will have different redlines and the more tokens you have, the more costly every conversation will become and the greater the chance it will just start spitting gibberish.

So far, my redline for all models is 25,000 tokens, but I really do not want to go above 20,000. If I hit 16,000 tokens, I will start to think about summarizing the conversation and starting a new one based on the summary.

The initial token count is also important in my opinion. If you are trying to solve a complex problem that is not well known by the LLM and if you are only starting with 1000 or less tokens, you will almost certainly not get a good answer. I personally think 7,000 to 16,000 is the sweet spot. For most problems, I won't have the LLM generate any code until I reach about 7,000 since it means it has enough files in context to properly take a shot at producing code.
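
In code form, the check is trivial. A rough sketch of that tachometer (tiktoken's tokenizer is only an approximation for non-OpenAI models, and the thresholds are just the ones I mentioned above):

  # Rough token "tachometer": count conversation tokens and warn near the redline.
  import tiktoken

  REDLINE = 25_000       # hard stop across the models I use
  SUMMARIZE_AT = 16_000  # start thinking about summarizing here

  enc = tiktoken.get_encoding("cl100k_base")  # approximation for non-OpenAI models

  def conversation_tokens(messages):
      # messages: list of {"role": ..., "content": ...} dicts
      return sum(len(enc.encode(m["content"])) for m in messages)

  def redline_status(messages):
      n = conversation_tokens(messages)
      if n >= REDLINE:
          return "stop: summarize and start a new conversation"
      if n >= SUMMARIZE_AT:
          return "warning: consider summarizing soon"
      return "ok"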


I'm doing ok using the latest Gemini which is (apparently) ok with 1 million tokens.


All that fiddling and copy-pasting takes me longer than just writing the code most of the time.


And for any project that's been around long enough, you find yourself mostly copy-pasting or searching for the one line you have to edit.


Exactly, while not learning anything along the way.


Only if you assume one is blindly copy/pasting without reading anything, or is already a domain expert. Otherwise you’ve absolutely got the ability to learn from the process, but it’s an active process you’ve got to engage with. Hell, ask questions along the way that interest you, as you would any other teacher. Just verify the important bits of course.


No, learning means failing, scratching your head, banging your head against the wall.

Learning takes time.


I’d agree that’s one definition of learning, but there exist entire subsets of learning that don’t require you to be stuck on a problem. You can pick up simple, related concepts without first needing to struggle with them. Incrementally building on those moments is as true a form of learning as any other, I’d argue. I’d go as far as saying you can also have the moments you’re describing while using an LLM, again with intentionality, not passively.


Hm, I use LLMs almost daily, and I've never had it say it changed code and not do it. If anything, they will sometimes try to "improve" parts of the code I didn't ask them to modify. Most times I don't mind, and if I do, it's usually a quick edit to say "leave that bit alone" and resubmit.

> Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?

I work on one small problem at a time, only following up if I need an update or change on the same block of code (or something very relevant). Most conversations are fewer than five prompt/response pairs, usually one to three. If the LLM gets something wrong, I edit my prompt to explain what I want better, or to tell it not to take a specific approach, rather than correcting it in a reply. It gets a little messy otherwise, and the AI starts to trip up on its own past mistakes.

If I move on to a different (sub)task, I start a new conversation. I have a brief overview of my project in the README or some other file and include that in the prompt for more context, along with a tree view of the repository and the file I want edited.

I am not a software engineer and I often need things explained, which I tell the LLM in a custom system prompt. I also include a few additional instructions that suit my workflow, like asking it to tell me if it needs another file or documentation, if it doesn't know something, etc.
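
Roughly, the prompt ends up shaped like this (a minimal sketch; the file names and the tree helper are placeholders for whatever the project actually uses):

  # Sketch of the prompt layout: project overview + repo tree + target file + task.
  import pathlib

  def repo_tree(root=".", max_depth=2):
      # Cheap stand-in for `tree`: list paths a couple of levels deep, skipping .git.
      root = pathlib.Path(root)
      lines = []
      for p in sorted(root.rglob("*")):
          rel = p.relative_to(root).parts
          if len(rel) <= max_depth and ".git" not in rel:
              lines.append("  " * (len(rel) - 1) + p.name)
      return "\n".join(lines)

  def build_prompt(target_file, task):
      overview = pathlib.Path("README.md").read_text()  # placeholder overview file
      code = pathlib.Path(target_file).read_text()
      return (
          f"Project overview:\n{overview}\n\n"
          f"Repository layout:\n{repo_tree()}\n\n"
          f"File to edit ({target_file}):\n{code}\n\n"
          f"Task: {task}\n"
          "If you need another file or documentation, or you don't know something, "
          "say so instead of guessing."
      )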


Creating a new prompt. Sometimes it can go for a while without, but the first response (with crafted context) is generally the best. Having context from the earlier conversation has its uses though.


The LLM you choose to work with in Cursor makes a big difference, too. I'm a fan of Claude 3.5 Sonnet.


In my experience you have to tell it what to fix. Sometimes how as well.


Simply put, it made my last job so nightmarish that for the first time in this career I absolutely dreaded even thinking about the codebase or having to work the next day. We can argue about the principle of it all day, or you can say things like "you are just doing it wrong," but ultimately there is just the boots-on-the-ground experience of it that is going to leave the biggest impression, at least on me. It's just so bad to have to work alongside either the model itself or your coworker with the best of intentions but no domain knowledge.

It's like having to forever be the most miserable detective in the world; no mystery, only clues. A method that never existed, three different types that express the same thing, the cheeky smile of your coworker who says he can convert the whole backend to an ORM in a day because he has Cursor, the manager who signs off on this, the deranged PR the next day. This continual sense that fewer and fewer people even know what's going on anymore...

"Can you make sure we support both Mongo and postgres?"

"Can you put this React component inside this Angular app?"

"Can you setup the kubernetes with docker compose?"


Hiring standards are important, as are managers who get it. Your organization seems to be lacking in both.


> but using code generated from an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch, as is relying on it as a linting, debugging, or source of truth tool.

That does not match my experience at all. You obviously have to use your brain to review it, but for a lot of problems LLMs produce close to perfect code in record time. It depends a lot on your prompting skills though.


Perhaps I suck at prompting but what I've noticed is that if an LLM has hallucinated something or learned a fake fact, it will use that fact no matter what you say to try to steer it away. The only way to get out of the loop is to know the answer yourself but in that case you wouldn't need an LLM.


I’ve found a good way to get unstuck here is to use another model, either of comparable or superior quality, or interestingly sometimes even a weaker version of the same product (e.g. Claude Haiku vs. Sonnet*). My mental model here is similar to pair programming, or simply bringing in a colleague when you’re stuck.

*I don’t know to what extent it’s worthwhile discussing whether you could call these the same model vs. entirely different, for any two products in the same family. Outside of simply quantising the same model and nothing else. Maybe you could include distillations of a base model too?


The idea of using a smaller version of the same (or a similar) model as a check is interesting. Overfitting is super basic, and tends to be less prominent in systems with fewer parameters. When this works, you may be finding examples of this exact phenomenon.


> The idea of using a smaller version of the same (or a similar) model as a check is interesting.

I built my chat app around this idea and to save money. When it comes to coding, I feel Sonnet 3.5 is still the best but I don't start with it. I tend to use cheaper models in the beginning since it usually takes a few iterations to get to a certain point and I don't want to waste tokens in the process. When I've reached a certain state or if it is clear that the LLM is not helping, I will bring in Sonnet to review things.

Here is an example of how the conversation between models will work.

https://beta.gitsense.com/?chat=bbd69cb2-ffc9-41a3-9bdb-095c...

The reason this works for my application is that I have a system prompt which includes the following lines:

# Critical Context Information

Your name is {{gs-chat-llm-model}} and the current date and time is {{gs-chat-datetime}}.

When I make an API call, I will replace the template strings with the model and date. I also made sure to include instructions in the first user message to let the model know it needs to sign off on each message. So with the system prompt and message signature, you can say "what do you think of <LLM's> response".
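
In code, the substitution is just something like this (a simplified sketch; the real app does more around the actual API call):

  # Fill in the template strings right before each API call.
  from datetime import datetime, timezone

  SYSTEM_PROMPT = (
      "# Critical Context Information\n\n"
      "Your name is {{gs-chat-llm-model}} and the current date and time is "
      "{{gs-chat-datetime}}."
  )

  def render_system_prompt(model_name):
      now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
      return (SYSTEM_PROMPT
              .replace("{{gs-chat-llm-model}}", model_name)
              .replace("{{gs-chat-datetime}}", now))

  # e.g. render_system_prompt("claude-3-5-sonnet"); combined with the signature
  # instruction in the first user message, "what do you think of <LLM>'s response"
  # then resolves unambiguously.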


I would say prompting skills are relative to coding skills; and the more you rely on them, the less you learn.


That is not my experience. I wrote recently [1] about how I use it and it’s more like an intern, pair programmer or rubber duck. None of which make you worse.

[1]: https://lucumr.pocoo.org/2025/1/30/how-i-ai/


> it’s more like an intern, pair programmer or rubber duck. None of which make you worse.

Are you sure? I've definitely had cases where an inexperienced pair programmer made my code worse.


That’s a different question. But you don’t learn less.


Of course you do, that's why school isn't just the teacher giving you the answers, you have to work for it.


It's helpful to view working solutions and quality code as separate things to the LLM.

* If you ask it to solve a problem and nothing more, chances are the code isn't the best as it will default to the most common solutions in the training data.

* If you ask it to refactor some code idiomatically, it will apply most common idiomatic concepts found in the training data.

* If you ask it to do both at the same time you're more likely to get higher quality but incorrect code.

It's better to get a working solution first, then ask it to improve that solution, rinse/repeat in smallish chunks of 50-100 loc at a time. This is kinda why reasoning models are of some benefit, as they allow a certain amount of reflection to tie together disparate portions of the training data into more cohesive, higher quality responses.


It isn't like you can't write tests or reason about the code, iterate on it manually, just because it is generated. You can also give examples of idioms or patterns you would like to follow. It isn't perfect, and I agree that writing code is the best way to build a mental model, but writing code doesn't guarantee intuition either. I have written spaghetti that I could not hope to explain many times, especially when exploring or working in a domain that I am unfamiliar with.


I described how I liked doing ping-pong pairing TDD with Cursor elsewhere. One of the benefits of that approach is that I write at least half the implementation and tests and review every single line. That means that there is always code that follows the patterns I want and it's right there for the LLM to see and base its work on.

Edit: fix typo in last sentence


i love when the llm can be its work of


Ugh, sorry for the typo. That was supposed to be "can base its work on"


I've had exactly the opposite experience with generating idiomatic code. I find that the models have a lot of information on the standard idioms of a particular language. If I'm having to write in a language I'm new in, I find it very useful to have the LLM do an idiomatic rewrite. I learn a lot and it helps me to get up to speed more quickly.


I wonder if there is a big disconnect partially due to the fact that people are talking about different models. The top tier coding models (sonnet, o1, deepseek) are all pretty good, but it requires paid subscriptions to make use of them or 400GB of local memory to run deepseek.

All the other distilled models and qwen coder and similar are a large step below the above models in terms of most benchmarks. If someone is running a small 20GB model locally, they will not have the same experience as those who run the top of the line models.


The top of the line models are really cheap though. Getting an anthropic key and $5 of credit costs you exactly that, and gives you hundreds of prompts.


LLMs can work if you program above the code.

You still need to state your assertions with precision and keep a model of the code in your head.

It's possible to be precise at a higher level of abstraction as long as your prompts are consistent with a coherent model of the code.


> It's possible to be precise at a higher level of abstraction as long as your prompts are consistent with a coherent model of the code.

This is a fantastic quote and I will use it. I describe the future of coding as natural language coding (or maybe syntax-agnostic coding). This does not mean that the LLM is a magic machine that understands all my business logic. It means what you've described: I can describe my function flow in abstracted English rather than requiring adherence to a syntax.


i’ve had some luck with asking conceptual questions about how something works if i am using library X with protocol Y. i usually get an answer that is either actually useful or at least gets me on the right path of what the answer should be. for code though, it will tell me to use non existent apis from that library to implement things


The counterargument that I hear is that since writing code is now so easy and cheap, there is no need to write pretty code that generalizes well. Just have the llm write a crappy version and the necessary tests, and once your requirements change you just toss everything and start fresh.


Code written from an LLM is really really good if done right, i.e. reviewing every line of code as it comes out and prompt guiding it in the right direction.

If you're getting junior devs just pooping out code and sending it to review, that's really bad and should be a PIP-able offense in my opinion.


I iterated on 1k lines of React slop in 4h the other day: changed table components twice, handled errors, loading widgets, modals, you name it. It’d take me a couple of days easily to get maybe 80% of that done.

The result works ok, nobody cares if the code is good or bad. If it’s bad and there are bugs, doesn’t matter, no humans will look at it anymore - Claude will remix the slop until it works or a new model will rewrite the whole thing from scratch.

Realized during writing this that I should’ve added the extract of requirements in the comment of the index.ts of the package, or maybe a README.CURSOR.md.


My experience having Claude 3.5 Sonnet or Google Gemini 2.0 Exp-12-06 rewrite a complex function is that it slowly introduces slippage of the original intention behind the code, and the more rewrites or refactoring, the more likely it is to do something other than what was originally intended.

At the absolute minimum this should require including a highly detailed function specification in the prompt context and sending the output to a full unit test suite.


> Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away.

Lordy. Is this where software development is going over the next few years?


In that case we can look forward to literally nothing of any complexity or reliability being produced.


It's actually where we have been the whole time.


I'd pay to review one of your PRs. Maybe a consistent one with ai usage proof.


Would be great comedic relief for sure since I’m mostly working in the backend mines, where the LLM-friendly boilerplate is harder to come by admittedly.

My defense is that Karpathy does the same thing, admitted himself in a tweet https://x.com/karpathy/status/1886192184808149383 - I know exactly what he means by this.



