
In my experience LLMs will help you with things that have been solved thousands of times before and are just a matter of finding some easily researched solution.

The very moment you try to go off the beaten path and do something unconventional, or something most people won't have written a lot about, it gets more tricky. Just consider how many people will know how to configure some middleware in a Node.js project... vs most things related to hardware or low level work. Or even working with complex legacy codebases that have bits of code with obscure ways of interacting and more levels of abstraction than can reasonably be put in context.

Then again, if an LLM gets confused, then a person might as well. So, personally I try to write code that'd be understandable by juniors and LLMs alike.



In my experience, an LLM decided not to know the type alignment rules in C and confidently trotted out the wrong answer. It left a horrible taste in my mouth for the one time I decided to try using an LLM for anything, and it keeps leaving me wondering whether I'd end up spending more time bashing the LLM into working than just working out the answer myself and learning the underlying reasoning.

It was so wrong that I wonder what version of the C standard it was even hallucinating.
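
For reference, the actual rules are easy to check directly with alignof. A minimal C11 sketch (the struct is hypothetical, just for illustration; the exact numbers are implementation-defined, assuming a typical 64-bit ABI here):

    #include <stdalign.h>   /* alignof (C11) */
    #include <stdio.h>

    struct example {           /* hypothetical struct, just for illustration */
        char   c;              /* 1 byte, then padding so d starts at an aligned offset */
        double d;              /* commonly 8-byte aligned on 64-bit ABIs */
    };

    int main(void) {
        /* sizeof includes whatever padding the alignment rules require */
        printf("alignof(double) = %zu, sizeof(struct example) = %zu\n",
               alignof(double), sizeof(struct example));
        /* typical 64-bit output: alignof(double) = 8, sizeof(struct example) = 16 */
        return 0;
    }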


> vs most things related to hardware or low level work.

Counterpoint:

https://github.com/ggerganov/llama.cpp/pull/11453

> This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions.

> Surprisingly, 99% of the code in this PR is written by DeepSeek-R1. The only thing I do is to develop tests and write prompts (with some trials and errors)
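
For a sense of what that involves: here's a rough sketch of a SIMD dot product in C using the WASM intrinsics from wasm_simd128.h. This is a generic f32 version for illustration only, not the quantized qX_K_q8_K kernels the PR actually optimizes:

    #include <wasm_simd128.h>   /* clang/Emscripten WASM SIMD intrinsics; build with -msimd128 */

    /* Dot product over n floats, 4 lanes at a time; assumes n is a multiple of 4. */
    float dot_f32(const float *a, const float *b, int n) {
        v128_t acc = wasm_f32x4_splat(0.0f);
        for (int i = 0; i < n; i += 4) {
            v128_t va = wasm_v128_load(a + i);
            v128_t vb = wasm_v128_load(b + i);
            acc = wasm_f32x4_add(acc, wasm_f32x4_mul(va, vb));
        }
        /* horizontal sum of the four accumulator lanes */
        return wasm_f32x4_extract_lane(acc, 0) + wasm_f32x4_extract_lane(acc, 1)
             + wasm_f32x4_extract_lane(acc, 2) + wasm_f32x4_extract_lane(acc, 3);
    }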


A single PR doesn't really "prove" anything. Optimization passes on well-tested narrowly scoped code are something that LLMs are already pretty good at.


I think DeepThink is something different, though.

It is able to figure out some things that I know do not have much training data at all.

It is looking at the manual and figuring things out. "That doesn't make sense. Wait, that can't be right. I must have the formula wrong."

I've just seen that in the chain of thought.


Nah, in my experience, if there is the slightest error in the first sentence of the chain of thought, it tends to get worse and worse. I've had prompts that would generate a reasonable response in Llama but turn out utter garbage in DeepThink.


But how is this any different from real humans? They are not always right either. Sure, humans can understand things better, but are we really going to act like LLMs can't get better in the next year? And what about the next 6 months? I bet there are unknown startups like Deepseek that can push the frontier further.


The ways in which humans err are very different. You have a sense of your own knowledge on a topic, and if you start to stray from what you know, you're aware of it. Sure, you can lie about it, but you have inherent confidence levels in what you're doing.

Sure, LLMs can improve, but they're ultimately still bound by the constraints of the kind of data they're trained on, and they don't actually build world models through a combination of high-bandwidth exploratory training (like humans) and repeated causal inference.


At a certain point, though, one wonders if you can trust people to accurately report how much is written by an LLM. (Not even implying bad faith, but if you're constantly re-reading, selecting, and re-combining snippets written by LLMs, it's not really "written" by LLMs in the same way that's implied.)


We kinda went through this with images when Photoshop and similar tools appeared. I remember a lot of people asking, in the late 90s/early 00s in particular, whether an image was "real" or not, and about the distinction between smart photography and digital composition. Nowadays we just assume everyone is using such tools as a baseline, and genuinely clever photography is now celebrated as an exception. Perhaps ditto with CGI and prop/set making in movies. Unless a director crows about how genuine the effects are, we assume CGI.


Yeah, I never know exactly what this means. The PR says it got one variant in one shot, and they said the other took re-prompting 4 to 8 more times.


> at a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM.

That's an interesting thought. I think there are ways to automate this, and some IDEs / tools track this already. I've seen posts by both Google and Amz providing percentages of "accepted" completions in their codebases, and that's probably something they track across codebases automatically.

Also on topic, here's aider's "self written code" statistics: https://aider.chat/HISTORY.html

But yeah I agree that "written by" doesn't necessarily imply "autonomously", and for the moment it's likely heavily curated by a human. And that's still ok, IMO.


I use CoPilot pretty much as a smarter autocomplete that can sometimes guess what I'm planning to type next. I find it's not so good at answering prompts, but if I type:

    r = (rgba >> 24) & 0xff;
...and then pause, it's pretty good at guessing:

    g = (rgba >> 16) & 0xff;
    b = (rgba >> 8) & 0xff;
    a = rgba & 0xff;
... for the next few lines. I don't really ask it to do more heavy lifting than that sort of thing. Certainly nothing like "Write this full app for me with these requirements [...]"


LLMs are surprisingly good at Haskell (and I'm not).

I hope for a renaissance of somewhat more rigorous programming languages: you can typecheck the LLM suggestions to see if they're any good. You can also feed the type errors back to the LLM.


Brilliant.

I have fought the "lowest cognitive load" code-style fight forever at my current gig, and I just keep losing to the "watch this!" fancytowne code that mids love to merge in. They DO outnumber me, so... fair deuce I suppose.

There is value in code being readable by Juniors and LLMs -- hell, this Senior doesn't want to spend time figuring out your decorators, needless abstractions and syntax masturbation. I just want to fix a bug and get on with my day.


While I think this comment got flagged (probably for the way it was worded), you aren't wrong! A good way I've heard a similar thought expressed is that code should be not only easy to maintain, but also easy to throw away and replace, which more or less urges you towards writing the easiest thing you can get away with (given your particular performance requirements and other constraints).


I've only started using LLMs for code recently, and I already tend to mentally translate what I want to something that I imagine is 'more commonly done and well represented in the training data'.

But especially the ability to just see some of the stuff it produces, and now to see its thought process, is incredibly useful to me already. I do have autism and possibly ADD though.


That rhymes with my experience of trying to generate placeholder art using AI.

Since it's just a placeholder, I often ask for a funny twist, but the result is rarely ever anything like what I asked for.



