
In my experience LLMs will help you with things that have been solved thousands of times before and are just a matter of finding some easily researched solution.

The very moment you try to go off the beaten path and do something unconventional, or something most people won't have written a lot about, it gets more tricky. Just consider how many people will know how to configure some middleware in a Node.js project... vs most things related to hardware or low level work. Or even working with complex legacy codebases that have bits of code with obscure ways of interacting and more levels of abstraction than can reasonably be put in context.

Then again, if an LLM gets confused, then a person might as well. So, personally I try to write code that'd be understandable by juniors and LLMs alike.



In my experience, an LLM decided not to know the type alignment rules in C and confidently trotted out the wrong answer. It left a horrible taste in my mouth for the one time I decided to try using an LLM for anything, and it keeps leaving me wondering whether I'd end up spending more time bashing the LLM into working than just working out the answer myself and learning the underlying reasoning.

It was so wrong that I wonder what version of the C standard it was even hallucinating.
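
For reference, the actual rules are easy to check directly with alignof. A minimal C11 sketch (the struct is hypothetical, just for illustration; the exact numbers are implementation-defined, assuming a typical 64-bit ABI here):

    #include <stdalign.h>   /* alignof (C11) */
    #include <stdio.h>

    struct example {           /* hypothetical struct, just for illustration */
        char   c;              /* 1 byte, then padding so d starts at an aligned offset */
        double d;              /* commonly 8-byte aligned on 64-bit ABIs */
    };

    int main(void) {
        /* sizeof includes whatever padding the alignment rules require */
        printf("alignof(double) = %zu, sizeof(struct example) = %zu\n",
               alignof(double), sizeof(struct example));
        /* typical 64-bit output: alignof(double) = 8, sizeof(struct example) = 16 */
        return 0;
    }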


> vs most things related to hardware or low level work.

Counterpoint:

https://github.com/ggerganov/llama.cpp/pull/11453

> This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions.

> Surprisingly, 99% of the code in this PR is written by DeepSeek-R1. The only thing I do is to develop tests and write prompts (with some trials and errors)
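
For a sense of what that involves: here's a rough sketch of a SIMD dot product in C using the WASM intrinsics from wasm_simd128.h. This is a generic f32 version for illustration only, not the quantized qX_K_q8_K kernels the PR actually optimizes:

    #include <wasm_simd128.h>   /* clang/Emscripten WASM SIMD intrinsics; build with -msimd128 */

    /* Dot product over n floats, 4 lanes at a time; assumes n is a multiple of 4. */
    float dot_f32(const float *a, const float *b, int n) {
        v128_t acc = wasm_f32x4_splat(0.0f);
        for (int i = 0; i < n; i += 4) {
            v128_t va = wasm_v128_load(a + i);
            v128_t vb = wasm_v128_load(b + i);
            acc = wasm_f32x4_add(acc, wasm_f32x4_mul(va, vb));
        }
        /* horizontal sum of the four accumulator lanes */
        return wasm_f32x4_extract_lane(acc, 0) + wasm_f32x4_extract_lane(acc, 1)
             + wasm_f32x4_extract_lane(acc, 2) + wasm_f32x4_extract_lane(acc, 3);
    }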


A single PR doesn't really "prove" anything. Optimization passes on well-tested narrowly scoped code are something that LLMs are already pretty good at.


I think DeepThink is something different, though.

It is able to figure out some things that I know do not have much training data at all.

It is looking at the manual and figuring things out. "That doesn't make sense. Wait, that can't be right. I must have the formula wrong."

I've just seen that in the chain of thought.


Nah, in my experience, if there is the slightest error in the first sentence of the chain of thought, it tends to get worse and worse. I've had prompts that would generate a reasonable response in Llama but turn out utter garbage in DeepThink.


But how is this any different from real humans? They are not always right either. Sure, humans can understand things better, but are we really going to act like LLMs can't get better in the next year? And what about the next 6 months? I bet there are unknown startups like Deepseek that can push the frontier further.


The ways in which humans err are very different. You have a sense of your own knowledge on a topic, and if you start to stray from what you know, you're aware of it. Sure, you can lie about it, but you have inherent confidence levels in what you're doing.

Sure, LLMs can improve, but they're ultimately still bound by the constraints of the kind of data they're trained on, and they don't actually build world models through a combination of high-bandwidth exploratory training (like humans) and repeated causal inference.


At a certain point, though, one wonders if you can trust people to accurately report how much is written by an LLM. (Not even implying bad faith, but if you're constantly re-reading, selecting, and re-combining snippets written by LLMs, it's not really "written" by LLMs in the same way that's implied.)


We kinda went through this with images when Photoshop and similar tools appeared. I remember a lot of people asking, in the late 90s/early 00s in particular, whether an image was "real" or not, and about the distinction between smart photography and digital composition. Nowadays we just assume everyone is using such tools as a baseline, and genuinely clever photography is now celebrated as an exception. Perhaps ditto with CGI and prop/set making in movies. Unless a director crows about how genuine the effects are, we assume CGI.


Yeah, I never know exactly what this means. The PR says it got one variant in one shot, and they said the other took re-prompting 4 to 8 more times.


> at a certain point though, one wonders if you can trust people to accurately report how much is written by an LLM.

That's an interesting thought. I think there are ways to automate this, and some IDEs / tools track this already. I've seen posts by both Google and Amz providing percentages of "accepted" completions in their codebases, and that's probably something they track across codebases automatically.

Also on topic, here's aider's "self written code" statistics: https://aider.chat/HISTORY.html

But yeah I agree that "written by" doesn't necessarily imply "autonomously", and for the moment it's likely heavily curated by a human. And that's still ok, IMO.


I use CoPilot pretty much as a smarter autocomplete that can sometimes guess what I'm planning to type next. I find it's not so good at answering prompts, but if I type:

    r = (rgba >> 24) & 0xff;
...and then pause, it's pretty good at guessing:

    g = (rgba >> 16) & 0xff;
    b = (rgba >> 8) & 0xff;
    a = rgba & 0xff;
... for the next few lines. I don't really ask it to do more heavy lifting than that sort of thing. Certainly nothing like "Write this full app for me with these requirements [...]"


LLMs are surprisingly good at Haskell (and I'm not).

I hope for a renaissance of somewhat more rigorous programming languages: you can typecheck the LLM suggestions to see if they're any good. You can also feed the type errors back to the LLM.


Brilliant.

I have fought the "lowest cognitive load" code-style fight forever at my current gig, and I just keep losing to the "watch this!" fancytowne code that mids love to merge in. They DO outnumber me, so... fair deuce I suppose.

There is value in code being readable by Juniors and LLMs -- hell, this Senior doesn't want to spend time figuring out your decorators, needless abstractions and syntax masturbation. I just want to fix a bug and get on with my day.


While I think this comment got flagged (probably for the way it was worded), you aren't wrong! A good way I've heard a similar thought expressed is that code should be not only easy to maintain, but also easy to throw away and replace, which more or less urges you towards writing the easiest thing you can get away with (given your particular performance requirements and other constraints).


I've only started using LLMs for code recently, and I already tend to mentally translate what I want to something that I imagine is 'more commonly done and well represented in the training data'.

But especially the ability to just see some of the stuff it produces, and now to see its thought process, is incredibly useful to me already. I do have autism and possibly ADD though.


That rhymes with my experience of trying to generate placeholder art using AI.

Since it's just a placeholder, I often ask for a funny twist, but the result is rarely ever anything like what I asked for.



