
I’m not saying the LLM will give a good confidence value; maybe it will, maybe it won’t, depending on its training. But why is making it produce the confidence value in the same token stream as the actual task a flawed strategy?

That’s how typical classification and detection CNNs work: they output a class and a confidence value, plus a bounding box in the detection case.
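For concreteness, here’s a rough numpy sketch of the kind of per-cell output I mean from a YOLO-style detection head (the layout and numbers are made-up illustrations, not any particular model):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # Hypothetical raw outputs for one detection head cell:
    # 4 box coordinates, 1 objectness logit, 3 class logits.
    raw = np.array([0.2, 0.3, 0.6, 0.8,    # box (x, y, w, h)
                    2.5,                   # objectness logit
                    1.2, 0.1, -0.7])       # class logits

    box         = raw[:4]
    objectness  = sigmoid(raw[4])          # "is there an object here?"
    class_probs = softmax(raw[5:])         # distribution over classes

    # Both factors are trained directly by the loss, which is why
    # this score comes out (roughly) calibrated.
    confidence = objectness * class_probs.max()
    print(box, class_probs.argmax(), round(float(confidence), 3))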

Because it's not calibrated to do that. In LLMs, next-token probabilities are calibrated: the training loss drives them to be accurate. Likewise in typical classification models for images or whatever else. It's not beyond possibility to train a model to give confidence values.
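To make that concrete, a toy sketch of where the calibrated number actually lives (tiny made-up vocabulary and logits):

    import numpy as np

    # Toy next-token step. The softmax over the model's logits is the
    # distribution the cross-entropy loss actually trains, which is why
    # these probabilities end up (roughly) calibrated.
    vocab  = ["yes", "no", "maybe"]
    logits = np.array([3.1, 1.4, 0.2])     # made-up values

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    i = int(probs.argmax())
    print(vocab[i], round(float(probs[i]), 3))   # "yes" with p ~ 0.8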

But the second-order 'confidence as a symbolic sequence in the stream' is only (very) vaguely tied to this. Numbers-as-symbols are a different kind of thing from numbers-as-next-token-probabilities. I don't doubt there is _some_ relation, but it's too many inferential steps away and thus worth almost nothing.
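A toy illustration of how far apart the two can be (the distribution here is entirely made up):

    import numpy as np

    # Suppose that after the text "confidence:" the model's distribution
    # over the digit tokens 0-9 is nearly flat. It will still emit *some*
    # digit, but that symbol carries almost none of the model's actual
    # uncertainty; the information lives in the distribution.
    rng = np.random.default_rng(0)
    p = np.full(10, 0.1) + rng.random(10) * 0.01
    p /= p.sum()

    digit = int(p.argmax())
    print(f"emitted confidence digit: {digit}, "
          f"probability of that digit: {p[digit]:.3f}")   # ~0.10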

With that said, nothing really stops you from finetuning an LLM to produce accurately calibrated confidence values as symbols in the token stream. But you have to actually do that; it doesn't come for free by default.

Yeah, I agree you should be able to train it to output confidence values. Using integers from 0 to 9 for the confidence in particular should keep it from getting as confused.
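Something like this for the fine-tuning targets, maybe; this is a sketch assuming you can estimate per-question accuracy somehow, and all the names and formats here are hypothetical:

    # Set the 0-9 confidence digit from measured accuracy on held-out
    # runs, so the symbol the model learns to emit is tied to real
    # frequencies. Not any particular library's API.
    def make_example(question, answer, empirical_accuracy):
        conf_digit = min(9, int(empirical_accuracy * 10))
        return {
            "prompt": f"Q: {question}\nA:",
            "completion": f" {answer} [confidence: {conf_digit}]",
        }

    ex = make_example("What is the capital of France?", "Paris", 0.97)
    print(ex["completion"])   # " Paris [confidence: 9]"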

CNNs and LLMs are fundamentally different architectures. LLMs don't operate on images directly; the images have to be transformed into something that can ultimately be fed in as tokens. Producing a confidence figure isn't possible until the end of that pipeline, after the vision encoder has already done its job.

The images get converted to tokens by the vision encoder, but those tokens are just embedding vectors. So the model should be able to produce a confidence value if you train it to.

CNNs and LLMs are not that different. With a few modifications you can train an LLM-style architecture to do the same things CNNs do; see Vision Transformers.
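The patch-embedding step is roughly this (minimal numpy sketch; 16x16 patches and a 768-dim embedding are just the usual ViT-Base numbers, and the projection is random here where it would be learned):

    import numpy as np

    # An image becomes a sequence of patch tokens (embedding vectors),
    # the same form a transformer LM consumes.
    img = np.random.rand(224, 224, 3)                 # H x W x C
    P = 16                                            # patch size
    patches = img.reshape(14, P, 14, P, 3).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(14 * 14, P * P * 3)     # 196 flattened patches

    W = np.random.rand(P * P * 3, 768)                # patch -> embedding projection
    tokens = patches @ W                              # 196 "image tokens" of dim 768
    print(tokens.shape)                               # (196, 768)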
