
"The LLMs have no ground truth" claim (around chapter 2) that's core to the "bullshit machines" argument is itself wrong. Of course LLMs have ground truth. What do the authors think here, that the text in training corpus is random?

Hint: it isn't. Real conversations are anything but random. There's a lot of information hidden in "statistical ordering of the words", because the distribution is not arbitrary.

Statistical ground truth isn't any worse than an explicitly given one. In fact, fundamentally, there only ever is statistical certainty. Realizing it is a pretty profound insight, and you'd think it should be table stakes at least in STEM, but somehow it fails to spread as much as it should.
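To make "the distribution is not arbitrary" concrete, here is a tiny, purely illustrative sketch (the toy corpus and word pairs are invented for the example): even naive next-word counts over a few real-ish sentences are heavily skewed toward the factually correct continuation.

    from collections import Counter, defaultdict

    # Toy stand-in for "real conversations" -- illustrative only.
    corpus = (
        "the normans invaded england in 1066 . "
        "the normans invaded england in 1066 , not in 1067 . "
        "newton's second law says force equals mass times acceleration ."
    ).split()

    # Conditional next-word counts: the distribution is far from uniform.
    next_word = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        next_word[prev][nxt] += 1

    print(next_word["invaded"].most_common())  # [('england', 2)]
    print(next_word["in"].most_common())       # [('1066', 2), ('1067', 1)]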



Statistical truths based on observation of reality are the basis for science. Statistical truth based on the text on the internet is the basis for something else, and I would personally not like to call whatever that is science, or any truths established this way "ground truths".


Statistical truths based on observation of reality are the basis for incremental additions to science. Statistical truths based on what you read in a textbook are what almost all science actually is, to everyone, at almost all times.

There are few things in our lives any of us actually learned first-hand, empirically. Everything else we learned the same way LLMs did - as what is expected to be the right completion to a question or inquiry.

The objective truth as we experience it consists of things that, were we to make a prediction conditioned on them, our prediction would turn out correct. That doesn't mean we actually make such predictions often, or that we ever made such predictions while learning them.


What year did the Normans invade England?

What's Newton's second law?

Who was the last czar of Russia?

How many moons does Jupiter have?

I bet you "know" a lot of those facts not because you have observed them empirically, but because you read about them in books. And in fact, almost all scientists rely on reading for nearly everything they know about science, including nearly everything they know about their own specialties. Nobody has time to derive all of their knowledge of the world from personal observation, and most people who can read have probably learned almost all "facts" they know about the world from books.


Yup, and importantly, the correct answer to those is, in anyone's personal learning experience, almost always just what the authority figures in their lives (parents, teachers, peers they respect, or textbooks themselves) would accept as the correct answer!

"Consensus reality" works so well for us most of the time, that we habitually "substitite the Moon for the finger pointing at the Moon" without realizing it.


There's obviously nothing wrong with learning by reading, but the way you tell whether what you read is true is by seeing whether or not it fits in with observation of reality. That's the reason we're no longer reading the books about phlogiston.


> the way you tell whether what you read is true is by seeing whether or not it fits in with observation of reality

The only way any of us ever gets to see "whether or not it fits in with observation of reality" is to see whether we get an A or an F on the test asking about it.

Seriously.

The "moons of Jupiter" question is the only one of the above that one gets to connect to an observation independent of humans, and if they tried, they'd get it wrong, because you can't just count all the moons of Jupiter from your backyard with a DIY telescope. We know the correct answer only because some other people both built building-sized telescopes and had a bunch of car-sized telescopes thrown at Jupiter - and unless you had a chance to operate either, then for you the "correct answer" is what you read somewhere and what you expect other people to consider correct - this is the only criterion you have available.


Independently checking the information you read in textbooks is very difficult for sure. But it's still how we decide what's true and what's not true. If tomorrow a new moon was somehow orbiting Jupiter we'd say the textbooks were wrong, we wouldn't say the moon isn't there.


What? That's (1) not true and (2) says, uh, a lot of unintentional things about the way you approach the world. I'm not sure you realize quite how it makes you look.

For one, it's not even internally consistent -- the people who built telescopes and satellites didn't "see" the moons, either. They got back a bunch of electrical signals and interpreted them to mean something. This worldview essentially boils down to the old "brain in a jar" which is fun to think about at 3am when you're 21 and stoned, but it's not otherwise useful so we discard it.

For another, "how many moons does Jupiter have" doesn't have a correct answer, because it doesn't have an answer. There is no objective definition of what a "moon" is. There's not even a precise IAU technical definition. Jupiter has rings that are constantly changing; every single particle of those could be considered a moon if you want to get pedantic enough.

I'm always a bit shocked and disappointed with people when they go "well, you learn it on a test and that's how you know" because ...no, no that's not at all how it works. The most essential part of learning is knowing how we know and knowing how certain we are in that conclusion.

"Jupiter has 95 moons" is not a useful or true fact. "Jupiter has 95 named moons and thousands of smaller objects orbiting it and the International Astronomical Union has decided it's not naming any more of them." is both useful and predictive [0] because you know there isn't going to be any more unless something really wild happens.

[0] https://science.nasa.gov/jupiter/jupiter-moons/


> I'm not sure you realize quite how it makes you look.

I probably don't.

> For one, it's not even internally consistent -- the people who built telescopes and satellites didn't "see" the moons, either. They got back a bunch of electrical signals and interpreted them to mean something.

I'm not trying to go 100% reductionist here; I thought the point was clear. I was locking onto the distinction between "learn from experience" vs. "learn from reading about it", and the corresponding distinction of "test by predicting X" vs. "test by predicting other people's reactions to statements about X" - because that's the distinction TFA assumes we're on the "left side" of, with LLMs on the "right", and I'm saying humans are actually on the same side as LLMs.

> This worldview essentially boils down to the old "brain in a jar" which is fun to think about at 3am when you're 21 and stoned, but it's not otherwise useful so we discard it.

Wait, what's wrong with this view? It wasn't exactly refuted in any way, despite proclamations by the more "embodied cognition" folks, whose beliefs are to me just a religion trying to retroactively fit itself to modern science to counter the diminishing role of the human soul at the center of it.

> I'm always a bit shocked and disappointed with people when they go "well, you learn it on a test and that's how you know" because ...no, no that's not at all how it works. The most essential part of learning is knowing how we know and knowing how certain we are in that conclusion.

My argument isn't simply "learn for the test and it's fine". I was myself the person who refused to learn "for the test" - but that doesn't change the fact that, in 99% of the cases, what I was doing was anticipating the reactions of people (imaginary or otherwise) who hold accurate beliefs about the world, because it's not like I was able to test any of it empirically myself. And no, internal belief consistency is still text land, not hard empirical evidence land.

My point is to highlight that, for most of what we today call knowledge, which isn't tied to directly experiencing the phenomenon in question, we're not learning in ways fundamentally different from what LLMs are doing. This isn't to say that LLMs are learning it well or understanding it (for whatever one means by "understanding") - just that the whole line of arguing that "LLMs only learn from statistical patterns in text, unlike us, therefore can't understand" is wrong, because 1) statistical patterns in text contain that knowledge, and 2) it's what we're learning it from as well.


> Wait, what's wrong with this view? It wasn't exactly refuted in any way, despite proclamations by the more "embodied cognition" folks, whose beliefs are to me just a religion trying to retroactively fit itself to modern science to counter the diminishing role of the human soul at the center of it.

It's unfalsifiable, that's what's wrong with it. Sure, you could be a brain in a jar experiencing a simulated world, but there's nothing useful about that worldview. If the world feels real, you might as well treat it like it is.

> My point is to highlight that, for most of what we today call knowledge, which isn't tied to directly experiencing the phenomenon in question, we're not learning in ways fundamentally different from what LLMs are doing

I get what you're trying to say -- nobody can derive everything from first principles, which is true -- but your conclusion is absolutely not true. Humans don't credulously accept what we're given in a true/false binary and spit out derived facts.

All knowledge is an approximation. There is very little absolute truth. And we're good at dealing with that.

Humans learn by building up mental models of how systems work, understanding when those models apply and when they don't, understanding how much they can trust the model and understanding how to test conclusions if they aren't sure.

LLMs can't do any of that.


> personally not like to call whatever that is science

A counterargument written pre-LLMs: https://arxiv.org/abs/1104.5466


It's true that LLMs aren't trained on strings of random words, so in a sense you are correct that they have some "ground truth." They wouldn't generate anything logical at all otherwise. Does that even need to be stated, though? You don't need AI to generate random words.

The more important point is, they aren't trained on only factual (or statistically certain) statements. That's the ground truth that's missing. It's easy to feed an LLM a bunch of text scraped from the internet. It's much harder to teach it how to separate fact from fiction. Even the best human minds that live or have ever lived have not been able to do that flawlessly. We've created machines that have a larger amount of memory than any human, much quicker recall, and the ability to converse with vast numbers of people at once, but they perform at about par with humans in discerning fact from fiction.

That's my biggest concern about creating super-powered artificial intelligence. Its super powers are only super in a couple of dimensions, and people mistake that for general intelligence. I came across someone online who really believed ChatGPT was creating a custom diet plan tailored to their specific health needs, based on a few prompts. That is scary!


> Statistical ground truth isn't any worse than an explicitly given one

There are multiple kinds of truths.

'Statistical truth' is at best 'consensus truth', and that's only when the LLM doesn't hallucinate.


That's the only one that's available, though.

When a kid at school is being taught, say, Newton's laws of motion, or what happened in 476 CE, they're not experiencing the empirical truth about either. They're only learning the consensus truth, i.e. the correct answer to give to the teacher, so they get a good grade instead of a bad one, and so their parents praise them instead of punishing them, etc.

This covers pretty much everything any human ever learns. Few are in a position to learn any particular thing experimentally. Few are in a position to verify most of what they've learned experimentally afterwards.

We live in a "consensus reality", but that works out fine, because establishing consensus is hard, and one of the strongest consensus-forcing forces that exist is "close enough to actual reality".


I've heard about at least 4 theories of truth: Correspondence, Coherence, Consensus and Pragmatic (as described, for example, here https://commoncog.com/four-theories-of-truth/).

If we look at Newtonian mechanics, then various independently verifiable experiments are examples of Correspondence truth, and the minimal mathematical framework that describes them is an example of Coherence truth.


Fine, but that's not how any of us learned it either - whether Newtonian mechanics or the "4 theories of truth".

I mean, coherence is sure an important aspect of truth, and just by paying attention to whether it all "adds up" you can easily filter out 90% of the bullshit you hear people (or LLMs for that matter) saying - but even there, I'm not a physicist, I don't do many experiments in a lab, so when I evaluate whether some information is coherent with Newton's laws of motion, I'm actually evaluating some description of a situation against a description of Newton's laws. It's all done in "consensus space" and, if an answer is expected, the answer is also a "consensus space" one.

We're all so used to evaluating inputs and outputs through the lens of "is this something I expect others to believe, and others expect me to believe" that we're almost always just mentally folding away the indirection through "consensus reality" and feel like we're just checking "is this true". It works out okay, and it can't really be any other way - but we need to remember this is what we're doing.


Ground truth is possibly used in the sense that humans’ brains tie what they create to the properties of observed reality. Whatever new information comes in is compared to, or checked by, that. Whereas LLMs will believe anything you feed them in training, no matter how unrealistic it is.

I do think that, after much training data, they do have specific beliefs that are ingrained in them, and that changing those is difficult. We’ve seen that on some political and scientific claims that must have been prominent in their pre-training data or RLHF tuning. They will argue with us over those points, like it’s a fight. Otherwise, I’ve seen continued pre-training or fine-tuning change everything up to their vocabularies.


> Ground truth is possibly used in the sense that humans’ brains tie what they create to the properties of observed reality. Whatever new information comes in is compared to, or checked by, that.

What do you compare your knowledge of history to, other than what you expect other people will say? Most knowledge we learn in life is tied only to our expectations of other people's reactions. Which works out fine, most of the time, because even with questions of scientific fact, it's enough that some people are in a position to ground information in empirical experiments, and then set everyone else's expectations based on that.


To start with, we know what a person is, rudimentary things about how they behave, our senses, how they commonly work, and can do mental comparisons (reality checks).

We know LLMs don’t start with that because we initialize them with zeroed or random weights.

Then, their training data can be far more made up, even works of fiction, whereas the reality most humans observe is almost always real. We could raise a human in VR or something where technically there would be comparisons. Most humans’ observations connect to expectations in their brain, which was designed for the reality we operate in.

Finally, the brain has multiple components that each handle different jobs. They have different architectures to do those jobs well. Sections include language, numbers, reasoning, tiers of memory, hallucination prevention, mirroring, and even meta-level stuff like reward adjustment. We don’t just speculate that they do different things: damage to those regions shuts down those abilities. Tied to the physical realm are vision, motor, and spatial areas. We can feel objects, even temperature or pressure changes. That we can do a lot of that without supervised learning shows we’re tailor-made by God for success in this world.

LLMs have one architecture that does one job, which we try to get to do other things, like reasoning or ground truth. We pretend it’s something it’s not. The multimodal LLMs are getting closer with specialized components. Even they aren’t all trained in a coherent way using real-world observations from the senses. There’s usually a gap between systems like these and what the brain does in the real world, just in how it gets reliable information about its operating environment.


> To start with, we know what a person is, rudimentary things about how they behave, our senses, how they commonly work, and can do mental comparisons (reality checks).

How much is this a matter of fidelity? LLMs started with text, now text + vision + sound; it's still not the full package relative to what humans sport, but it captures a good chunk of information.

Now, I'm not claiming equivalence in the training process here, but let's remember that we all spend the first year or two of our lives just figuring out the intuitive basics of "what a person is, rudimentary things about how they behave, our senses, how they commonly work", and from there, we spend the next couple of years learning more explicit and complex aspects of the same. We don't start with any of it hardcoded (and what little we have was bestowed on us by millennia of a much slower gradient-descent process - evolution).

> LLMs have one architecture that does one job, which we try to get to do other things, like reasoning or ground truth.

FWIW, LLMs have one architecture in a similar sense to how the brain has one architecture - brains specialize as they grow. We know that parts of a brain are happy to pick up the slack for differently specialized parts that became damaged or unavailable.

LLMs aren't uniform blobs, either. Now, their architecture is still limited - for one, unlike our brains, they don't learn on-line - they get pre-trained and then remain fixed for inference. How much will a model capable of on-line learning differ structurally from current LLMs, or even from the naive approach to bestowing learning ability on LLMs (i.e. doing a little evaluation and training after every conversation)? We don't know yet.
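For what it's worth, here is a rough sketch of what that naive per-conversation update loop could look like - every name below (generate, evaluate, finetune_step, Conversation) is a hypothetical placeholder for the idea, not a real library's API:

    from dataclasses import dataclass
    from typing import Callable, List

    # Hypothetical sketch of "a little evaluation and training after every
    # conversation". Every callable here is a placeholder, not a real API.

    @dataclass
    class Conversation:
        prompt: str
        reply: str = ""

    def naive_online_loop(
        generate: Callable[[str], str],                    # stand-in for inference
        evaluate: Callable[[str, str], float],             # stand-in for a feedback signal
        finetune_step: Callable[[str, str, float], None],  # stand-in for a tiny weight update
        conversations: List[Conversation],
    ) -> None:
        for conv in conversations:
            conv.reply = generate(conv.prompt)
            score = evaluate(conv.prompt, conv.reply)
            finetune_step(conv.prompt, conv.reply, score)  # model drifts a bit after each exchange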

I'm definitely not arguing LLMs of today are structurally or functionally equivalent to humans. But I am arguing that learning from the sum total of the Internet isn't meaningfully different from how humans learn, at least for anything that we'd consider part of living in a technological society. I.e. LLMs don't get to experience throwing rocks first-hand like we do, but neither of us gets to experience special relativity.

> Even they aren’t all trained in a coherent way using real-world observations from the senses.

Neither them nor us. I think if there's one insight people should've gotten from the past couple of years, it's that "mostly coherent" data is fine (particularly if any given subset is internally coherent, even if there's little coherence between different subsets) - both humans and LLMs can find the larger coherence if you give them enough such data.


So if I ask ChatGPT about bears, and in the middle of explaining their diet it tells me something about how much they like porridge, and in the middle of describing their habitat it tells me they live in a quaint cabin in the woods, that's ... true?

Statistically we certainly have a lot of words about three bears and their love of porridge. That doesn't mean it's true, it just means it's statistically significant. If I asked someone a scientific question about bears and they told me about Goldilocks, I'd think it was bullshit.


If that were the case, then you'd be right. However, the current crop of LLMs seems to be good at understanding the context.

A scientific data point about bears is unlikely to have Goldilocks in there (unless we're talking about the evolution of life and the Goldilocks zone). You can argue that there is meaning hidden in words that is not captured by the words themselves in a given context - psychic knowledge as opposed to reasoned-out knowledge. That is a philosophical debate.


Words don't carry meaning. Meaning exists in how words are or are not used together with other words. That is, in their.... statistical relationships to each other!
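A minimal sketch of that idea, using a made-up five-sentence corpus (purely illustrative): words that appear in similar contexts end up with similar co-occurrence vectors, and that similarity is all the "meaning" this toy model has.

    from collections import Counter, defaultdict
    import math

    # Tiny toy corpus -- purely illustrative.
    sentences = [
        "bears eat fish and berries",
        "wolves eat fish and meat",
        "bears live in forests",
        "wolves live in forests",
        "planes fly in the sky",
    ]

    # Co-occurrence counts: which words appear in a sentence with which others.
    cooc = defaultdict(Counter)
    for s in sentences:
        words = s.split()
        for w in words:
            for other in words:
                if other != w:
                    cooc[w][other] += 1

    def cosine(a, b):
        dot = sum(a[k] * b[k] for k in set(a) | set(b))
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb)

    print(cosine(cooc["bears"], cooc["wolves"]))   # high: used in similar contexts
    print(cosine(cooc["bears"], cooc["planes"]))   # low: used in different contexts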


LLMs demonstrably don't do this, nor do they say that they live in the Hundred Acre Wood and love honey. Unless you ask about a specific bear.


ChatGPT has enough dimensions in its latent space to represent and distinguish between the various meanings of porridge and is able to be informed by the Goldilocks story without switching to it mid-sentence.

It's actually a good example of what I have in mind when I say human text isn't random. The Goldilocks story may not be scientific, but it's still highly correlated with scientific truth about matters like food, bears, or the daily lives of people. Put yourself in the shoes of an alien trying to make heads or tails of that story, and you'll see just how many things in it are not arbitrary.


Having a ground truth doesn't mean it does not make huge and glaring mistakes.


> Of course LLMs have ground truth.

It is my understanding that LLMs have no such thing, as empirical truth is weighted. For example, if Newton's laws are in conflict with another fact, the LLM will defer to the fact that it finds more probable in context. It will then require human resources to undo and unfold its core error, or else you receive bewildering and untrue remarks or outputs.


> For example, if Newton's laws are in conflict with another fact, the LLM will defer to the fact that it finds more probable in context.

Which is the correct thing to do. If such a context were, for example, an explanation of an FTL drive in a science fiction story, both LLMs and humans would be correct to put Newton aside.

LLMs aren't Markov chains; they don't output naive word-frequency-based predictions. They build a high-dimensional statistical representation of the entirety of their training data, from which completions are then sampled. We already know this representation is able to identify and encode ideas as diverse as "odd/even" and "fun" and "formal language" and "golden gate bridge"-like. "Fictional context" vs. "Real-life physics" is a concept they can distinguish too, just like people can.
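As a toy illustration of what "encoding a concept" can mean - with crude bag-of-words vectors standing in for a model's hidden states, and a vocabulary and sentences invented for the example - a concept like "fictional vs. factual framing" can show up as a single direction you can project onto:

    import numpy as np

    # Crude bag-of-words "embeddings" standing in for a model's hidden states.
    VOCAB = ["once", "upon", "time", "dragon", "princess",
             "force", "mass", "acceleration", "measured", "experiment"]

    def embed(sentence: str) -> np.ndarray:
        words = sentence.lower().split()
        return np.array([words.count(w) for w in VOCAB], dtype=float)

    fictional = ["once upon a time a dragon met a princess",
                 "the princess rode the dragon once upon a time"]
    factual   = ["force equals mass times acceleration",
                 "the acceleration was measured in an experiment"]

    # "Concept direction" = difference between the class means.
    direction = (np.mean([embed(s) for s in fictional], axis=0)
                 - np.mean([embed(s) for s in factual], axis=0))

    def fictionality(sentence: str) -> float:
        return float(embed(sentence) @ direction)  # > 0 leans fictional, < 0 leans factual

    print(fictionality("a dragon and a princess"))           # positive
    print(fictionality("mass times acceleration is force"))  # negative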


Where "probable" means: occurs the most often in the training data (approx the entire Internet). So what is common online is most likely to win out, not some other notion of correctness.


Well, duh ... of course there are statistical regularities in the data, which is what LLMs learn. However, the LLM has no first-hand experience of the world described by those words, so the words (to an LLM) are not grounded.

So the LLM is doomed to only be able to predict patterns in the dataset, with no knowledge of or care as to whether what it is outputting is objectively true or not. Calling it bullshitting would be an anthropomorphism since it implies intention, but it's effectively what an LLM does - it just spits out words without any care (or knowledge) as to the truth of them.

Of course, if there is more truth than misinformation and lies in the dataset, then statistically that is more likely to be output by an LLM, but to the LLM it's all the same - just word stats, just the same as when it "hallucinates" and "makes something up", since there is always a non-zero possibility that the next word could be ... anything.
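On the "non-zero possibility" point, a small numeric sketch (the logits and candidate words are made up): a softmax over next-token scores assigns every candidate some probability, however tiny.

    import numpy as np

    # Made-up next-token logits and candidates, for illustration only.
    tokens = ["law", "principle", "porridge", "zebra"]
    logits = np.array([8.0, 5.0, 1.0, -4.0])

    # Softmax: every token ends up with a strictly positive probability.
    probs = np.exp(logits) / np.exp(logits).sum()
    for tok, p in zip(tokens, probs):
        print(f"{tok:10s} {p:.6f}")  # "zebra" comes out tiny, but never zero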



