If its mimicry of reason is indistinguishable from real reasoning, how is it not reasoning?
Ultimately, an LLM models language and the process behind its creation to some degree of accuracy or another. If that model includes a way to approximate the act of reasoning, then it is reasoning to some extent. The extent, I am happy to agree, is open for discussion, but that reasoning is taking place at all is a little harder to attack.
No, it is distinguishable from real reasoning. Real reasoning, while flawed in various ways, goes through the personal experience of the evaluator. LLMs don't have that capability at all. They're just sifting through tokens and associating statistical parameters with them, with no skin in the game, so to speak.
LLMs have personal opinions by virtue of the fact that they make statements about things they understand to the extent their training data allows. Their training data is not perfect, and in addition, through random chance the LLM will latch onto specific topics as a function of weight initialization and training data order.
This would form a filter not unlike, yet distinct from, our understanding of personal experience.
You could make the exact same argument against humans; we just learn to make sounds that elicit favourable responses. Besides, they have plenty of "skin in the game", about the same as you or I.
It seems like an arbitrary distinction. If an LLM can accomplish a task that we’d all agree requires reasoning for a human to do, we can’t call that reasoning just because the mechanics are a bit different?
Yes, because it isn't an arbitrary distinction. My good old TI-83 can do calculations that I can't even do in my head, but unlike me it isn't reasoning about them; that's actually why it's able to do them so fast, and it has some pretty big implications for what it can't do.
If you want to understand where a system's limitations are, you need to understand not just what it does but how it does it. I feel like we need to start teaching classes on Behaviorism again.
An LLM’s mechanics are algorithmically much closer to the human brain (which the LLM is modeled on) than to a TI-83, a CPU, or any other Turing machine. Which is why, like the brain, it can solve problems that no individual Turing machine can.
Are you sure you aren’t just defining reasoning as something only a human can do?
My prior is that reasoning is a conscious activity. There is a first-person perspective. LLMs are so far removed mechanically from brains that the idea they reason is not even remotely worth considering. Modeling neurons can be done with a series of pipes and flowing water, and that is not expected to give rise to consciousness either. Nor are neurons and synapses likely to be sufficient for consciousness.
You know how we insert ourselves into the process of coming up with a delicious recipe? That first-person perspective might also be necessary for reasoning. No computer knows the taste of mint; it must be given parameters about it. So if a computer comes up with a recipe with mint, we know it wasn’t via tasting anything ever.
A calculator doesn’t reason. A facsimile of something whose role in consciousness we have no idea about has the same outlook as the calculator.
You’re right that my argument depends upon there being a great physical distinction between brains and H100s or enough water flowing through troughs.
But since we knew properties of wings were major components of flight dating back beyond the myths of Pegasus or Icarus, we rightly connected the similarities in the flight case.
Yet while we have studied neurons and know the brain is a part of consciousness, we don’t know their role in consciousness the way we know the wing’s role in flight.
If you got a bunch of daisy-chained brains and they started doing what LLMs do, I’d change my tune, because the physical substrates would now be similar enough. Focusing on neurons, and their facsimilized abstractions, may be like thinking flight depends upon the local cellular structure of a wing, rather than the overall capability to generate lift, or any other false correlation.
Just because an LLM and a brain get to the same answer, doesn’t mean they got there the same way.
Because we know practically nothing about brains, comparing them to LLMs is useless, and nature is so complex that we're constantly discovering signs of hubris in human research.
See C-sections versus natural birth. Formula versus mother's milk. Etc.
I think you'd benefit from reading Helen Keller's autobiography "The World I Live In". You might reach the same conclusions I did: that perhaps consciousness is flavoured by our unique way of experiencing our world, but that this is not strictly necessary for consciousness of some kind or another to form. I believe consciousness to be a tool a sufficiently complex neural network will develop in order for it to achieve whatever objective it has been given to optimize for.
Taking a different tack from others in this thread: I don't think you can say that a TI-83 is not reasoning if it is doing calculations. Certainly it is not aware of any concept of numbers and has no meaningful sense of the operation, but those are attributes of sentience, not reasoning. The reasoning ability of a calculator is extremely limited, but what makes the capabilities that it does have non-reasoning?
What non-sentience-based property do you think something should have to be considered reasoning? Do you consider sentience and reasoning to be one and the same? If not, then you should be able to indicate what distinguishes one from the other.
I doubt anyone here is arguing that chatGPT is sentient, yet plenty accept that it can reason to some extent.
>Do you consider sentience and reasoning to be one and the same?
No, but I think they share some similarities. You can be sentient without doing any reasoning, just through experience; there are probably a lot of simple life forms in that category. Where they overlap, I think, is that they both require a degree of reflection. Reasoning, I'd say, is the capacity to distinguish between truths and falsehoods, to have mental content of the object you're reasoning about, and as a consequence to have a notion of understanding and an interior or subjective view.
The distinction I'd make is that calculation or memorization is not reasoning at all. My TI-83 or Stockfish can calculate math or chess, but they have no notion of math or chess; they're basically Chinese rooms, they just perform mechanical operations. They can appear as if they reason: even a chess engine purely looking up results in a tablebase, with very simplistic brute force, can play very strong chess, but it doesn't know anything about chess. And with the LLMs you need to be careful, because the "large" part does a lot of work. They often can sound like they reason, but when they have to explain their reasoning they'll start to make up obvious falsehoods or contradictions. A good benchmark for whether something can reason is probably whether it can... reason about its reasoning coherently.
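To make the tablebase point concrete, here is a toy sketch of what playing purely by lookup amounts to (my own illustration, not anything from a real engine; the positions and moves are invented placeholders):

    # Toy illustration of pure table lookup: no evaluation, no model of chess,
    # just a memorized answer for each key.
    TABLEBASE = {
        "KQ vs K, pattern A": "Qb7#",
        "KQ vs K, pattern B": "Kc6",
    }

    def best_move(position: str) -> str:
        # The "player" has no notion of chess; it only retrieves a stored string.
        return TABLEBASE.get(position, "resign")

    print(best_move("KQ vs K, pattern A"))  # prints: Qb7#

It can emit strong moves for anything in the table while knowing nothing about the game, which is exactly the distinction being drawn.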
I do think the very new chain-of-thought models are more of a step in that direction; the further you get away from relying on data, the more likely you're building something that reasons, but we're probably very early into systems like that.
You say they are distinguishable. How would you experimentally distinguish two systems, one of which "goes through personal experience" and therefore is doing "real reasoning", vs one which is "sifting through tokens and associating statistical parameters"? Can you define a way to discriminate between these two situations?
I am getting two contradictory but plausible-seeming replies when I ask whether a certain set stays the same when adding 1 to every value in the set, depending on how I ask the question.
What led you to believe that mathematics is a good tool for evaluating an LLM? It is a thing they currently don't do well, since it is wildly out of the domain of their training corpus, down to the very way we structure information for an LLM to ingest. If we started doing the same for humans, most humans would be in deep trouble.
Well I am studying mathematics, and I use the LLM to help me learn.
They aren't terrible, and they have all of arXiv to train on. Terence Tao is doing some cool stuff with it; the idea is to have an LLM generate Lean proofs.
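For anyone who hasn't seen Lean, here is a minimal (and deliberately trivial) Lean 4 example of the kind of machine-checkable proof an LLM would be asked to produce; the actual research targets are of course vastly harder:

    -- Two tiny Lean 4 proofs, just to show the shape of the artifact a
    -- proof checker verifies. `Nat.add_comm` is a standard library lemma.
    theorem add_comm' (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

    example : 2 + 2 = 4 := rfl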
And I can assure you when I start to talk about these topics with the average human person that doesn't know the material, they just laugh at me. Even my wife who has a PhD in physics.
Here's some cool math I learned from a regular book, not an LLM:
I don't give a rat's ass about whether or not AI reasoning is "real" or a "mimicry". I care if machines are going to displace my economic value as a human-based general intelligence.
If a synthetic "mimicry" can displace human thinking, we've got serious problems, regardless of whether or not you believe that it's "real".
Fair, but "logically consistent thoughts" are a subject of deep investigation, starting from early Euclidean geometry and running through to Gödel's modern theorems.
I.e., logically consistent thinking starts from symbolization, axioms, proof procedures, and world models. Otherwise, you end up with persuasive words.
You just ruled out 99% of humans from having reasoning capabilities.
The beautiful thing about reasoning models is that there is no need to overcomplicate it with all the things you've mentioned, you can literally read the model's reasoning and decide for yourself if it's bullshit or not.
That's sort of arrogant. Most of that 99% (if it's even that many) could learn if inspired to and provided with resources, and they do use reasoning and instinct in day-to-day life, even if it's as simple as "I'll go shopping before I take my car to the shop so I have the groceries" or "hide this money in a new place so my husband doesn't drink it away". Models will get better over time, and yes, humans only use models too.
Humans rely on cues to tell when others are fabricating or lying. Machines don't give off those cues, and they fabricate their reasoning too, so we have an especially difficult time trusting them.
>You just ruled out 99% of humans from having reasoning capabilities.
After a conversation with humans I think you'd agree 1% of them being able to reason deeply is a vast overestimation.
A good example to see how little people can reason is the following classic:
> Given the following premises derive a conclusion about your poems:
> 1) No interesting poems are unpopular among people of real taste.
> 2) No modern poetry is free from affectation.
> 3) All your poems are on the subject of soap bubbles.
> 4) No affected poetry is popular among people of taste.
> 5) Only a modern poem would be on the subject of soap bubbles.
The average person on the street won't even know where to start, the average philosophy student will fuck up the translation to first order logic, and a logic professor would need a proof assistant to get it right consistently.
Meanwhile o3-mini in 10 seconds:
We can derive a conclusion about your poems by following the logical implications of the given premises. Let’s rephrase each premise into a more formal form:
Premise 1: No interesting poems are unpopular among people of real taste.
This can be reworded as:
If a poem is interesting, then it is popular among people of real taste.
Premise 2: No modern poetry is free from affectation.
This tells us:
If a poem is modern, then it is affected (i.e., it shows affectation).
Premise 3: All your poems are on the subject of soap bubbles.
In other words:
Every one of your poems is about soap bubbles.
Premise 4: No affected poetry is popular among people of taste.
This implies:
If a poem is affected, then it is not popular among people of taste.
Premise 5: Only a modern poem would be on the subject of soap bubbles.
This means:
If a poem is about soap bubbles, then it is modern.
Now, let’s connect the dots step by step:
From Premise 3 and Premise 5:
All your poems are on the subject of soap bubbles.
Only modern poems can be about soap bubbles.
Conclusion: All your poems are modern.
From the conclusion above and Premise 2:
Since your poems are modern, and all modern poems are affected,
Conclusion: All your poems are affected.
From the conclusion above and Premise 4:
Since your poems are affected, and no affected poem is popular among people of taste,
Conclusion: Your poems are not popular among people of taste.
From Premise 1:
If a poem is interesting, it must be popular among people of taste.
Since your poems are not popular among people of taste (from step 3), it follows that:
Conclusion: Your poems cannot be interesting.
Final Conclusion:
Your poems are not interesting.
Thus, by logically combining the premises, we conclude that your poems are not interesting.
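Incidentally, the chain above is short enough to machine-check. Here is a sketch of the same syllogism formalized in Lean 4; the theorem name and predicate names are my own labels, not anything from the quoted output:

    -- The five premises as hypotheses over an abstract type of poems,
    -- and the conclusion that none of "your" poems are interesting.
    theorem poems_not_interesting {Poem : Type}
        (Interesting Modern Affected PopularWithTaste AboutBubbles Mine : Poem → Prop)
        (h1 : ∀ p, Interesting p → PopularWithTaste p)   -- premise 1
        (h2 : ∀ p, Modern p → Affected p)                -- premise 2
        (h3 : ∀ p, Mine p → AboutBubbles p)              -- premise 3
        (h4 : ∀ p, Affected p → ¬ PopularWithTaste p)    -- premise 4
        (h5 : ∀ p, AboutBubbles p → Modern p) :          -- premise 5
        ∀ p, Mine p → ¬ Interesting p := by
      intro p hMine hInteresting
      -- mine → about bubbles → modern → affected → not popular with taste,
      -- which contradicts interesting → popular with taste.
      exact h4 p (h2 p (h5 p (h3 p hMine))) (h1 p hInteresting)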
I could trace through that example quite quickly and I'm not an expert in logic, so I think you might be exaggerating some statements about difficulty here.
Except, human mimicry of "reasoning" is usually applied in service of justifying an emotional feeling, which is arguably even less reliable than the non-feeling machine.
This is the question the Greeks wrestled with over 2000 years ago. At the time there were the Sophists (modern LLM equivalents), who could speak persuasively like a politician.
Over time this question has been debated by philosophers, scientists, and anyone who wanted to have better cognition in general.
Because we know what LLMs do. We know how they produce output. They're just good enough at mimicking human text/speech that people are mystified and stupefied by them. But I disagree that "reasoning" is so poorly defined that we're unable to say an LLM doesn't do it. It doesn't need to be a perfect or complete definition. Where there is fuzziness and uncertainty is with humans. We still don't really know how the human brain works, how human consciousness and cognition work. But we can pretty confidently say that an LLM does not reason or think.
Now if it quacks like a duck in 95% of cases, who cares if it's not really a duck? But Google still claims that water isn't frozen at 32 degrees Fahrenheit, so I don't think we're there yet.
I think the third-worst part of the GenAI hype era is that every other CS grad now thinks not only that a humanities/liberal arts degree is meaningless, but also that they have enough of a handle on the human condition and neurology to make judgment calls on what's sentient. If people with those backgrounds ever attempted to broach software development topics, they'd be met with disgust by the same people.
Somehow it always seems to end up at eugenics and white supremacy for those people.
Math arose firstly as a language and formalism in which statements could be made with no room for doubt. The sciences took it further and said that not only should the statements be free of doubt, they should also be testable in the real world via well-defined actions which anyone could carry out. All of this has given us the gadgets we use today.
An LLM, meanwhile, is putting out plausible tokens consistent with its training set.