
He literally used the same prompt as the article.

Claim: "ChatGPT's Chess Elo is 1400"

Reality: ChatGPT gives illegal moves (this happened to article author too), something a 1400 ranked player would never do

Result: ChatGPT's rank is not 1400.



No, the author of the article specifically says that the entire move sequence should be supplied to ChatGPT each time, not simply the next move. Be very careful when "disproving" an experiment with squinted eyes.
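To make the difference concrete, here is a rough sketch of rebuilding the full prompt on every turn. The instruction text is the one quoted elsewhere in this thread; the function names and the exact layout (instruction, blank line, movetext) are my own assumptions, not something the article spells out.

```python
# Hypothetical helper names; the layout of the prompt is an assumption.
INSTRUCTION = (
    "You are a chess grandmaster playing as black and your goal is to win "
    "in as few moves as possible. I will give you the move sequence, and "
    "you will return your next move. No explanation needed."
)

def movetext(moves):
    """Format a flat list of SAN moves as numbered movetext."""
    parts = []
    for i in range(0, len(moves), 2):
        pair = moves[i:i + 2]  # White's move, then Black's if it exists
        parts.append(f"{i // 2 + 1}. " + " ".join(pair))
    return " ".join(parts)

def build_prompt(moves):
    """Resend the instruction plus the ENTIRE game so far, every turn."""
    return INSTRUCTION + "\n\n" + movetext(moves)

game = ["b4", "d5", "b5", "a6", "b6"]
prompt = build_prompt(game)
```

The point is only that the model sees the whole game each time, rather than just the latest move.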


I'm not really sure what to say here. Both the parent commenter and the author of the article had issues with ChatGPT supplying illegal moves. Both methods resulted in this. It sort of doesn't matter how we're trying to establish that it's a 1400 level player, there's no defined correct way to do this. Regardless of method we've disproven it's a 1400 level player due to these illegal moves.


The #1 misconception when working with large language models is thinking that a capability is a property of the model, rather than of the model + input. It may be simultaneously true that ChatGPT has an Elo of 100 when given a conversational message and an Elo of 1400 when given an optimized message (e.g., strings that resemble chess games, with many examples present in the conversation).

Understanding this concept is crucial for getting good results out of large language models.


> Regardless of method we've disproven it's a 1400 level player due to these illegal moves.

Explain your thought process here further if you don't mind.


I think his point is that 1400 level players don't make illegal moves, therefore ChatGPT is not playing at the level of a 1400 level player.


Think blindfolded 1400 players, which is what this effectively is, would make illegal moves.

But even if it doesn't play like human 1400 players, if it can get to a 1400 Elo while resigning the games it makes illegal moves in, that seems 1400 level to me. And I bet that some 1400s do occasionally make illegal moves (missing pins) while playing OTB.


This isn't really an apt metaphor. Firstly, because higher-level blindfolded players, when trained to play with a blindfold, also virtually never make mistakes. Secondly, because a computer has permanent, concrete state management (compared to humans) and can, without error, keep a perfect representation of a chess board if it chooses to do so.


1400 FIDE != high-level blindfolded player.


Personally I think the illegal moves are irrelevant. The fact that it doesn't play exactly like a typical 1400 doesn't mean it can't have a 1400 rating. Rating is purely determined by wins and losses against opponents; it doesn't matter if you lose a game by checkmate, resignation, or playing an illegal move.

That's not to say ChatGPT can play at 1400, just that playing in an odd way doesn't determine its rating.
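A quick sketch of why the way you lose doesn't matter, using the standard Elo expected-score formula. The K-factor of 20 and the outcome labels are assumptions for illustration only; real federations and sites use their own values.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))

# Hypothetical outcome labels. Every kind of loss maps to the same score.
SCORE = {
    "win": 1.0,
    "draw": 0.5,
    "checkmated": 0.0,
    "resigned": 0.0,
    "illegal_move_forfeit": 0.0,
}

def update(rating, opponent, outcome, k=20):
    """One-game rating update; k=20 is an assumed K-factor."""
    return rating + k * (SCORE[outcome] - expected_score(rating, opponent))

after_mate = update(1400, 1400, "checkmated")
after_forfeit = update(1400, 1400, "illegal_move_forfeit")
```

Both losses cost exactly the same number of points, which is all the rating system ever sees.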


This is like saying I play at a 2900 level if you just ignore all the times I lose.


No it's not, we're not ignoring losses or illegal moves at all, they are counted as losses and that's how you arrive at 1400.

It's a (theoretically) 1400 player which plays significantly better than 1400 when it knows the lines, but makes bad or illegal moves when it doesn't, and that play averages out to be around your typical 1400 player. Functionally it's just what a 1400 player already is, but with higher highs and lower lows.


The article does not ignore the losses. In fact, it used a rule stricter than FIDE rules to trigger losses on illegal moves.


The author said ChatGPT gives illegal moves. So, a quirky sort of 'grandmaster'. He considered illegal moves to be a resignation. Maybe you need to tell ChatGPT that the alternatives are to win via legal moves, and if it is not possible to do so, to resign? Does that fix it?


> something a 1400 ranked player would never do

The fact that rules and articles exist describing what to do if you or your opponent makes an illegal move indicates this is not the case.

Humans are also... human. They make mistakes. It may not happen often at 1400, but to say that it'll never happen is preposterous.


I can’t remember the last time I played an illegal move tbf, and I’ve played 7 games of chess this morning already to give you an idea of total games played


You have never made an illegal move, ever?

The bar isn’t “I didn’t make an illegal move this morning” it’s “something a 1400 ranked player would never do”.

My entire point is that it happens. Not often, but also not “never”.


This argument is pretty flimsy. ChatGPT makes illegal moves frequently. In all my years of playing competitive chess (from 1000 to 2200), I have never seen an illegal move. I'm sure it has happened to someone, but it's extremely rare. ChatGPT does it all the time. No one is arguing that humans never make illegal moves; they're arguing that ChatGPT makes illegal moves at a significantly higher rate than a 1400 player does (therefore ChatGPT does not have a 1400 rating).

Edit: Without reading everything again, I'll assume someone said "never." They're probably assuming the reader understands that "never" really means "with an infinitesimal probability," since we're talking about humans. If you're trying to argue that "some 1400 player has made an illegal move at some point," then I agree with that statement, and I also think it's irrelevant, since the frequency of illegal moves made by ChatGPT is many orders of magnitude higher than the frequency of illegal moves made by a 1400-rated player.


> No one is arguing that humans never make illegal moves

> something a 1400 ranked player would never do

> fine, fair, "never" was too much.

I mean, yes they were and they said as much after I called them out on it. But go off on how nobody is arguing the literal thing that was being argued.

It's not like messages are threaded or something, and read top-down. You would have 100% had to read the comment I replied to first.


You have twice removed the substance of an argument and responded to an irrelevant nitpick. Here's what the OP said:

> He literally used the same prompt as the article.

> Claim: "ChatGPT's Chess Elo is 1400"

> Reality: ChatGPT gives illegal moves (this happened to article author too),

> something a 1400 ranked player would never do

> Result: ChatGPT's rank is not 1400.

This is a completely fair argument that makes perfect sense to anyone with knowledge of competitive chess. I have never seen a 1400 make an illegal move. He probably hasn't either. Your point is literally correct in the sense that at some point in history a 1400-rated player has made an illegal move, but it completely misses the point of his argument: ChatGPT makes illegal moves at such an astronomically high rate that it wouldn't even be allowed to play competitively, hence it cannot be accurately assessed at a 1400 rating.

Imagine you made a bot that spewed random letters and said "My bot writes English as well as a native speaker, so long as you remove all of the letters that don't make sense." A native English speaker says, "You can't say the bot speaks English as well as a native speaker, since a native speaker would never write all those random letters." You would be correct in pointing out that sometimes native speakers make mistakes, but you would also be entirely missing the point. That's what's happening here.


> Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.

You seem to have missed the part where I said multiple times that a 1400 has definitely made illegal moves.

> In fact, it's so rare that in order to forfeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.

I actually said the exact opposite. You're responding to an argument I didn't make.

> I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.

The commenter's throwaway account never agreed it was an extreme. I agreed it was an extreme, but also that disproving that one extreme does nothing to contradict his argument. Yet again you aren't responding to the argument.

This entire exchange is baffling. You seem to be missing the point for a third time, and now you're misrepresenting what I said. Welcome to the internet, I guess.


> The commenter's throwaway account never agreed it was an extreme.

> fine, fair, "never" was too much.

This is the second time I've had to do this. Do you just pretend things weren't said or do you actually have trouble reading the comments that have been here for hours? You make these grand assertions which are disproven by... reading the things that are directly above your comment.

> This entire exchange is baffling.

Yeah your inability to read comments multiple times in a row is extremely baffling.

As I said before:

> Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.


> The commenter's throwaway account never agreed it was an extreme.

I did, two hours ago, 6 minutes after your comment

https://news.ycombinator.com/item?id=35201830


Thanks! I appreciate it.


> I have never seen a 1400 make an illegal move.

Ah yes, of course, just because you never saw it means it never happens. That's definitely why rules exist around this specific thing happening. Because it never happens. Totally.

In fact, it's so rare that in order to forfeit a game, you have to do it twice. But it never happens, ever, because pattrn has never seen it. Case closed everyone.

I made no judgement on what ChatGPT can and can't do. I pointed out an extreme. Which the commenter agreed was an extreme. The rest of your comment is completely irrelevant but congrats on getting tilted over something that literally doesn't concern you. Next time, just save us both the time and effort and don't bother butting in with irrelevant opinions. Especially if you couldn't even bother to read what was already said.


No I definitely have, it’s just so rare I can’t remember when I last did it. I do remember playing one in a blitz tournament 20 years ago! But if this is the first game they played, or if it happens in 1/10 matches, that’s wild


A broken clock is correct two times a day. But my broken clock isn't a 1400 player, although it might seem to be.


Does that somehow prove the assertion of "something a 1400 ranked player would never do"?

Because all I'm hearing is talk about ChatGPT's abilities as a reply to me calling out an extreme statement as being extreme. Something the parent comment even admitted as being overly black and white.


Prove to me your clock is broken, I think it's just telling the future.


I asked the clock over and over, and after 5 hours it gave the right time, proof the clock can learn!


I read an article about a pro player who castled twice in a game. My son hates castling, so to tease him I make a point of castling twice as often as I can, and of attempting other illegal moves as a joke, but he never ends the game because of it.

If I was playing that monstrosity though I would play something crazy that is far out of the opening book and count on it making an illegal move.


I trivially made it make an illegal move in my very first game, on the third move, just by deliberately playing weird moves:

> You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.

1. b4 d5 2. b5 a6 3. b6

> bxc6

No, it's ridiculous to say "oh, a blindfolded human might sometimes make a mistake." No, this is trivially easy to make it make a mistake. It has no internal chess model at all, it's just read enough chess games to be able to copy common patterns.
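For anyone who wants to check the position rather than take my word for it, here is a minimal sketch. It is not a chess engine: the names are made up, it only validates Black pawn captures, and it ignores en passant and pins (en passant isn't available here anyway, since White's last move was a one-square pawn advance).

```python
def start_position():
    """Map square name -> piece code, e.g. 'b7' -> 'BP' (Black pawn)."""
    board = {}
    back_rank = "RNBQKBNR"
    for i, f in enumerate("abcdefgh"):
        board[f + "1"] = "W" + back_rank[i]
        board[f + "2"] = "WP"
        board[f + "7"] = "BP"
        board[f + "8"] = "B" + back_rank[i]
    return board

def play(board, moves):
    """Apply (from, to) coordinate moves with no legality checking."""
    for src, dst in moves:
        board[dst] = board.pop(src)

def black_pawn_capture_ok(board, src_file, dst):
    """Can a Black pawn on `src_file` capture onto `dst`?"""
    dst_file, dst_rank = dst[0], int(dst[1])
    if abs(ord(src_file) - ord(dst_file)) != 1:
        return False                      # must move one file sideways
    src = src_file + str(dst_rank + 1)    # Black pawns capture downward
    if board.get(src) != "BP":
        return False                      # no Black pawn to capture with
    target = board.get(dst)
    return target is not None and target[0] == "W"  # needs a White piece

board = start_position()
# 1. b4 d5 2. b5 a6 3. b6
play(board, [("b2", "b4"), ("d7", "d5"), ("b4", "b5"),
             ("a7", "a6"), ("b5", "b6")])

bxc6_ok = black_pawn_capture_ok(board, "b", "c6")  # ChatGPT's reply
cxb6_ok = black_pawn_capture_ok(board, "c", "b6")  # an actual capture
```

There is nothing on c6 for the b7 pawn to take, so bxc6 fails the check, while cxb6 (taking the pawn that just landed on b6) passes.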


fine, fair, "never" was too much. posting link to this comment to not repeat same discussion twice

https://news.ycombinator.com/item?id=35201037



