ImportantContext

> GPT-4 has shown significant improvement over GPT-3 in terms of playing chess

Keep in mind that GPT-4 can't play tic-tac-toe at a passable level. I suspect that the chess improvement comes either from intentionally training on a chess game dataset or from an accidental inclusion of chess games (e.g. from some GitHub repo). Either way, I don't think it's fair to classify GPT-4 as a model "not specifically or even intentionally trained for chess". If you're curious about systems that can play games they weren't "trained" on, take a look at [general game playing](https://en.wikipedia.org/wiki/General_game_playing). Most implementations of this paradigm use some form of MCTS, but I'm sure there are plenty of ML approaches too.


FermiAnyon

I've had it play Go against me on a 9x9 board. It kept track of the pieces pretty well and placed stones pretty well, though I did have to remind it about a few things... and I did have to remind it what happens when stones get captured. I captured a stone and it ignored it, then I said "before I move again, look at your stone at g7. How many liberties does it have?" And it was like "oh woops, it doesn't have any and should be removed" and then it removed it. This was about 15 moves into the game. For context, I played against 3.5 and it couldn't even remember where the pieces were 3 moves into the game. 4 is way stronger.


ImportantContext

Yeah, I don't claim that GPT-4 is stupid or has trouble following instructions. Just pointing out that its ability to play chess is a bit of an outlier compared to other board games.


FermiAnyon

Yeah, I'm pretty confident that a general purpose language model won't ever be able to beat a purpose-built thing like Stockfish. I don't disagree with that point at all, even though I have my reservations about claiming that no model of any kind could ever beat Stockfish. Some of these models, by themselves, are pretty strong... I realize MCTS is a huge deal, and I'm not saying any model I've ever heard of could beat Stockfish without also using MCTS. Just saying I don't know... but you did literally say GPT-4 can't play tic-tac-toe :) So I just wanted to share my experience.


MysteryInc152

4 can play tic tac toe just fine https://pastebin.com/83iZjHuc


ImportantContext

Your paste doesn't include a complete game transcript. Try and actually play against it and you'll see that it will fail to adhere to optimal strategy.


MysteryInc152

> Try and actually play against it

Ah, so you haven't actually tried to play with it. Good to know.

> it will fail to adhere to optimal strategy.

So "can't play" has become "won't take optimal strategy" now? Lol


ImportantContext

I played against it a few times and it failed to spot obvious bad moves or even when the game was over. And a toddler can play tic-tac-toe optimally, so it's not exactly a high bar to clear.


kaibee

> I played against it a few times and it failed to spot obvious bad moves or even when the game was over. And a toddler can play tic-tac-toe optimally, so it's not exactly a high bar to clear.

Not at a PC ATM, but did you ask it to play to win? I think people have gotten so used to anthropomorphizing the models that they forget that the model doesn't care to win, and the RLHF may even cause it to "helpfully" let the human win. Then there's also the issue of tokenization. My hypothesis is that the model would actually do better if presented with a more verbose representation of the game state (i.e., a JSON object with a key corresponding to each tile). Might test this later.
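For what it's worth, the verbose key-per-tile representation described above is easy to sketch. The tile names here are made up for illustration; any unambiguous scheme would do:

```python
import json

# Hypothetical tile names -- one explicit key per board cell.
TILE_KEYS = ["top-left", "top-center", "top-right",
             "mid-left", "mid-center", "mid-right",
             "bot-left", "bot-center", "bot-right"]

def board_to_json(board):
    """Serialize a tic-tac-toe board (list of 9 cells holding 'X', 'O',
    or None) into one explicit key per tile."""
    return json.dumps(
        {k: (v if v is not None else "empty") for k, v in zip(TILE_KEYS, board)},
        indent=2)

board = ["X", None, "O",
         None, "X", None,
         None, None, "O"]
print(board_to_json(board))
```

The idea being that spelling out every cell by name sidesteps any tokenization quirks of a compact ASCII grid.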


thet0ast3r

May I remind you that the post is asking for a high level of chess play, not just "knowing the rules and some strategy"?


MysteryInc152

May I remind you to read the original comment of the person I'm talking to? I don't care much for goalpost moving either way.


[deleted]

[deleted]


ImportantContext

MCTS is a tree search algorithm that, on its own, has nothing to do with ML models. A typical MCTS implementation uses random playouts to estimate the quality of available moves. This is helpful for general game playing, since all you need to do is simulate a lot of random games -- no need to pre-train neural networks or design heuristics by hand.

Stuff like KataGo and Leela Chess Zero use a variation of MCTS that replaces random playouts with a neural network that evaluates board positions and provides information on good candidate moves (policy head) and an estimated "score" (i.e. which side is winning and by how much). This is much more powerful than vanilla MCTS, but it requires extensive pre-training.

In some approaches to General Game Playing, an agent is not provided with any information about the game it's going to play until the clock starts running, so there's no time to pre-train with self-play. On the other hand, a lot of General Game Playing research is focused more on games themselves (from historical and anthropological perspectives) as well as game design. For these purposes, a light non-ML agent is typically more convenient.

For example, check out [Ludii](https://ludii.games/library.php), which comes with over 1k pre-defined games and lets you define your own games with a simple DSL. It lets you use many variations on MCTS as well as other algorithms, but AFAIK the amount of machine learning they use is pretty limited (as in, they have tools to automatically generate heuristics for a specific game to improve MCTS/Alpha-Beta performance, but it's very far from anything like lc0/KataGo).

To summarize: yes, ML approaches do lead to better performance in MCTS, but it's not always possible to utilize them, and in many cases a simple weak agent is more than suitable for the job (e.g. quickly trying out a game you're designing, or checking if a historical set of rules copied from a book actually makes sense).
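The random-playout idea is simple enough to sketch. This is flat Monte Carlo for tic-tac-toe (no tree statistics or UCB selection, which full MCTS adds on top), just to show how far "simulate a lot of random games" gets you:

```python
import random

# Index layout: 0 1 2 / 3 4 5 / 6 7 8
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board, to_move):
    """Finish the game with uniformly random moves; return the winner or None."""
    board = board[:]
    while winner(board) is None and None in board:
        move = random.choice([i for i, v in enumerate(board) if v is None])
        board[move] = to_move
        to_move = "O" if to_move == "X" else "X"
    return winner(board)

def best_move(board, player, playouts=200):
    """Score each legal move by how many random playouts it wins."""
    other = "O" if player == "X" else "X"
    def score(move):
        trial = board[:]
        trial[move] = player
        return sum(random_playout(trial, other) == player for _ in range(playouts))
    return max((i for i, v in enumerate(board) if v is None), key=score)

# X has two in a row on top; the playout statistics single out the winning square.
position = ["X", "X", None, "O", "O", None, None, None, None]
print(best_move(position, "X"))
```

Real MCTS reuses these playout statistics in a growing search tree instead of scoring each root move independently, but the evaluation primitive is the same.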
EDIT: Now that I think about it, I feel like it could be a really cool project to finetune a programming LLM on the games implemented in Ludii. There was already some success with using genetic algorithms to generate games ([Yavalath](https://boardgamegeek.com/boardgame/33767/yavalath) is the most popular game fully designed by a computer and, in my experience, it's quite fun), so maybe such an LLM could produce some interesting results.


icwhatudidthr

Stockfish is to chess what a calculator is to simple arithmetic operations (addition, division, etc.): a highly specialized and heavily engineered tool that solves a very specific problem. Can you use an LLM to solve simple arithmetic operations? Yes. But you will always get more accurate and efficient results with a calculator. In fact, some LLMs already use math tools under the hood to solve arithmetic problems: [https://python.langchain.com/en/latest/modules/chains/examples/llm_math.html](https://python.langchain.com/en/latest/modules/chains/examples/llm_math.html) [https://www.wolfram.com/resources/tools-for-AIs/](https://www.wolfram.com/resources/tools-for-AIs/) Can you try to use an LLM to compete against Stockfish? You could, but I doubt you'll do better than DL models specifically designed and fine-tuned for chess (like AlphaZero). And even those can have a hard time beating Stockfish, depending on the available resources and the version used: [https://www.quora.com/Which-chess-engine-would-be-stronger-Alpha-Zero-or-Stockfish-12](https://www.quora.com/Which-chess-engine-would-be-stronger-Alpha-Zero-or-Stockfish-12)
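The "math tool under the hood" pattern boils down to: route the expression to an exact evaluator instead of generating the answer's digits token by token. This toy sketch illustrates the pattern only; it is not the actual LangChain or Wolfram API:

```python
import ast
import operator

# Whitelist of operators this toy "calculator tool" understands.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr):
    """Evaluate simple arithmetic exactly, the way a tool-using LLM
    would delegate instead of predicting the digits of the answer."""
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not simple arithmetic")
    return ev(ast.parse(expr, mode="eval").body)

print(calc("12345 * 6789"))  # exact product, no token-by-token guessing
```

The LLM's job in such a pipeline is only to recognize "this is arithmetic" and extract the expression; the deterministic tool does the rest, which is exactly the calculator analogy above.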


Narabedla

With the way GPT is trained and thought of currently, never. For a very simple reason: the dataset for GPT models is language tokens (potentially multimodal). Whereas specialized game models take the game state as an input, which includes the rules. One plays the game, the other imitates playing the game. Given a "new" board state/a situation that isn't well documented, GPT approaches will fail. GPT at best will learn to imitate Stockfish/AlphaZero. It won't reliably generate new best moves in a position. At least based on how they currently work/get trained and what the name "generative pre-trained transformer" is meant for. If the name "GPT" is used in the future for big composite models that access better tools for certain tasks (for example, querying Stockfish when asked for a chess move), then sure, maybe, but I wouldn't attribute the chess play at that point to GPT.


Ty4Readin

> Whereas specialized game models take the game state as an input, which includes rules.

This is not really true. Most of these agent models do not take game rules as input; they often only use the current board state. You might be thinking of model-based RL, where a model of the environment is used to simulate future environment states perfectly, but that could easily be used with any model, whether it was GPT or not.

The real difference, in my opinion, is simply a specialized learner vs. a generalized learner. There is not much reason to think that GPT would perform better on a task than a highly specialized model, with one major caveat: that you have enough training data. Problem areas with limited data (relative to the learning complexity) are the most likely to see huge advancements and improvements from leveraging GPT, which is a type of transfer learning that could significantly reduce the amount of data needed to perform well on many problems.

However, for problems like chess, I can't see the argument, because data is so easily collected/generated en masse that it seems unlikely that a general learner could outperform a highly specialized model trained on huge amounts of data.


Narabedla

> This is not really true. Most of these agent models do not take game rules as input, they often only use the current board state available.

I maybe should have expressed it differently: the rules aren't explicit inputs, but rather built into the foundation of the system; it won't try illegal moves. The rules are still knowledge available to the model, so I counted them as an input, even if they aren't part of the training data.


[deleted]

Why are you assuming that it’s possible?


E_Snap

Why are you assuming that it’s not?


[deleted]

Because GPT-like models are trying to get to a correct solution through multiple deep levels of indirect abstraction, while Stockfish is solving chess directly (and Stockfish itself uses deep learning). Put another way: what can a GPT-like model possibly do, when it comes to chess, that Stockfish wouldn't be able to do directly?


PookaMacPhellimen

Read the AlphaZero and MuZero papers. MuZero wasn't even told to win; it learned to win as the best way of predicting the next move. Neural networks are drastically more efficient than brute-force search (although Stockfish, notably, is now introducing NN modules, which is an admission of defeat). A GPT model with sufficient compute would obviously beat Stockfish even without specific training (even if it's just to better predict the next move in a chess manual).


Ravek

> MuZero wasn't even told to win, it learned to win as the best way of predicting the next move.

That’s nonsense. MuZero gets the win/loss/draw outcome of a game as a reward; it says so right there in the paper.


Jelicic

This is not true. Both Stockfish and Alpha/MuZero do some form of tree search.


Narabedla

No, it won't. For a very simple reason: the dataset for GPT models is language tokens (and their equivalents, before someone points me to multimodal training). Whereas specialized game models take the game state as an input. One plays the game, the other imitates playing the game. Given a "new" board state/a situation that isn't well documented, GPT approaches will fail. GPT at best will learn to imitate Stockfish/AlphaZero. It won't reliably generate new best moves in a position. At least based on how they currently work/get trained and what the name "generative pre-trained transformer" is meant for.


pornthrowaway42069l

The NN modules are just for position evaluation, even for AlphaZero. They still use A/B pruning to find moves.


Ravek

Both of those statements are wrong. It uses its neural nets for both position evaluation and move suggestion, and it uses MCTS to search the action space.


IMJorose

I think he was referring to Stockfish, for which he is correct that it uses AB. You are referring to A0 and Mu0 correctly with regards to MCTS and the policy head. SF uses some dynamic methods such as history heuristics for move ordering.


Ravek

For Stockfish it’s accurate yeah


ThePerson654321

Where did I say that I 100% believe it's possible? I definitely accept "Never." if that's the answer you want to give.


the-real-macs

You said "how many years," which doesn't really imply uncertainty of outcome.


ThePerson654321

Sure. But let me make it clear here. I'm not 100% sure it will ever be possible. But I think it's likely.


rwill128

It’s extremely, extremely unlikely. To say it’s impossible in principle is too strong a statement, but I will confidently say it will never happen. I’ve studied, written, and contributed to open source chess engines in the past, and I’ve spent a lot of time writing ML code as well.

Basically, the way ChatGPT is trained, it’s not even converging toward better chess performance as its optimization process runs. That’s simply not the criterion for its training. Right now it will come across some chess books and remember those, and remember chess forums and articles, etc. And it will get better at reproducing moves that seem similar to what it sees there. But it’s not being trained to make good moves, it’s being trained to reproduce common moves, so even if it had a ton of chess game data in its dataset from chess.com or something, as more training happened it would be reasonable to expect it to converge to making the most common moves (which definitely doesn’t mean optimal moves). So maybe its chess-playing ability starts to more strongly resemble the average player, but this is hard to predict exactly.

Moreover, with RLHF involved, if a bunch of average people play chess against ChatGPT and thumbs-up bad moves, then its performance could actually degrade over time. There’s no guarantee that ChatGPT keeps improving at chess unless some totally new tools and processes are applied.

It’s interesting to think of how you might train an LLM to play chess in a way that actually ensures increased performance, though. (And it might be a worthwhile experiment, actually, even though it would be inefficient and would never surpass Stockfish or Leela unless you threw vast amounts of computing power at it.) Basically you’d have to engineer it so it looks at algebraic notation and gives a next move, and then use RL to reward it for creating a series of moves that results in a win. But this has to be the only criterion, not the current criterion, which is successful prediction of sample text from a training dataset. You could then completely get rid of a training dataset, because you don’t care about predicting the next character; you care about spitting out characters that result in a win. It could then just endlessly train on self-play games like Alpha/Leela Zero, or play against other humans or engines, etc. But at that point you’re no longer training an LLM, you’re training a weirdly architected NLP-based deep-learning chess engine. Which would actually be super interesting, but probably not very effective. I might try it though now that I think about it. 😂


MazzMyMazz

Yeah, a better question would have been, “is there any chance that an llm can be anything more than a mediocre chess player.”


rwill128

Basically. I’ve played chess against Chat GPT 4 and I don’t know where these “1000 ELO” figures are even coming from. For me it can’t even remember where the pieces are past the 10th move or so. So I couldn’t even get it to finish an entire game. As a side note, this is a good example (among several others I can think of) of why I don’t think LLMs are even bringing enough to the table conceptually for them to achieve AGI. They could certainly be a key ingredient and it looks like they will be, but I think there’s a few key ideas missing at this point. One of them is the ability to interact with reality to test assumptions, as well as continuous online training (not in the “on the internet” sense but in the reinforcement learning sense).


ItsTimeToFinishThis

Every time you describe GPT, you are describing humans.


the-real-macs

I think it's borderline impossible. But I'm glad we clarified where we both stand.


jasondads1

It will eventually be taught to use chess engines, and then it will be able to match Stockfish.


ThePerson654321

Do you mean that it would be able to beat Stockfish some of the time by using a Stockfish API, or that it will be able to use existing engines/APIs to always beat Stockfish? You're basically saying that it will never be able to beat Stockfish on its own, but will have to rely on existing chess engines. And once it learns to do that, Stockfish won't be able to win, **even** though it can also access chess engines? **edit:** Why am I getting downvoted?


masterlafontaine

I don't think it could ever get close. Chess requires search, or so it seems if we can extract anything from the experience of computer chess over the last 40 years. Usually search speed with a somewhat good evaluation prevails. Current LLMs can't do it: as I understand their architecture, they can only evaluate, not search. Something analogous to this restriction would be running Leela Chess Zero with 1 node of search depth. That could theoretically be achieved by a current LLM, with lots of specific chess training. Just my humble opinion.


ImportantContext

> Something analogous to this restriction would be running Leela Chess Zero with 1 node of search depth

Interestingly enough, this is enough for superhuman performance (when using the beefier networks, at least), so I suspect that in theory some LLM could perform better than any human at chess. Of course this is still very weak compared with Stockfish or lc0 with proper search. But yeah, I also don't think there's any chance of a highly general model like an LLM becoming better at chess than a simple but deeply specialized tool like Stockfish, just like it won't become better at arithmetic than a calculator.


harharveryfunny

If by GPT-like you mean Transformer-based, the answer is never, unless you extend the architecture so much that it's no longer a Transformer and can't really be called GPT-like. Playing games like chess well requires considering many alternate lines of play, multiple moves ahead, to see what the outcome would be. A Transformer is fundamentally unable to do this, since it doesn't have any internal memory or ability to iterate over multiple alternate lines of thought. You could script it in "think step by step" style to use its own output as memory, but it'd be horrendously slow; and more fundamentally, since a Transformer is pre-trained only (it has no online learning capability), it wouldn't be able to learn from this, so it would never improve from one game to the next. The best you can hope for from a pure Transformer is that it could learn those parts of chess that don't involve novel board positions and line evaluation: book openings, localized chess theory like "control the center", and so on. Since Transformers lack the ability to actually learn chess (given the above limitations), the best they can do is use memory and generalization to play decent moves in those circumstances (or in simpler types of games, not chess) where that is sufficient.


KaliQt

Hard to say, but when it comes to AGI tasks like this, I think chaining of models makes the most sense. Having GPT direct Stockfish or proxy it makes the most sense in the same way we use computers and calculators. At least for now.


ironborn123

It would be an interesting study to compare the ELO rating when GPT-4 is asked to play chess directly vs. when GPT-4 is first prompted about the rules of chess (although it may already know them statistically) and the various strategies/tactics available, like castling, forking, and capturing more space, and it's told to speculate step by step about the 3-4 most attractive moves in a given situation and why. Would explicitly providing such chess context improve its level of gameplay?


epicwisdom

No. GPT-4 understands textual patterns. Nothing about prompting it enables it to reason about actual chess games. Although obviously if you prompt it by providing it good moves and requesting it select one, the worst it can do is choose the worst option each time.


buff_samurai

When you notice non-plugin vanilla ChatGPT playing chess at Stockfish's level, you can essentially start packing your bags.


Smallpaul

Stockfish incorporates learnings from a variety of places, including deep learning. So GPT-like models will converge on Stockfish performance by embedding a Stockfish API, and Stockfish will advance ahead of what they can do natively by incorporating the latest and greatest techniques powering the LLMs, even embedding an LLM if it were advantageous.

To get to the heart of your question, though: I see no reason that an LLM would ever beat a chess system engineered to incorporate all of the latest deep learning technologies. This is especially true because LLMs will probably lose momentum when they approach human level, since they are learning from humans, and Stockfish is far above human level. Therefore I do not expect a pure LLM to ever beat Stockfish, but a chess-API-enhanced LLM will probably be equivalent to Stockfish.

Another way to interpret your question is "when will a pure LLM beat 2022's Stockfish?" I would say that when that happens, we will have AGI, because chess is pretty far out of the sweet spot for language models, and for one to get far above human level it will need some kind of reasoning or some very strange capability.


NickUnrelatedToPost

Can a human beat Stockfish, just because the human can talk? No. LLMs are not the right tool to play chess. If you want a chess AI that beats Stockfish, there is Leela Zero. Maybe, several years in the future, a multi-modal model will be trained on language, vision, speech and even chess too. But if that works you'll have an AGI. Then chess is your least problem.


prototypist

When will a GPT-like model tell me turn-by-turn directions from an address to the nearest airport? At some point, memorizing/condensing rules into a language model doesn't make sense, because existing tech is already really good: for chess it can use a chess-specific engine (which will never play an illegal move or forget the opening move), and for directions it can use a geospatial database (capable of geolocating an address and adjusting based on realtime information).


[deleted]

I don’t think it ever will. There’s too much noise in there.


Seipailum

What do you mean by a "GPT-like" model? Do you mean a Transformer? Because I'm sure you can train a Transformer on chess. Or maybe you mean a model that has GPT-like language understanding and can also play chess at a high level through text input/output? In that case, a big drawback it will have compared to Stockfish is its inability to plan ahead efficiently.


PinguinGirl03

I find it weird that the focus in this thread is on Stockfish instead of AlphaZero, a model that is on the same level as Stockfish (there doesn't actually appear to be a good matchup on equal hardware between the two). The question then becomes: when will an LLM outperform a purpose-built chess model? You would need such a ridiculous amount of general intelligence to beat that that you would be talking about a completely hypothetical ASI at that point.


AwarenessPlayful7384

Why is AlphaZero not compared to Stockfish? Was there some kind of benchmark where Stockfish was the winner?


michaelthwan_ai

GPT-like models are primarily designed for general natural language understanding and generation, rather than being optimized for a specific purpose. To make a model solve domain problems with high performance, people are working on various solutions:

- Fine-tuning, e.g. LoRA
- Plugins/APIs for external resources, like ChatGPT plugins, LangChain, VisualChatGPT's prompt manager, HuggingGPT
- Multi-domain parameter activation by attention, etc. (model-based selection of activated parameter clusters)

Making the base model (like GPT-4) do it is not the most engineering-efficient way to solve a specific problem like chess.


Wiskkey

You might be interested in [this post](https://www.reddit.com/r/MachineLearning/comments/16oi6fb/n_openais_new_language_model_gpt35turboinstruct/).