FilterHN

tzs

8 months ago

[-]

OT: what's the state of the art in non-GM level computer chess?

Say I want to play chess with an opponent that is at about the same skill level as me, or perhaps I want to play with an opponent about 100 rating points above me for training.

Most engines let you dumb them down by cutting search depth, but that usually doesn't work well. Sure, you end up beating them about half the time if you cut the search down enough but it generally feels like they were still outplaying you for much of the game and you won because they made one or two blunders.

What I want is a computer opponent that plays at a level of my choosing but plays a game that feels like that of a typical human player of that level.

Are there such engines?

https://www.maiachess.com/

rococode

8 months ago

[-]

Maia does this reasonably well! You can play against it on Lichess. I have gotten a few "feels like a human" moments when playing against it - for example, getting it to fall into a trap that could trick a human but would easily be seen by a traditional search algorithm. It's not adjustable but there are a few different versions with different ratings (although it's not a very wide range).

https://lichess.org/@/maia1

plaguuuuuu

8 months ago

[-]

Piggy-backing off this - does anyone know of a quick way to evaluate the maia weights from python or js for a single board state? I'm trying to hack something together with my own search func intended for human play and I can't quite figure it out from the cpp in Lc0.

mcyc

8 months ago

[-]

A bit late, but I wrote up an example inference script here: https://gist.github.com/mcognetta/7a98e50859664b8efbb4ec094a...

It is a bit roundabout, since it involves converting maia models to onnx before loading into pytorch and some outdated versions of libraries (maia/lc0 are a little old). We were using this for transfer learning for a competition, so we needed some flexibility that we didn't know how to do quickly/easily in TF.

Hope this helps.

------------------

Personal note: given your interest in chess ai and your starcraft username, I think we would have a lot of shared interests. Feel free to reach out (info is in my profile).

hatsunearu

8 months ago

[-]

Hi! I just messaged you.

8 months ago

[-]

I built something like this. It works as long as you're not too high-rated: chessmate.ai. Once players get higher rated it is more difficult to predict their moves because you need to model their search process, not just their intuitive move choice. It's also possible to train on one player's games only so that it is more personalized.

It uses a similar approach to Maia but with a different neural network, so it had a bit better move matching performance. And on top of that it has an expectation maximization algorithm so that the bot will try to exploit your mistakes.

8 months ago

[-]

Really nice work! The tabs other than "play" don't seem to be working, but I was able to try some novelty openings and it certainly felt like it was responding with human moves. It would be great to have the ability to go back/forth moves to try out different variations.

I'm curious how you combined Stockfish with your own model - but no worries if you're keeping the secret sauce a secret. All the best to you in building out this app!

8 months ago

[-]

I'm happy you enjoyed it! There are definitely a few rough edges, yes.

Since the whole thing is executed in the browser (including the model) there aren't a ton of secrets for me to keep. Essentially it is expectation maximization: the bot tries to find the move with the highest value. What is "value"? Essentially, it is the dot product between the probability distribution coming out of the model and the centipawn evaluations from Stockfish.

In other words if the model thinks you will blunder with high probability, it will try to steer you towards making that mistake.

8 months ago

[-]

Thank you for the explanation! I completely understand the rough edges, I have some rough ideas out there myself. Would it be alright if I add a link to your site to our chess club's list of online resources?

I can also make a note of it privately and check back in with you in the future. I found it pretty remarkable that it played a human-like response to some niche openings - I actually ended up checking against Stockfish and it played different moves, which is pretty neat.

8 months ago

[-]

Sure, go ahead!

pgt

8 months ago

[-]

Hello! I built a Chess AI also named Chessmate in high school (2005) that made it into a Minecraft mod 10 years later: http://petrustheron.com/posts/chessmate.html

Java source code here: https://github.com/theronic/chessmate

ssijak

8 months ago

[-]

"Sure, you end up beating them about half the time if you cut the search down enough but it generally feels like they were still outplaying you for much of the game and you won because they made one or two blunders."

That is what winning in chess is. Minimising blunders.

Sesse__

8 months ago

[-]

“The winner of a game is the one who makes the second-to-last blunder.”

(Also this has come up in computer chess; it's more important to improve the quality of your worst moves than your average moves)

8 months ago

[-]

I believe that observation is borne out in the statistics too, but traditional chess training usually centers around finding the best, hard-to-find move in a position rather than avoiding blunders. I think it would be great if there was more blunder-avoidance training. In other words, a normal position where a blunder looks attractive but the player needs to avoid it.

LanceH

8 months ago

[-]

A long time ago I had the Fritz engine from chessbase. It had a sparring feature where if you maintained good play it would give up a tactical puzzle in the middle of the game. It could either warn you or not. If you didn't play solidly enough, you would just lose.

As far as I can tell, they got rid of this feature. It was the only computer opponent that felt real. Like it made a human mistake when put under pressure, rather than just playing like a computer and randomly deciding to play stupid.

scotty79

8 months ago

[-]

> [...] feels like they were still outplaying you for much of the game and you won because they made one or two blunders.

That's why I don't like winning in multiplyer games. Usually when you win you either feel like the opponent just played comically bad on sufficient number of occasions or that they played well but in few instances you got undully lucky and it could have gone either way. Very rarely you get the desired feeling that opponent played well but you just played a little better overall so your win is deserved. It almost always seem like it's not that you are winning but the opponent is losing instead. And none of that is about AI. Making AI that lets you win symmetrical games satisfyingly and teaches you with your losses in a satisfying manner would be a billion dollar business. I don't think it can be done without some serious psychology research.

rtlaker

8 months ago

[-]

No, not with adjustable rating. The best human-like engine is fairymax, but its Elo is estimated between 1700-2000.

WhitneyLand

8 months ago

[-]

What’s your rating, have you tried gpt4o?

It’s supposedly good up to about 1300, but aside from that the ability to prompt can make the style of play somewhat tunable for ex aggressive, defensive, etc.

wavemode

8 months ago

[-]

GPT is incapable of playing a full game of chess without making lots of illegal moves, in my experience.

Even correcting for those, it's still horrendously bad at the game. Where are you getting this 1300 number?

anamexis

8 months ago

[-]

Do you know if there are any interfaces to play against got4o? Or is it just typing in algebraic moves back and forth?

stackghost

8 months ago

[-]

It prints an ASCII board, and then yes, you type algebraic moves back and forth and it updates the board for you each turn.

espadrine

8 months ago

[-]

KataGo has a special model weights release with human-like play at various Elo: https://github.com/lightvector/KataGo/blob/master/docs/Analy...

You can see in the release notes a few screenshot examples where a particular move changes likelihood as you get to higher-level play: https://github.com/lightvector/KataGo/releases/tag/v1.15.0

6510

8 months ago

[-]

It would be hilarious if they downgraded by being more aggressive. For example: It needs to score n points worth of aggressive moves that are not the best moves. After screwing up the position by n they can go back to playing the best moves again.

Otherwise you wouldn't really be learning anything useful. You would end up with an opening vocabulary that good players would easily punish. If you play crappy gambits leading to positions you know well the better players will think highly of you.

Best way to learn is to play the hardest possible engines and just take back moves when it becomes evident you've screwed up.

svachalek

8 months ago

[-]

As a complete amateur I'd love a chess engine that can point out my mistakes, explain its own moves, and suggest better moves, in human terms. Like, this move would pin the white bishop. I don't know if this would be AI based or simply a search-based engine that scores in abstracted ways such as giving points for tactics like pins and forks, controlling the center, protecting the king, etc. (Typical chess search just scores based on piece and position and looks ahead farther than a human brain can handle.)

Psychoshy_bc1q

8 months ago

[-]

I recommend a chess club for that.

netdevnet

8 months ago

[-]

How would you even go about making a model that can simulate a human chess skillset (saying levels implies that chess skillset is a scalar value while it is more reasonable to think of it as a tree of skills where your abilities might be higher or lower depend ending on the specific skill branch)

anton-107

8 months ago

[-]

take millions of games of human players of certain rating only as your learning data?

Someone

8 months ago

[-]

In the context of this thread (“non-GM level computer chess”, which I read as also excluding International, FIDE Master, and Candidate Master (https://en.wikipedia.org/wiki/Grandmaster_(chess))), I think it’s more important to not have a good learning algorithm.

Even 10 thousand of such games may already have way more tactics than a player at the targeted level can detect and apply. If so, a learning algorithm that detects and remembers all of them already will be better than the target level.

netdevnet

8 months ago

[-]

Exactly. Level x (whatever scalar thing the user meant by that) doesn't quite work out for the reason you outlined. X Level Players have different tactics and someone that can use all of them will likely be better than most if not all those those players. I got downvoted for saying that. Maybe I didn't phrase it as well as you did

wavemode

8 months ago

[-]

Yeah but, won't it also be learning from the mistakes and missed tactics too? (Assuming its reward function is telling it to predict the human's move, rather than actually trying to win.)

WithinReason

8 months ago

[-]

condition the move on ELO while training

netdevnet

8 months ago

[-]

You are assuming that's going to be a reliable proxy, what would make you think that?

jsjohnst

8 months ago

[-]

> What I want is a computer opponent that plays at a level of my choosing but plays a game that feels like that of a typical human player of that level.

Chess.com’s NPC players feel fairly in line with what you are asking for.

PeterStuer

8 months ago

[-]

I wonder if loweing the 'temperature' level (how rigid they select the max value completion) on these types of models would achieve this? Might be tricky to tune as the outcome is likely non=linear

gcr

8 months ago

[-]

Stockfish is the classic answer, though I’m not sure how well it’s graded. Someone must have a “Stockfish strength”-to-ELO mapping.

tzs

8 months ago

[-]

It's not getting an engine to play in the right rating range that is the problem. It's getting it to play like a human would play in that rating range.

The average rating of tournament chess players in the US is around USCF 1550. I'm not sure what their FIDE rating would be. FIDE ratings are usually 50-100 points lower than USCF ratings but that's based on comparing people that have both ratings which for the most part are strong masters and above.

A human with a USCF 1550 rating will typically be mostly making moves that are suboptimal in a variety of ways: piece coordination, king safety, planning, pawn structure, development, search, and more. Basically they are around 1550 at nearly everything. There will be variations of course. A particular player might be worse at king safety and better at tactics for instance, but it will be something like they handle king safety like a 1400 and tactics like a 1700.

With a GM level engine turned down to 1550 you tend to see aspects of GM level play still in its game. If you are a 1550 playing against it it doesn't feel like you playing the kind of opponent you will play if you enter a local chess tournament and get paired with another 1450-1650 player.

It feels like you are playing someone with a totally different approach to chess than you who just happens to lose or draw to you about the same amount as a 1450-1650 human.

Out_of_Characte

8 months ago

[-]

What I find fasinating is how bad human beings are at chess. Now that we have engines we're finally able to analise every game ever played and they show us everything in chess that we're blind to. Their ability to never blunder in 1 or 3 moves is admirable, and better than most players, and to say nothing about their ability to make you play out the longest possible chain before checkmate. What I found most insulting is when I played against the best bot I could beat, it gave up its rook for free.

Scene_Cast2

8 months ago

[-]

I'm currently trying to build one, fwiw.

andrelaszlo

8 months ago

[-]

Cool! I've been wondering for s while if it wouldn't be possible to use lichess games for various ratings to make typical mistakes.

I'm also curious about if it would be possible to mimic certain playing styles. Two beginners can have the same rating but one might lose because they have a weak opening, and the other one because they mess upo the end game, for example.

Random mistakes doesn't mimic human play very well.

Scene_Cast2

8 months ago

[-]

Exactly. My eventual goal is to be able to emulate any single player with a public game history. Maybe even flag unhuman-like moves that also happen to be top stockfish moves as possible cheating.

My current chess engine already hangs its queen sometimes and walks into forks. I'm still experimenting with how to improve personalization.

og_kalu

8 months ago

[-]

GPT-3.5-turbo-instruct has a peak Elo of around 1800 (but of course can be prompted to play with less skill) and is as human like as you'll get at that level.

8 months ago

[-]

It doesn't seem that difficult to pull off - take one of the existing engines, get the top y moves, choose randomly. For each level down increase y by 1.

ajkjk

8 months ago

[-]

No, it doesn't work at all. Human mistakes are not at all like computer mistakes. Like -- blundering a piece in a 1-2 move combination will straight up never show up in the stockfish move tree, no matter what you set `y` to.

agubelu

8 months ago

[-]

It doesn't work that way. There are many positions with lots of moves that are reasonable, but many others with only 1-2 sensible moves. It would make lots of obvious blunders that an amateur human would never make.

chmod775

8 months ago

[-]

Also attention. Lower level human players are more likely to make a move close to their own/their opponent's recent move. They're focused on one area of the board.

Basic computer opponents on the other hand can make moves all over the place. They look at the board state holistically. This can be very frustrating to play against as a human who has enough problems just thinking their way through some subset of the board, but is thrown off by the computer again and again.

It's not that bad in chess at least (compared to Go), but still something worth to keep in mind if you're trying to make an AI that is fun to play against as an amateur.

dullcrisp

8 months ago

[-]

Seems this might still have the problem of moves being either extremely good or extremely bad depending on how many good moves are found, rather than playing at a consistent level. Or for example in a degenerate case where there are only two moves and one leads to mate, the computer will be picking randomly.

8 months ago

[-]

I don't know enough about chess but surely amateurs don't play at a consistent level. Like, I play a little and I'm quite certain some of my moves are awful (20%?) and some are the same move magnus makes (10%?) and most are on a spectrum between (70%?).

teo_zero

8 months ago

[-]

Better: take one of the existing engines, sort the moves from the best to the worst, if the top move has score S, randomly choose among the moves with score >= 0.9*S.

You can simulate a better/worse player by increasing/decreasing the factor: 1 plays as well as the chosen engine can do, 0 is typing random (yet valid) moves on the keyboard.

8 months ago

[-]

Yup this is better.

bobmcnamara

8 months ago

[-]

Your engine would only mate once it had y options to mate.

8 months ago

[-]

Any time it had an option to mate it would have a chance to mate. It's choose randomly amongst top n, not take the nth.

dullcrisp

8 months ago

[-]

Or y chances. They did say it’d pick randomly. Still not great though, if a bit less funny.

[1]: https://hlfshell.ai/posts/deepmind-grandmaster-chess-without...

hlfshell

8 months ago

[-]

I did a talk about this! (And also wrote up about my talk here[1]). This paper is a great example of both knowledge distillation. It's less of a paper about chess and more about how complicated non linear search functions - complete with whatever tuning experts can prepare - can be distilled into a (quasi-linear, if it's a standardized input like chess) transformer model.

8 months ago

[-]

I think the vs. humans result should be taken with a huge grain of salt. These are blitz games, and their engine’s elo was far higher against humans than against other bots. So it’s likely that time was a factor, where humans are likely to flag (run out of time) or blunder in low time situations.

It’s still very cool that they could learn a very good eval function that doesn’t require search. I would’ve liked the authors to throw out the games where the Stockfish fallback kicked in though. Even for a human, mate in 2 vs mate in 10 is the difference between a win and a draw/loss on time.

I also would’ve liked to see a head to head with limited search depth Stockfish. That would tell us approximately how much of the search tree their eval function distilled.

hlfshell

8 months ago

[-]

The reason the time (blitz) games make sense is because the distilled functionality is of a 50ms Stockfish eval function. The engine likely would perform worse as only the human would benefit from the additional time.

As for limited search tree I like the idea! I think it's tough to measure, since the time it takes to perform search across various depths vary wildly based on the complexity of the position. I feel like you would have to compile a dataset of specific positions identified to require significant depth of search to find a "good" move.

8 months ago

[-]

My point is that if the computer never flags it will have an inherent advantage in low time controls. If not, why not just test it in hyperbullet games? Games where humans flag in a drawn or winning position need to be excluded, otherwise it’s unclear what this is even measuring.

And limited depth games would not have been difficult to run. You can run a limited search Stockfish on a laptop using the UCI protocol: https://github.com/official-stockfish/Stockfish/wiki/UCI-%26...

codeflo

8 months ago

[-]

But the headline is “GM-level performance without search”, not “computer beats human at mouse movement speed contest”.

Scene_Cast2

8 months ago

[-]

If anyone is looking to get into chess neural nets, I highly recommend this repo - https://github.com/sgrvinod/chess-transformers

It uses paradigmatic PyTorch with easy to read code, and the architecture is similar to the current best performing chess neural nets.

https://lczero.org/blog/2024/02/how-well-do-lc0-networks-com...

osti

8 months ago

[-]

The best neural network chess engine's authors wrote about this deepminds publication.

stabbles

8 months ago

[-]

LC0 hasn't been the best neural network chess engine since Stockfish added NNUE in 2020.

https://lichess.org/@/StockfishNews/blog/stockfish-17-is-her...

vlovich123

8 months ago

[-]

That isn’t quite accurate. Stockfish’s NNUE was trained against Leela evaluated positions.

> Generally considered to be the strongest GPU engine, it continues to provide open data which is essential for training our NNUE networks. They released version 0.31.1 of their engine a few weeks ago, check it out!

The main difference is that Stockfish is targeting to run on the CPU while Leela targets the GPU. That stockfish is able to be competitive with Leela is of course impressive.

osti

8 months ago

[-]

True, but since stockfish uses a way way smaller network, I still prefer to think of stockfish as the traditional engine.

chvid

8 months ago

[-]

But the gigantic synthetic dataset that is used for training is created with plenty of traditional search. So it is all a bit silly but I guess cool none the less ...

amunozo

8 months ago

[-]

It's a knowledge distillation. You can then use this smaller, more efficient models instead of the larger one.

chvid

8 months ago

[-]

Or maybe it is just memorizing a very large number of games.

azakai

8 months ago

[-]

They address the possibility of memorization in the PDF:

> This effect cannot be explained by memorization since < 1.41% of the initial puzzle board states appear in our training set.

tech_ken

8 months ago

[-]

Seems more like a 'compression' of the large number of games, or even like an approximate 'index' of the database

alkonaut

8 months ago

[-]

Is this network smaller than stockfish and by what metric is that?

chvid

8 months ago

[-]

If anything it demonstrates the limits of NN. A human brain can learn based on far fewer examples.

jxy

8 months ago

[-]

Nature's evolution algorithm took millions of years to find the architecture and the base model, which then takes decades to be fine tuned to be able to form this opinion.

msoad

8 months ago

[-]

Searched only once. If this can be applied to other knowledge with this efficiency we're onto something

sourcepluck

8 months ago

[-]

I believe GM and chess author (and all-round lovely fellow) Matthew Sadler rigged up Leela Zero to effectively play off intuition and do very little or no search for training games. He could usually beat it, but not always. Think it might have been in The Silicon Road to Chess Improvement.

Scarblac

8 months ago

[-]

He also has very entertaining Youtube videos about the kind of wild opening discoveries Leela comes up with when contempt is set really high (ie it wants to maximally avoid draws), combined with his 2700+ commentary on them.

8 months ago

[-]

I mean for lczero you can just set the max depth at 1 ply for example

sourcepluck

8 months ago

[-]

Oh, maybe it was that simple!

sksxihve

8 months ago

[-]

Isn't generating the training data by running stockfish on all the board positions for all the games just encoding the search tree into the transformer model?

So increasing the number of parameters to the model would allow it to encode more of the search tree and give better performance, which doesn't seem all that interesting.

mewpmewp2

8 months ago

[-]

How could it be possible to encode a search tree like this though.

timmg

8 months ago

[-]

Imagine you collected a billion unique, feasible board positions (all positions is intractable, but most possible positions are impractical) and the best nest move for each. That "best next move" is the result of a tree search.

Now use a transformer to "compress" that information into its model. It sounds like that is approximately what is going on here. Certainly, the model is likely to generalize some aspects of the data (just like LLMs do). But for the most part, the model encodes the information from the Stockfish evaluation.

(This is just my guess of what we are seeing.)

sksxihve

8 months ago

[-]

Exactly, the title says "without search" but in the paper it says "without explicit search", having a system learn to play chess at a grandmaster-level without any search for play or training would be far more impressive, what this does seems pretty obvious that it would work.

QuantumGood

8 months ago

[-]

This repository provides an implementation of our paper Grandmaster-Level Chess Without Search. https://arxiv.org/abs/2402.04494

The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.

QuantumGood

8 months ago

[-]

However, the Lichess blitz ELO vs bots is about 700 points lower than vs humans.

jackmalpo

8 months ago

[-]

what i would love is an engine that thinks more like a human. presumably since this uses stockfish annotated games, it basically ends up thinking like a computer. thinking like a human would be awesome for game reviews to walk through things to note in different positions (tuned to my elo).

levocardia

8 months ago

[-]

Or a model whose performance is measured via its efficiency of learning--in other words, how many games does it need to play to learn to play at X level? The reason Magnus Carlsen is impressive is because he's reached his ability level in chess under enormous time and computation constraints, compared to a computer. His efficiency of learning is extraordinary compared to that of any chess engine.

7373737373

8 months ago

[-]

Or even the opposite end of the spectrum - with extremely limited resources (memory/program size/computation time): https://rlc-chess.com/ - like demoscene programs. Capable 1kb chess programs exist!

amoss

8 months ago

[-]

If you solve chess then you have a tree that is too large for us to currently compute (about 10^80 although my memory may be way off). Annotating that tree with win / loss / draw would allow an optimal player without search. The two obvious approaches to compression / optimization are to approximate the tree, and to approximate the annotations. How well those two approaches would work depends a lot on the structure of the tree.

This result seems to tell us less about the power of the training approach (in absolute terms) and more about how amenable the chess game tree is to those two approaches (in relative terms). What I would take away is that a reasonable approximation of that tree can be made in 270M words of data.

timerol

8 months ago

[-]

Note that the exact version of this technique is used in chess for the endgame, referred to as a tablebase. Chess is solved once there are 7 pieces on the board, in an 18.4TB database, described here: https://lichess.org/@/lichess/blog/7-piece-syzygy-tablebases...

7373737373

8 months ago

[-]

Makes me wonder what % of games end with <=7 pieces

adgjlsfhk1

8 months ago

[-]

at a high level, almost all of them that aren't draws. if you have pieces in the right place, you should've be checkmateable. Without blunders, wins occur after a small advantage is followed by slightly favorable trades into a winning position

BoardsOfCanada

8 months ago

[-]

It's somewhat telling that they chose Stockfish as the oracle and not AlphaZero.

8 months ago

[-]

It's well established that SF today is stronger. Btw this is pretty bullish case for open source recreations of RL advancements to be catching up with ~2 year lag. Hopefully alphaproof is also reproduced in that timeframe

epolanski

8 months ago

[-]

Stockfish is stronger than AlphaZero or any other chess engine from quite some time.

Another interesting tidbit, both engines share (or at least, shared) the same lead developer.

vlovich123

8 months ago

[-]

To be fair though, deep mind hasn’t been working to improve alpha zero for chess playing. Their focus is a bit difference than the dedicated volunteers working on stockfish (+ it’s been a few years)

imranhou

8 months ago

[-]

From the page: "We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct."

Why compare this to GPT-3.5-turbo-instruct? Is that near SOTA in this space?

og_kalu

8 months ago

[-]

As far as anyone knows, 3.5-turbo-instruct is the best chess playing (certainly it was at the time of the paper) LLM. About 1800 Elo and < 0.1% Illegal move rate. It's unclear why it was so much better than 4 (lack of RLHF?, Data?) and I don't know if anyone has bothered to test 4o similarly but it was pretty big news online at the time.

8 months ago

[-]

OA definitely purposefully trained its chess strength

og_kalu

8 months ago

[-]

I'm sure they did but there's no reason to believe they pretrained it on chess anymore than 4 so there's some speculation the post training processes mess things up. Turbo instruct does not go through RLHF for instance.

RayVR

8 months ago

[-]

I forget the rough adjustment factors, but it is worth noting that lichess Elo is not the same as chess.com or FIDE. I think lichess is typically ~300 points above chess.com.

This implies the model is around 2500 blitz vs humans. As blitz elo are often much higher than in classical time controls, 2500 elo on chess.com places it firmly in the 'good but not great' level.

I am very curious to know whether the model suffers from the same eval problems vs the well known "anti-bot" openings that stockfish is susceptible to at limited search depths.

[1] https://www.chess.com/member/magnuscarlsen

gpm

8 months ago

[-]

> I think lichess is typically ~300 points above chess.com.

Yeah, no. They are two different rating systems (not ELO incidentally) with different curves, there isn't a fixed difference you can apply. At the high end of the scale lichess ratings are below, not above, chess.com ratings. E.g. Magnus Carlsen is 3131 blitz on lichess [0], 3294 blitz on chess.com [1].

This website [2] tries to translate between the sites, and figures that a 2925 lichess blitz rating (the closet on the website to the one reported in the paper of 2895) translates to 3000 chess.com.

[0] Multiple accounts but this is the one I found with the most blitz games: https://lichess.org/@/DrNykterstein/perf/blitz

[2] https://chessgoals.com/rating-comparison/#lichesschesscom

https://en.wikipedia.org/wiki/Elo_rating_system

jdck1326

8 months ago

[-]

Elo. Not acronym, named for its inventor Arpad Emmerich Elo

RayVR

8 months ago

[-]

Cool, thanks for the information! I did not realize the curves were that different.

3000 definitely bumps it up but it would still be worse than the top humans. stockfish is better by a lot.

YeGoblynQueenne

8 months ago

[-]

Excellent work but I suggest a slightly different title:

"What would Stockfish Do?"

A more appropriate title; because Stockfish is a search-base system and DeepMind's approach wouldn't work without it.

Oh, btw, this is (yet another) a Neurosymbolic system of the "compiling system 2 to system 1" type.

dgraph_advocate

8 months ago

[-]

This is missing a makefile to automate the manual installation steps

https://news.ycombinator.com/item?id=39301944

ChrisArchitect

8 months ago

[-]

Associated discussion on the paper:

Grandmaster-Level Chess Without Search

squidgedcricket

8 months ago

[-]

Would it be feasible to create a complete lookup table of 'best' moves for all given board configurations? I'm not sure how to determine the total number of configurations. Not the same as a tablebase, just a single next move rather than sequence to checkmate.

It wouldn't be competitive against top tier players and AI, but I wouldn't be surprised if it could beat me. 'Instantly' knowing the next move would be a cool trick.

k2xl

8 months ago

[-]

The amount of data that would be required for a lookup table for all best moves for every board configuration would be infeasible.

They have managed to create one for 7 pieces. Last update on trying to get to 8 piece database: https://www.chess.com/blog/Rocky64/eight-piece-tablebases-a-...

squidgedcricket

8 months ago

[-]

Yup, and it looks like a complete tablebase from the start of the game won't ever be feasible.

> From May to August 2018 Bojun Guo generated 7-piece tables. The 7-piece tablebase contains 423,836,835,667,331 unique legal positions in about 18 Terabytes.

andrelaszlo

8 months ago

[-]

Almost halfway there ;)

8 months ago

[-]

There are 32 pieces on the board at the start of the game.

8 months ago

[-]

There are more possible chess games than there are atoms in the universe. It can't be solved by brute force.

https://tromp.github.io/chess/chess.html https://physics.stackexchange.com/questions/47941/dumbed-dow...

squidgedcricket

8 months ago

[-]

There's a lot of chess configs, but there's a LOT of atoms in the observable universe. I suspect there's a few in the unobservable universe too.

Chess configs = 4.8 x 10^44, Atoms > 10^70

You might be able to pull off a low-resolution lookup table. Take some big but manageable number N (e.g 10^10) and calculate the maximally even distribution of those points over the total space of chessboard configurations. Then make a lookup table for those configs. In play, for configs not in the table, interpolate between nearest points in the table.

https://en.wikipedia.org/wiki/Shannon_number

8 months ago

[-]

I didn't say chess positions, I said chess games. That number has a lower-bound of 10^120.

Scarblac

8 months ago

[-]

But that's not the relevant thing if we're talking about storing a best move per possible position.

8 months ago

[-]

Unless you’ve calculated every line to a forced win or draw you don’t actually know the objective evaluation of a position and so you can’t determine “best move”. That’s what a tablebase is.

Scarblac

8 months ago

[-]

Yes, I figured that he would need a tablebase anyway. But that's still a few bits per position, the number of possible games doesn't come in to it.

8 months ago

[-]

A tablebase does include every possible move from every possible position. There are many transpositions, saving space but still every single possible line from a position is represented in a tablebase. How is that different from every possible game from that position?

A 6-piece tablebase is 150GB. A 7 piece is 18TB. An 8 piece is thought to be 2PB, but we don't have one yet. How big do you think a 32-piece tablebase will be?

roenxi

8 months ago

[-]

That is basically what a neural network based chess engine is. The function the neural network is encoding is logically equivalent to "probability this move is the best for this board state".

The resolution isn't great, and adding search to that can be used to develop an implicit measure of how accurate the function is (ie, probability the move suggested in a position remains unchanged after searching the move tree for better alternatives).

https://arxiv.org/abs/2402.04494

bob1029

8 months ago

[-]

> Board states s are encoded as FEN strings which we convert to fixed-length strings of 77 characters where the ASCII-code of each character is one token. A FEN string is a description of all pieces on the board, whose turn it is, the castling availability for both players, a potential en passant target, a half-move clock and a full-move counter. We essentially take any variable-length field in the FEN string, and convert it into a fixed-length sub-string by padding with ‘.’ if needed. We never flip the board; the FEN string always starts at rank 1, even when it is the black’s turn. We store the actions in UCI notation (e.g., ‘e2e4’ for the well-known white opening move). To tokenize them we determine all possible legal actions across games, which is 1968, sort them alphanumerically (case-sensitive), and take the action’s index as the token, meaning actions are always described by a single token (all details in Section A.1).

I am starting to notice a pattern in these papers - Writing hyper-specific tokenizers for the target problem.

How would this model perform if we made a small change to the rules of chess and continued using the same tokenizer? If we find we need to rewrite the tokenizer for every problem variant, then I argue this is just ordinary programming in a very expensive disguise.

kidintech

8 months ago

[-]

// personal opinion: I think machine learning as it currently stands is widely overhyped

How is this the top comment?

> I am starting to notice a pattern in these papers - Writing hyper-specific tokenizers for the target problem.

This is merely expressing what they consider as part of a game state, which is entirely needed for what they set out to do.

> I argue this is just ordinary programming

"Ordinary programming" (what does that mean?) for such a task implies extraordinary chess intuition, capable of conjuring rules and heuristics for the task of comparing two game states and saying which one is "better" (what does better mean?).

> How would this model perform if we made a small change to the rules of chess and continued using the same tokenizer?

If by "small change" you are implying i.e. removing the ability to castle, then sure, the tokenizer would need to be rewritten. At the same time, the entire training dataset would need to be changed, such that the games are valid under your new ruleset. How is this controversial or unexpected?

It feels like you are expecting that state of the art technology allows us to input an arbitrary ruleset and the mighty computer immediately plays an arbitrary game optimally. Unfortunately, this is not the case, but that does not take anything away from this paper.

mewpmewp2

8 months ago

[-]

I don't know much about this space, but it seems like this could be solved by leaving a good amount of empty tokens that you would only start using when they arise. Or leave tokens which you can use together to combine anything for various edge cases. Because if you have all the characters as tokens you can combine them into anything.

BurningFrog

8 months ago

[-]

If you change to the rules, you have a different game than chess.

Since there is no training data for that game, I don't know you get this kind of AI to do anything?

barelyusable

8 months ago

[-]

What are some of the go-to books/articles for computer chess? I like the game and have a decent understanding of basics, so studying algorithms based on the game would be a good opportunity for me to learn conventional algos, but also RL/ML/MCTS etc. Also I wonder what is the go-to codebase these days?

https://www.chessprogramming.org/

sinuhe69

8 months ago

[-]

Is the portal to go. From there, you can dig deeper in many relevant themes.

8 months ago

[-]

Stockfish and Leela are the two best engines and they’re both on GitHub. Although I will say they are hyperoptimized to win.

You can also ask any noob questions on their Discords.

rawsh

8 months ago

[-]

You can actually get solid performance with pretrained chat models: https://raw.sh/posts/chess_puzzles

On lichess puzzles gpt4o with the compiled prompt is around 70%, I think the 270M transformer is around 95%

sinuhe69

8 months ago

[-]

It depends on the difficulties of the puzzles. If you read the article of the Leela author, (link cited by other comment), then you will see a much different picture: the new DeepMind is better then AlphaZero but much worse than the best open source model. Not specially trained transformer like ChatGPT4o (and even 5o) has absolutely no chance to solve the more difficult puzzles.

joelthelion

8 months ago

[-]

I wonder if you could creatively combine this model with search algorithms to advance the state of the art in computer chess? I wouldn't be surprised to see such a bot pop up on tcec in a couple years.

8 months ago

[-]

The advantage of this flavor of engine is that it might make parallel position evaluation extremely efficient. Calculate 1024 leaf positions and batch them to the model, take the top 10% and explore their sub-trees either via further GPU batching or minimax eval.

NNUE already tries to distill a subtree eval into a neural net, but it’s optimized for CPU rather than GPU.

hinkley

8 months ago

[-]

As a game player I want to play an opponent that behaves like a human. Otherwise I’m always looking for the flaw in the design that I can exploit, which wins me the game but is less fun.

What you’re discussing sounds like intuition with checking, which is pretty close to how humans with a moderate degree of skill behave. I haven’t known enough Chess or Go masters to have any claim on how they think. But most of us don’t want an opponent at that level and if we did, we would certainly find a human, or just play against ourselves.

8 months ago

[-]

The issue is that humans and computers don't evaluate board positions in the same way. A computer will analyze every possible move, and then every possible response to each of those moves, etc. Human grandmasters will typically only analyze a handful of candidate moves, and a few possible replies to those moves. This means human search is much narrower and shallower.

If you want a computer that plays like a human, you will probably need to imitate the way that a human thinks about the game. This means for example thinking about the interactions between pieces and the flow of the game rather than stateless evaluations.

hinkley

8 months ago

[-]

Grandparent was suggesting the hybrid approach where you select a handful of good candidate positions and then explore them (DFS) as far as possible. Which is pretty much how humans work.

sinuhe69

8 months ago

[-]

Leela, the open source model already (and always) does that and is already much better then the new DeepMind model. No, neural networks are basically curve fitting. You can only do so much approximation without overfitting and there are always positions, which are different enough from the “mainstay” positions that NN can not learn them. DeepMind as always want to impress the public by creating artificial conditions to show its product under better light. But the reality is:

- The Leela open source community had already used transformer architecture to train Lc0 long before the paper (and published it, too!) and got much better result than new DeepMind massive model

- The top engines with with search (Stockfish NNUE, Lc0) beat DeepMind’s model by margins under normal competition’s conditions

- Speaking about efficiency, Stockfish NNUE can run on a commodity PC with only slightly lower ELO. AlphaZero or DeepMind’s new model can not even run to begin with.

alfalfasprout

8 months ago

[-]

The thing is classical chess (unlike eg; go) is essentially "solved" when run on computers capable of extreme depth. Modern chess engines play essentially flawlessly.

KK7NIL

8 months ago

[-]

The developers of stockfish and lc0 (and the many weaker engines around) would disagree, we've seen their strength improve considerably over the last few years.

Currently there's a very interesting war between small neural networks on the CPU with high search depth alpha-beta pruning (stockfish NNUE) and big neural networks on a GPU with Monte Carlo search and lower depth (lc0).

So, while machines beating humans is "solved", chess is very far from solved (just ask the guys who have actually solved chess endgames with 8 or less pieces).

GaggiX

8 months ago

[-]

Stockfish and lc0 would always draw if they are not put in unbalanced starting positions, the starting position will be swapped in the next game to make it fair.

KK7NIL

8 months ago

[-]

In classical controls (what TCEC mainly uses), yes. They can play pretty exciting bullet chess without a forced opening though.

KolmogorovComp

8 months ago

[-]

> ask the guys who have actually solved chess endgames with 8 or less pieces

Source?

8 months ago

[-]

Chess is not “solved”. Solved doesn’t mean computers can beat humans, it means for any chess board position we can tell whether white wins, black wins, or the game is drawn with perfect play. We would know if the starting position was drawn, for example.

No computers now or in the foreseeable future will be capable of solving chess. It has an average branching factor over 30 and games can be over 100 moves.

8 months ago

[-]

There's strong solved and weak latter only needs to be unbeatable from starting. Now it's definitely not probable that SF isn't beatable from initial but honestly it's not impossible. The drawing margin is pretty big for engines

solveit

8 months ago

[-]

We really have no way to know this. But I would be very surprised if modern chess engines didn't regularly blunder into losing (from the perspective of a hypothetical 32-piece tablebase) positions, and very very surprised if modern chess engines perfectly converted tablebase-winning positions.

KK7NIL

8 months ago

[-]

We do know this, there are many positions (primarily sharp middle game one's) where SF/lc0 will significantly change their evaluation as they go deeper. This problem gets better the more time they spend on one position but it's an inevitable consequence of the horizon effect and it's why (except for 8 pieces or less), chess is far from solved.

8 months ago

[-]

Far from strongly solved but i would wager current SF will not lose half of its white games against any future engine

https://arstechnica.com/information-technology/2023/02/man-b...

Kstarek

8 months ago

[-]

hah, disagree strongly tbh, there may be an adversarial engine that specifically finds positions that stockfish doesn't evaluate well. they did the same to AlphaGo:

8 months ago

[-]

True that's a reasonable possibility. But in go the top engines are far from perfection but it's not clear in chess

8 months ago

[-]

The fact that TCEC games aren’t all draws suggests that computers aren’t perfect. Stockfish loses to Leela sometimes for example.

grumpopotamus

8 months ago

[-]

Tcec games are deliberately played from imbalanced opening positions. The draw rate would be much higher for the top participants if this wasn't forced. However, I agree that engines are not perfect. I have heard this claim many times before a new engine came along that showed just how beatable the state of the art engines still were at the time.

8 months ago

[-]

Most TCEC starting positions are borderline lost

8 months ago

[-]

Ofc they do but the more interesting question for weak solved is whether they do in mainline positions (like mainline Berlin, mainline petroff, etc) where you can hold equality in many ways and engines are printing 0.0 everywhere

__s

8 months ago

[-]

not only blunder into losing positions, but also blunder from winning positions into draws

even in human chess people sometimes mistaken draw frequency to reflect both sides playing optimally, but there are many games where a winning advantage slips away into a draw

8 months ago

[-]

This is accurate for endgames only. In complicated positions, there is still room for improvement - the recent game of lc0 vs stockfish where lc0 forced a draw against an impending checkmate is a good example. There is currently no way for a chess engine searching a massive game tree can see how an innocuous pawn move enables a forced stalemate 40 moves down the line.

8 months ago

[-]

Honestly SF plays better in middle game positions on average I would guess. I think usually there's a bigger draw margin in middle games

8 months ago

[-]

I think you are correct as I recall there is a set of middle game chess puzzles where Stockfish outperformed the other chess engines by a wide margin - I can't find the link as it was years ago. Not sure how the state-of-the-art has progressed since then. But I do believe the "horizon effect" plays a role in whether an engine decides to forcibly draw a game, which afforded Stockfish + NNUE a distinct advantage (at least at the time).

__s

8 months ago

[-]

compared to humans yes, but between themselves in TCEC progress continues. TCEC has AIs play both sides of random openings, rather than stick to playing chess's initial position. The same happens for checkers amongst humans, where opening positions are randomized

dougSF70

8 months ago

[-]

Slightly off topic but I built https://chessladders.com

1bit_e

8 months ago

[-]

Nice! I built https://www.chesspuzzlebot.com/ (Puzzles against stockfish)

veltas

8 months ago

[-]

Playing chess with a 20TB cerebellum

dudeinjapan

8 months ago

[-]

OK but when will someone make grandmaster-level search without chess?

vanjajaja1

8 months ago

[-]

what i like about this is that it implies you can build heuristics good enough to make it to GM level. this is great because i find calculating moves a headache

jvanderbot

8 months ago

[-]

The thing is the heuristic done by a huge network might be insanely complex and doing all kinds of calculations. it's just that it's one function call so we ignore all those calculations. It's not immediately obvious that deploying a transformer to solve for next best move means that a human mind can avoid difficult calculations and just play by gut.

There's just too much wordplay going on with "heuristic"

mewpmewp2

8 months ago

[-]

Arguably intuition and gut feeling are also insanely complex systems doing all kinds of calculations.

jvanderbot

8 months ago

[-]

Sure, but apples and oranges have a lot in common too, and if we call everything fruit, it's unnecessarily hard to differentiate them ... is all I'm saying.

xiaodai

8 months ago

[-]

this paper is so dumb. so you modeled the output of stockfish? stockfish does use simulation or selfplay or search. so you've outsourced search and dont do it yourself so you can claim to be "without search"

scotty79

8 months ago

[-]

I think what they say is valid. Same way ChatGPT can answer questions google could answer without doing any search like google does.

hilux

8 months ago

[-]

Do you understand how ML models are trained?

bicsi

8 months ago

[-]

They built a dumber clone of Stockfish, and they call it ‘zero’ for some reason. What is the meaning behind ‘zero’ anyways? It used to refer to zero-shot, but now it seems like it’s just a marketing term.

8 months ago

[-]

I assume you’re referring to AlphaZero and Leela Chess Zero.

AlphaZero was the successor to AlphaGo. AZ was notable because unlike AG, it used zero human games to learn to play: it just played games against itself. Typically in “supervised” machine learning you take human data and train a model to imitate it. AZ used zero human data to learn.

Leela Chess Zero started out as an open source copy of AZ but it’s probably better than AZ now.

pertymcpert

8 months ago

[-]

Zero doesn't mean zero shot learning. It was coined by Deepmind for AlphaGo Zero where they used zero human input into the training data. It was trained entirely by playing against itself.

hilux

8 months ago

[-]

And they gave Sir Demis a Nobel Prize!