Is there something special about these chess engines that makes SPSA more desirable for these use cases specifically? My intuition is that something like Bayesian optimization could yield stronger optimization results, and that the computational overhead of doing BO would be minimal compared to the time it takes to train and evaluate the models.
Or, if attempting to use SPSA to, say, perform a final post-training tune of the last layers of a neural network, this could be thousands of parameters or more.
That being said, it still seems possible to me that using a different black box optimization technique for a fairly constrained set of related magic numbers (say, fewer than 50) might lead to some real performance improvements in these systems. It could be worth reaching out to the lc0 or Stockfish development communities.
Statisticians and operations researchers have spent a hundred years working out how to run as few experiments as possible while tweaking parameters in the ways that give the highest impact, with a statistical basis for believing the selections are good.
In the language of information and decision trees, these experiments are trying to in some sense “branch” on the entropy-minimizing variables.
https://github.com/official-stockfish/fishtest/wiki/Fishtest...
There's simply a lot of sample efficiency to gain by adapting the experiment to incoming data in a regime where one can repeatedly design n candidates, observe their effects, and repeat m times compared to a setting where one must design a fixed experiment with n*m samples.
The video is probably the least bizarre thing there, if that's what you are warning about.
Feds this guy right here ^^
Although, setting any kind of hair on fire in public should be punishable, primarily because of the stench of burnt hair.
One of my formative early internet experiences was loading up a video of a man being beheaded with a knife.
Luckily, I realized what was about to happen, and didn't subject myself to the whole thing.
Thanks for the warnings, kind strangers.
You now have a generation of people who think it is cool to be mentally ill.
Serial killers get fan mail, that’s true now and it was true 100 years ago.
Psychoanalysis, while mostly quackery, is ~135 years old, providing an example where talking was considered a viable therapy, not just locking people up or tossing out lobotomies left and right to anyone slightly abnormal.
So sure, 100 years ago there was quackery just as today, but “possessed by demons” wasn’t considered mainstream back then any more than it is today.
blog post is good
Response from the author of Viridithas; there is a link to this engine on her webpage.
> I use she/her pronouns
The idea of something being "defiantly" NSFW gave me a chuckle.
Chess engines have been impossible for humans to beat for well over a decade.
But a position in chess being solved is a specific thing, which is still very far from having happened for the starting position. Chess has been solved up to 7 pieces. Solving basically amounts to some absolutely massive tables that have every variation accounted for, so that you know whether a given position will end in a draw, black win or white win. (https://syzygy-tables.info)
I haven't verified OP's claim attributed to 'someone on the Stockfish discord', but if true, that's fascinating. There would be nothing left for the engine developers to do but improve efficiency and perhaps increase the win-to-draw ratio.
And the play style of Alpha Zero wasn't different in a way that needs a super trained chess intuition to see; it's outrageously different if you take a look at the games.
I guess my point is that even if the current situation is basically a 'deadlock', it hasn't been proven that this is some sort of eternal knowledge of the game as of yet. There's still the possibility that a new type of approach could blow the current top engines out of the water, with a completely different take on the game.
IMO AlphaZero was partially a result of the fact that using more compute also works. Stockfish 10 running on 4x as many CPUs would beat Stockfish 8 by a larger margin than AlphaZero did. To this day, nobody has determined what a "fair" GPU to CPU comparison is.
War was "solved" when someone made a weapon capable of killing all the enemy soldiers, until someone made a weapon capable of disabling the first weapon.
But I'm not sure whether that guy was guessing or confident about that claim.
In that hypothetical of running 2 instances of Stockfish against one another on a modern laptop, with the key difference being minutes of compute time, it'd probably be very close to 100% draws, depending on how many games you run. So, if you run a million games, there are probably some outliers. If you run a hundred, maybe not.
When it comes to actually solved positions, the 7-piece tables take around 1TB of RAM to even run. These tablebases are used by Stockfish when you actually want to run it at peak strength. [3]
[0]: https://tcec-chess.com [1]: https://lichess.org/broadcast/tcec-s28-leagues--superfinal/m... [2]: https://lczero.org [3]: https://github.com/syzygy1/tb
I remember hearing that the starting position is so draw-ish that it's not practical anymore
Chess is a 2 player game of perfect, finite information, so by Zermelo's theorem either one side always wins with optimal play or it's a draw with optimal play. The argument from the Discord person simply says that Stockfish computationally can't come up with a way to beat itself. Whether this is true (and it really sounds like a question about depth in search) is separate from whether the game itself is solved, and it very much is not.
Solving chess would be a table that simply lists out the optimal strategy at every node in the game tree. Since this is computationally infeasible, we will certainly never solve chess absent some as yet unknown advance in computation.
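To make "a table that lists the optimal strategy at every node" concrete, here is a minimal Python sketch that fully solves a toy game instead of chess. The game is an assumption picked for size: a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins (so there are no draws, unlike chess). The function name `solve_subtraction_game` is made up for illustration.

```python
def solve_subtraction_game(max_stones, max_take=3):
    """Build the full solution table for a toy take-away game.

    outcome[n] is True when the player to move wins from n stones with
    optimal play; best_move[n] is one winning move (or None if there is none).
    """
    # With 0 stones the player to move has already lost.
    outcome = [False] * (max_stones + 1)
    best_move = [None] * (max_stones + 1)
    for n in range(1, max_stones + 1):
        for take in range(1, min(max_take, n) + 1):
            # A move is winning if it leaves the opponent in a lost position.
            if not outcome[n - take]:
                outcome[n] = True
                best_move[n] = take
                break
    return outcome, best_move


outcome, best_move = solve_subtraction_game(20)
print(outcome[20], best_move[7])
```

The table has one entry per position, which is why the same idea is hopeless for the full chess game tree and only works for small endgames.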
In the TCEC game, I see "2. f4?!", so I'm guessing Stockfish was forced to play some specific opening, i.e. it was forced to make a mistake.
For what it's worth, Stockfish wins the rematch also. https://tcec-chess.com/#game=13&round=fl&season=cup16
It's also almost certainly the case (though I don't know why you would do it) that Stockfish given the black pieces and extensive pondering would be meaningfully better than Stockfish with a time-capped move order. Most games are going to be draws, so practically it would take a while to determine this.
I'm of the view that the actual answer for chess is "It's a draw with optimal play."
How could we possibly know this?
> it is unbeatable by any chess engine
So its engine is finished? There's no further development? No new algorithms?
Isn't it obvious that increasing time per move will make the engine better and at some point perfect?
> So its engine is finished? There's no further development? No new algorithms?
No.
See the main page https://girl.surgery/
And there's:
> Here's a video of me burning off my pubic hair in the alley.
A quick visit at the homepage suggests that it's probably the latter. I don't want to be rude, not posting out of malice, but if someone else was reading this and was trying to parse it, I think it might be helpful to compare notes and evaluate whether it's better to discard the article altogether.
ML isn't my strong suit so I wouldn't be able to explain how, but Cosmo's article is almost entirely a refutation of the points made by the root article. No doubt he is very friendly, as someone would be to anyone interested in their field.
What I can speak about is the general construction of sentences, they read (in the most charitable of interpretations) like text messages:
"Good model vs bad model is ~200 elo, but search is ~1200 elo, so even a bad model + search is essentially an oracle to a good model without, and you can distill from bad model + search → good model."
I take it that by "is ~X elo" they mean that implementing that strategy results in a gain of 200 ELO? Which would still be undefined, as 1000 to 1200 is not the same as 2800 to 3000, and improvements are of course not cumulative. I get that this reads more like internal notes, but it was published, so there was some expectation that it would be understood by someone else.
For a lot more reasons, the writing reminds me of notes written by me or by loved ones under influence of drugs. My estimation is that the article was written by a mind that used to be brilliant but is now just echoing that brilliance, trying to keep their higher order cognitive functions while struggling to maintain the baseline of basic language use. I hope it is reversible and if per is reading this and my estimation is correct, that they perturb the weights in favour of quitting drugs and see if they win more or not.
The point I was trying to make with "RL is only necessary once" is that you can embark on a single self-play loop getting better and better, and this will get you to something close to the frontier. Once you're at the frontier, the frontier doesn't move very much, so you have quite a while (decade?) where it's totally fine to distill from the RL games.
On correction histories -- imo I correctly described what they do. Cosmo was annoyed by the word "adapt" but what I described was the adaptation.
On SPSA -- you don't have a gradient! you don't do backprop! this is what i was trying to get at.
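To make the "no gradient, no backprop" point concrete, here is a minimal SPSA sketch in Python. It is only an illustration under stated assumptions: `spsa_minimize`, `toy_loss`, and the gain constants are made-up names and values, and the loss is a simple quadratic rather than a noisy match result as it would be in engine tuning. The decay exponents 0.602 and 0.101 are the commonly cited defaults from Spall's guidelines.

```python
import random


def toy_loss(theta, target=(1.0, -2.0, 0.5)):
    """Stand-in objective; a real tuner would play games and measure results."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def spsa_minimize(loss, theta, iterations=2000, a=0.1, c=0.1, seed=0):
    """Minimize `loss` using only function evaluations, never a gradient."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # step size, decays over time
        c_k = c / k ** 0.101  # perturbation size, decays more slowly
        # Perturb ALL parameters at once in a random +/-1 direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two evaluations give a noisy slope estimate along delta.
        diff = (loss(plus) - loss(minus)) / (2.0 * c_k)
        theta = [t - a_k * diff / d for t, d in zip(theta, delta)]
    return theta


print(spsa_minimize(toy_loss, [0.0, 0.0, 0.0]))
```

Each iteration costs two loss evaluations no matter how many parameters there are, which is the main appeal when every evaluation is a batch of games.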
Elo is defined such that the expected win-rate of a player should only depend on the difference in Elo rating to their opponent. https://en.wikipedia.org/wiki/Elo_rating_system#Mathematical...
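As a quick illustration of that definition, here is the standard logistic expected-score formula in Python. The function name `expected_score` is just chosen for this example; the 400-point scale is the usual Elo convention.

```python
def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5, loss = 0) of player A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# The same 200-point gap predicts the same expected score at any level.
print(round(expected_score(1200, 1000), 3))
print(round(expected_score(3000, 2800), 3))
```

Both calls print the same value (about 0.76), because only the rating difference enters the formula.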
an increase of X ELO points doesn't have a significant meaning, as an increase of points from 1000 to 1200, would be very different from an increase of points from 1500 to 1700, and very different to an increase of points from 2800 to 3000.
There's a million ways a player or an engine can go from 1000 to 1200. But whatever tactic or change would make a player go from 1000 to 1200, could easily cause a 2800 player/engine go to 2750 or 2400 or... 1200.
lichess.com/@/TotomiBot
It currently uses a 3-ply exhaustive search, with the exception that takes don't count toward the ply limit, so it actually evaluates all branches up until the third non-taking move.
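A rough Python sketch of that kind of search, for readers who want to see the shape of it. This is one interpretation, not the bot's actual code: it assumes a negamax formulation, assumes that once the quiet-move budget is used up the side to move may "stand pat" on the static evaluation, and uses a tiny hand-written tree (`TOY_TREE`, `TOY_EVAL`) in place of real chess positions.

```python
def search(state, quiet_budget, moves, evaluate):
    """Negamax in which only non-capturing moves use up the ply budget.

    moves(state) returns a list of (child_state, is_capture) pairs;
    evaluate(state) scores a position from the side to move's perspective.
    """
    children = moves(state)
    if not children:
        return evaluate(state)
    # Out of budget: quiet moves are approximated by the static evaluation.
    best = evaluate(state) if quiet_budget == 0 else None
    for child, is_capture in children:
        if is_capture:
            score = -search(child, quiet_budget, moves, evaluate)
        elif quiet_budget > 0:
            score = -search(child, quiet_budget - 1, moves, evaluate)
        else:
            continue  # quiet move beyond the budget is not explored
        if best is None or score > best:
            best = score
    return best


# Toy position: grabbing material looks good until the recapture is seen.
TOY_TREE = {
    "root": [("quiet", False), ("grab", True)],
    "quiet": [],
    "grab": [("recapture", True)],
    "recapture": [],
}
TOY_EVAL = {"root": 0, "quiet": 0, "grab": -1, "recapture": -2}

print(search("root", 1, TOY_TREE.__getitem__, TOY_EVAL.__getitem__))
```

Because the capture chain is followed to its end, the search sees that the grab loses material and prefers the quiet move.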
On the evaluation, it uses two separate scoring values, one for material, using Lasker style piece values, and another for tiebreaking, which would be the positional score.
Positional score is mostly determined by a bitboard for each piece type, with positive and negative biases towards specific squares; for example, the king bishop pawn is heavily incentivized to stay put. The boards are perspective based, so it works the same if you are black or white without needing symmetry (which would make promotion strategies hard).
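Here is a small Python sketch of how a two-part score with perspective-based square tables could look. Everything in it is an assumption for illustration: the piece values are placeholders (not the Lasker-style values the bot uses), the single pawn table only rewards the f-pawn for staying home, and squares are numbered a1 = 0 through h8 = 63.

```python
# Placeholder material values; NOT the bot's actual numbers.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

# One 64-entry table per piece type, written from the owner's point of view.
PAWN_TABLE = [0] * 64
PAWN_TABLE[13] = 5  # f2 from the owner's side: bonus for staying put
TABLES = {"P": PAWN_TABLE}


def perspective_square(square, color):
    """Flip the rank for black so both sides share one table."""
    return square if color == "w" else square ^ 56


def evaluate(pieces, side, tables=TABLES):
    """Return (material, positional) from `side`'s point of view.

    pieces is a list of (color, piece_type, square) tuples. Tuples compare
    left to right, so the positional term only breaks material ties.
    """
    material = 0
    positional = 0
    for color, kind, square in pieces:
        sign = 1 if color == side else -1
        material += sign * PIECE_VALUES[kind]
        table = tables.get(kind)
        if table is not None:
            positional += sign * table[perspective_square(square, color)]
    return (material, positional)


# White f-pawn at home (f2), black f-pawn already advanced to f6.
print(evaluate([("w", "P", 13), ("b", "P", 45)], "w"))
```

Comparing the scores as tuples means a material edge always outweighs any positional bonus, which matches the "tiebreaking" role described above.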
There's also a couple of heuristics for king safety.
The complexity has come to a point where it's hard to predict what will improve or make it worse by just fiddling with the heuristics. But there's probably a lot of room for improvement in terms of reducing and optimizing compute time.
AI is Python based but calculation (search) and evals are offloaded to a C lib for efficiency.
ELO is around 1400, and you can pretty much only beat it positionally (or with a very aggressive early sacrifice), as there's almost no hanging of pieces.
> ML isn't my strong suit so I wouldn't be able to explain how, but Cosmo's article is almost entirely a refutation of the points made by the root article. No doubt he is very friendly, as someone would be to anyone interested in their field.
ML is familiar to me but far from my specialty. It was very difficult for me to understand the points from Cosmo's article, even if it seems more technically correct and less notes-y. Actually, it was likely because it was aiming for high technical correctness that some sentences are impossible for me to digest. (AlphaZero is a strange inversion of RL, where all of the “learning how to map situations to actions so as to maximize a numerical reward signal” is done online, by a GOFAI algorithm, and absolutely no reinforcement learning makes it into the actual gradient used to train the network!)
I think you may have misunderstood the "Now we get to the scathing criticism" line as being literal rather than ironic (or literal disguised as irony), because most of Cosmo's points are clarifications and distinctions only understandable or valuable to chess engine/ML experts. Many of Cosmo's points are agreement or unrelated; many others are self-professed nitpicks; and among the rest, I think Cosmo is being overly harsh. For example, the discussion on "no gradient" is an agreement in disguise, because what girl.surgery means to say (and what I understood the first read around) is simply that SPSA is like gradient descent, but without access to analytical derivations of derivatives. As another example, the discussion on "self-play was only necessary one time" leads to Cosmo only disagreeing with the language, not the description of the process; "bad model + search → good model" per girl.surgery is mirrored by Cosmo saying "To surpass that ceiling, you must search-amplify the new network, generating better data than the old oracle could, and distill again — and this is precisely the self-play loop," and if I had to guess, girl.surgery means by "self play" bootstrapping from absolutely nothing rather than from another highly capable model.
> I take it that by "is ~X elo" they mean that implementing that strategy results in a gain of 200 ELO? Which would still be undefined, as 1000 to 1200 is not the same as 2800 to 3000, and improvements are of course not cumulative.
I understood +X elo over the next-best model, when the context is that of top-shelf models rather than near amateur human play. This usage of "elo gains" in generalized context is even used by Tilps and Crem in Cosmo's quote. It's just a ballpark of the magnitude of strength difference we're talking about, one which is actually not as contextually sensitive as you might think, because of what yorwba notes about the very definition of elo.
> For a lot more reasons, the writing reminds me of notes written by me or by loved ones under influence of drugs. My estimation is that the article was written by a mind that used to be brilliant but is now just echoing that brilliance, trying to keep their higher order cognitive functions while struggling to maintain the baseline of basic language use. I hope it is reversible and if per is reading this and my estimation is correct, that they perturb the weights in favour of quitting drugs and see if they win more or not.
Very possibly. But I might offer an alternative, more charitable explanation: profound neurodivergence and/or mental illness. I personally know at least one troubled genius who writes like this, if not worse, but who is more than capable of very serious intellectual projects and research. The nature of autism tends to make it harder to write for a general audience without coming off as bizarre, and in my experience they are better at interactive, 1-on-1 discussions where you can ask questions to course-correct them away from burrowing too deep into their own head.
The mass delusion of, "I don't understand what I'm reading, therefore it must be produced by an llm."
I think it's a pretty serious problem. Not that llm text exists on the internet, but that reasonable people are reflexively closed off to creativity because the mere existence of the possibility that something is created by an llm is in their minds grounds for disqualification.
A common property of llm psychosis is the development of an internal vocabulary that the llm learns, often reusing words but adopting specific meanings; for some reason quantum and quantic are very popular for this.