We have agents implement agents that play games against each other. So Claude isn't playing against GPT; instead, an agent written by Claude plays poker against an agent written by GPT. This really tough task leads to very interesting findings on AI for coding.
Are you going to share those with the class or?
Gemini is consistently winning against top models.
Ultimately I think it's impossible to define AGI. Maybe "I know it when I see it"—except everyone sees it at a different point (evidently).
And as a poker player, I can say that this game is much more challenging for computers than chess. Writing a program that can play poker really well and efficiently is an unsolved problem.
It doesn't even need to be one tool but a series of tools.
Heh, we really did come full circle on this! When ChatGPT launched in Dec '22, one of the first things people noticed was that it sucked at math. Like, basic math such as 12 + 35 would trip it up. Then people "discovered" tool use and added a calculator. And everyone was like "well, that's cheating, of course it can use a calculator, but look, it can't do the simple addition logic"... And now here we are :)
Maybe we should just get rid of tedious benchmarks like chess altogether. At this point they are leading people to think about how to limit AI so the benchmark stays relevant, rather than expanding on what is already there.
How you work without calculators is a proxy for real world competency.
Trying to solve everything with CoT alone without utilising tools seems futile.
Chess engines don’t grow on trees, they’re built by intelligent systems that can think, namely human brains.
Supposedly we want to build machines that can also think, not just regurgitate things created by human brains. That’s why testing CoT is important.
It’s not actually about chess, it’s about thinking and intelligence.
That was a whole half a decade ago, but back then deep learning AIs were defeated very badly by handcrafted scripts. Even the best bot in the neural net category was actually a symbolic script/neural net hybrid.
Bizarre.
AI already has a very creative imagination for role play, so this just adds to its arsenal.