The latest sign that Skynet is about to come online and the robots are about to take over comes to us not via the military or healthcare. No, this time, the robots have come for our poker.
For a while now, data analytics (i.e., computers) have been crunching numbers about poker. From computer analysis, we’ve found the optimal way to play poker in one-on-one situations, we have game theory, and we have more tools than ever to analyze our competition.
Then the folks at Carnegie Mellon came along and built an AI that, apparently, can’t be beat. In a scenario harkening back to when Garry Kasparov lost to Deep Blue, there is now an AI out there that can play unbeatable poker. Even worse, a poker AI has also been deployed in the most nefarious poker den in the world – Facebook – and is racking up the wins.
How did this AI come to be? What does this mean for the world of poker? Only time will tell, but I can at least peer into the future and make some educated guesses.
Say Hello to Pluribus
When Skynet comes online, its name will be Pluribus.
Okay, that’s really just hype, but the name of Carnegie Mellon’s bot (built together with Facebook AI) is in fact Pluribus. It was developed by Tuomas Sandholm, Angel Jordan Professor of Computer Science, and Noam Brown, a Ph.D. student at Carnegie Mellon who also works at Facebook AI.
Up until this point, a lot of the computer-based poker AIs were only rated to play in one-versus-one games. Playing head-to-head, while never easy, is a simpler problem to solve for a computer because there are a lot fewer variables to consider and calculate.
This includes Libratus, another Sandholm AI, which was able to defeat multiple real money poker players in two-player games.
Pluribus, on the other hand, played thousands of hands against five opponents at a time and was able to consistently beat the professionals. Even more importantly, the competition Pluribus was up against was nothing to sneeze at: it played against thirteen professionals, each of whom has won over one million dollars playing poker, in six-player games.
The thing that’s really amazing, though, is how efficient Pluribus was. According to Carnegie Mellon’s website, Libratus needed 1,400 cores (about 350 processors similar to the ones in a personal computer) and over fifteen million core hours to win. And that was for one-on-one play.
Pluribus needed only 28 cores (roughly seven processors) and only 12,400 core hours to win. That’s a dramatic increase in efficiency, especially given how many more variables it needed to compute.
How Pluribus Wins
I could geek out on the computer science behind Pluribus’ wins, but I won’t.
The important thing to keep in mind is that when Pluribus trained, it played six-handed against itself: six copies of the same program, each starting with the same strategy for the first round.
From there, it used what it found to train itself to play better. In each subsequent round, it used information from previous games to improve its play. It also means that, at the end of the hands, there could be six different versions of the algorithm, which the team could then merge to define an even more complete betting strategy.
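A rough way to picture that loop is below. This is a hypothetical simplification I wrote for illustration (Pluribus actually trains with a form of Monte Carlo counterfactual regret minimization); the action names, payoffs, and update rule are all toy assumptions, but the shape is the same: six copies play each other, each copy updates from the hands it sees, and the diverged copies get merged at the end.

```python
import random

ACTIONS = ["fold", "call", "raise"]

def blank_strategy():
    # every seat starts from the same uniform strategy over actions
    return {a: 1.0 for a in ACTIONS}

def play_hand(strategies):
    # stand-in for one full hand: each seat samples an action in
    # proportion to its weights and gets a toy +/-1 payoff
    results = []
    for strat in strategies:
        total = sum(strat.values())
        r, acc = random.random() * total, 0.0
        for action, weight in strat.items():
            acc += weight
            if r <= acc:
                break
        payoff = random.choice([-1, 1])  # placeholder outcome
        results.append((action, payoff))
    return results

def train(num_hands=1000):
    seats = [blank_strategy() for _ in range(6)]  # six copies of itself
    for _ in range(num_hands):
        for strat, (action, payoff) in zip(seats, play_hand(seats)):
            # nudge the chosen action's weight by how well it did
            strat[action] = max(0.1, strat[action] + 0.1 * payoff)
    # merge the six diverged copies into one blueprint strategy
    return {a: sum(s[a] for s in seats) / len(seats) for a in ACTIONS}

print(train())
```

The merge step at the end is the part worth noticing: because each copy saw different hands, averaging them produces a more complete strategy than any single copy alone.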
What is perhaps most fascinating about Pluribus’ play is the fact that it uses “limited-lookahead” search instead of trying to play out entire games.
That’s pretty much what humans do.
Essentially, the key to Pluribus winning so much was that it could play the current hand and make decisions by playing out what was likely to happen a few moves ahead. Carnegie Mellon’s site was careful to note that Pluribus couldn’t simulate the whole game (too many variables), but that it could simulate what would happen next.
More than likely, Pluribus would be able to simulate several different outcomes very quickly before deciding on the proper next move. For instance, Pluribus could simulate what would happen if it checks, folds, bets a large amount, bets a small amount, and so on, and then make a decision based on the simulated games.
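That check/fold/bet-and-simulate idea can be sketched in a few lines. This is my own toy illustration, not Pluribus’ actual search: the candidate actions, the fake rollout values, and the rollout count are all assumptions. The point is the structure, which is as follows: for each candidate action, roll out several simulated continuations, then pick the action with the best average result.

```python
import random

CANDIDATES = ["check", "fold", "bet_small", "bet_large"]

def rollout(action, rng):
    # placeholder for simulating the rest of the hand after `action`;
    # a real solver would play out opponents' likely responses
    base = {"check": 0.0, "fold": -0.5, "bet_small": 0.2, "bet_large": 0.1}
    return base[action] + rng.uniform(-1.0, 1.0)

def choose_action(num_rollouts=200, seed=0):
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for action in CANDIDATES:
        # average the outcome of many short simulated continuations
        value = sum(rollout(action, rng) for _ in range(num_rollouts)) / num_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action

print(choose_action())
```

Averaging many noisy rollouts is what makes the lookahead “limited” but still useful: the bot never solves the whole game, it just estimates which next step tends to end well.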
That’s pretty cool.
Being Unpredictable Is Also Cool
Did I mention that Pluribus is also designed to be unpredictable?
Sandholm and Brown realized that Pluribus could reasonably fall into the trap of becoming predictable. It’s a computer, after all, and most AIs will settle on a strategy as “optimal” and keep repeating it.
Not Pluribus. It could not only simulate the best move in a given situation; it was also aware of what it was likely to do in that situation. It would then weigh its own tendencies and use an algorithm to decide whether to do something else instead.
This kept the other players guessing as to Pluribus’ real strategy.
It also presented a level of unpredictability that even a human could never reach. At the end of the day, humans are creatures of habit who do what they know. They have tendencies.
Pluribus is keenly aware of its own tendencies and can act against them sheerly for the purposes of deception.
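The core trick here is playing a mixed strategy: acting from a probability distribution over moves rather than always taking the single “best” one. Below is a minimal sketch of that idea, with toy action frequencies I made up; Pluribus balances its strategies far more rigorously than this.

```python
import random
from collections import Counter

def mixed_action(weights, rng):
    # sample an action in proportion to its weight instead of
    # deterministically picking the highest-weighted one
    actions, probs = zip(*weights.items())
    return rng.choices(actions, weights=probs, k=1)[0]

rng = random.Random(42)
weights = {"bet": 0.6, "check": 0.3, "raise": 0.1}  # toy frequencies
seen = Counter(mixed_action(weights, rng) for _ in range(1000))
print(seen)  # bet should be most common, raise least
```

Over many hands the frequencies match the intended mix, but any single decision is unpredictable, which is exactly what keeps opponents from reading a pattern.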
That’s pretty cool.
Why Pluribus’ Wins Matter
First, in some ways, Pluribus represents the ultimate poker opponent. (I now sound like the scientist villain in every doomsday science fiction movie.) Still, Pluribus is able to calculate numerous what-if scenarios. It knows its own tendencies and can build smoke screens around them.
Even worse, Pluribus never suffers from tilt. It will dispassionately evaluate bluffs and bets and react accordingly.
Also, Pluribus uses strategies that humans rarely do. First, according to poker professional Darren Elias, one reason Pluribus was successful was that it could actually mix strategies. Humans try to mix strategies, but like I said, we fall into patterns.
The computer doesn’t because it can recognize its own patterns and counteract them.
Even more strangely, Pluribus used strategies humans generally consider weak. According to Carnegie Mellon’s website, one of these was the “donk” bet in which a player ends a round with a call and then starts the next round with a bet.
It’s an odd bet and should rarely be the proper tactic. In a lot of cases, it’s better to value bet or get some money from the other players with a small bet.
However, according to Carnegie Mellon, Pluribus was a lot more likely to donk bet than any of the humans it defeated. If for no other reason, this experiment becomes a lot more interesting because it may teach us humans new ways to play.
For now, no one really has to worry about Pluribus taking over. Both Sandholm and Brown can take the code and do with it as they please, though both have agreed not to use the code for defense purposes.
So, that means no Skynet, at least not the Terminator 2: Judgment Day version.
However, this is hardly the last step in poker AI. I, personally, would like to see AI use Google’s already existing technology to recognize body movements and nonverbal communication to begin recognizing bluffs and tells.
I wouldn’t want to play against that bot, but it would be an incredibly interesting experiment to observe.
Also, I think every serious poker professional should study what Pluribus did. It’s time to revisit the effectiveness of the donk bet. It’s time for the humans to see what the robot did and improve our overall game.
I don’t say that because I am afraid of robots. I just don’t want to see a lot of learning go to waste, and I personally believe poker players can pick up good poker strategy by studying what the robot did to win.
Then some players need to use that new strategy to play Pluribus again and see how it responds. Then those players can continue to evolve their own play, and so on.
Hopefully all the Terminator jokes weren’t taken seriously. A dedicated poker AI is not going to take over the world and, as long as no one can turn it into a bot on a poker site, isn’t even going to damage our wallets.
With that said, that doesn’t mean the AI isn’t incredibly cool on its own. The science that went into its decision making is something humans can learn from (playing out all the what-if scenarios), and the strategies it used are things players should consider to make their own games better.