deepmind – AI in Media and Society

I was catching up today on a couple of new-ish developments in reinforcement learning/game-playing AI models.

Meta (which, we always need to note, is the parent company of Facebook) apparently has an entire team of researchers devoted to training an AI system to play Diplomacy, a war-strategy board game. Unlike in chess or Go, a player in Diplomacy must collaborate with others to succeed. Meta’s program, named Cicero, has met that challenge, as explained in a Gizmodo article from November 2022.

“Players are constantly interacting with each other and each round begins with a series of pre-round negotiations. Crucially, Diplomacy players may attempt to deceive others and may also think the AI is lying. Researchers said Diplomacy is particularly challenging because it requires building trust with others, ‘in an environment that encourages players to not trust anyone,’” according to the article.

We can see the implications for collaborations between humans and AI outside of playing games — but I’m not in love with the idea that the researchers are helping Cicero learn how to gain trust while intentionally working to deceive humans. Of course, Cicero incorporates a large language model (R2C2, further trained on the WebDiplomacy dataset) for NLP tasks; see figures 2 and 3 in the Science article linked below. “Each message in the dialogue training dataset was annotated” to indicate its intent; the dataset contained “12,901,662 messages exchanged between players.”

Cicero was not identified as an AI construct while playing in online games with unsuspecting humans. It “apparently ‘passed as a human player,’ in 40 games of Diplomacy with 82 unique players.” It “ranked in the top 10% of players who played more than one game.”

Meanwhile, DeepMind was busy conquering another strategy board game, Stratego, with a new AI model named DeepNash. Unlike Diplomacy, Stratego is a two-player game, and unlike chess and Go, the value of each of your opponent’s pieces is unknown to you — you see where each piece is, but its identifying symbol faces away from you, like cards held close to the vest. DeepNash was trained on self-play (5.5 billion games) and does not search the game tree. Playing against humans online, it ascended to the rank of third among all Stratego players on the platform — after 50 matches.

Apparently the key to winning at Stratego is finding a Nash equilibrium, which I read about at Investopedia, which says: “There is not a specific formula to calculate Nash equilibrium. It can be determined by modeling out different scenarios within a given game to determine the payoff of each strategy and which would be the optimal strategy to choose.”
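To make "modeling out different scenarios" a little more concrete, here is a minimal Python sketch of my own. It is not how DeepNash works (DeepNash learns from self-play and does not enumerate outcomes), and the function name `pure_nash_equilibria` and the payoff numbers are made up for illustration. It checks every cell of a tiny two-player payoff table and keeps the cells where neither player could do better by changing only their own choice.

```python
from itertools import product

def pure_nash_equilibria(payoffs_a, payoffs_b):
    """Return all (row, col) cells where neither player gains by deviating alone.

    payoffs_a[r][c] is the row player's payoff and payoffs_b[r][c] is the
    column player's payoff when the row player picks r and the column player picks c.
    """
    n_rows, n_cols = len(payoffs_a), len(payoffs_a[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player: is r a best response to column choice c?
        row_best = all(payoffs_a[r][c] >= payoffs_a[alt][c] for alt in range(n_rows))
        # Column player: is c a best response to row choice r?
        col_best = all(payoffs_b[r][c] >= payoffs_b[r][alt] for alt in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

# Illustrative prisoner's dilemma: choice 0 = cooperate, choice 1 = defect.
dilemma_a = [[-1, -3], [0, -2]]
dilemma_b = [[-1, 0], [-3, -2]]
print(pure_nash_equilibria(dilemma_a, dilemma_b))  # [(1, 1)]: both defect

# Illustrative matching pennies: no single fixed choice is stable.
pennies_a = [[1, -1], [-1, 1]]
pennies_b = [[-1, 1], [1, -1]]
print(pure_nash_equilibria(pennies_a, pennies_b))  # []
```

The second example comes back empty because the only equilibrium in that game involves randomizing between choices. A brute-force table check like this one cannot find those, and it would be hopeless for a game as large as Stratego.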

See: Mastering the game of Stratego with model-free multiagent reinforcement learning (Science, 2022).

See more posts about games at this site.

AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

One of the very best media items I’ve found is this feature-length documentary about the program that beat an international master at the game of Go in 2016. It’s excellent as a documentary film — well-paced, sparking curiosity, exciting in some parts, and never pedantic.

You don’t need to understand anything about the game (which is immensely popular in China, Japan, and Korea, but not widely played elsewhere). It’s explained visually so that you can appreciate what’s going on. The film is free to watch on YouTube.

As a resource for learning about AI, or more specifically about machine learning, the film excels at helping us understand the work of the team of humans that created and trained the AlphaGo program. We don’t see a lot of people sitting at computer keyboards, typing. Instead, there are people clustered around a screen, pointing, talking enthusiastically, or saying, “What happened there? Why did it do that?”

Probably my favorite moment in the film is after Lee Se-dol, the human Go master, has played a move that is so great, it was later referred to as “the God move.” The AlphaGo team begins analyzing the program’s responses in real time, watching the graphs of its probability calculations on a large screen in their command center. For all the talk of AI as a black box that makes decisions humans can’t comprehend, this scene demonstrates that AI can be made transparent and accountable.

There’s much, much more to love about this documentary. The director, Greg Kohs, had extraordinary access to the DeepMind team during the months leading up to the five-game match with Lee. In the end, Google financed a general-audience-friendly film. (Google acquired DeepMind in 2014.)

In an interview with CNET, Kohs said the film “had very modest beginnings.”

“A couple members of Google’s creative lab that I’d worked with before gave me a ring and said we’d have access behind the curtain with [DeepMind founder and CEO] Demis Hassabis and his team. So I jumped on board with the expectation we would just film what happens for archival purposes and then put it on a shelf on a hard drive and that would be the end of it.”
Greg Kohs

Another wonderful aspect of the film is its humanity. I’ve seen a fair number of “scare essays” that predict the end of everything as AI gains dominance over its creators — but here we hear a more nuanced and thought-provoking set of views and reactions.

First, there is Lee, possibly the best (human) Go player who has ever lived, in closeup, in the very moment of his realization that the machine has bested him. Then there are the other Go experts, who understand more than you or I what the machine has actually done. Finally, there are the team members of DeepMind, who built the machine. Of course they are happy, ecstatically happy — but they are humbled, and even awed, as well.

At the end of 2019, Lee Se-dol retired as a professional Go player, at age 36. He is the only human who has ever defeated AlphaGo in tournament play.

More about AlphaGo:

AlphaGo: The Story So Far (DeepMind)
In Two Moves, AlphaGo and Lee Sedol Redefined the Future (Wired, March 2016)
AlphaZero, the successor to AlphaGo (DeepMind, December 2018)

