Nash Q-Network for Multi-Agent Cybersecurity Simulation

Cybersecurity defense is inherently adversarial, making multi‑agent reinforcement learning a natural fit, but simultaneous training of competing agents in complex environments is notoriously unstable. This work proposes a game‑theoretic deep RL framework for CybORG that extends Nash Q‑learning with a centralized joint Q‑network (critic) and separate decentralized policies. The critic estimates joint state–action values to construct payoff matrices and compute Nash equilibria, while Blue and Red policies are trained by minimizing cross‑entropy to these equilibrium strategies under partial observability. By decoupling critic learning from policy updates, the method mitigates non‑stationarity and guides agents—especially the Blue defender—toward robust, equilibrium‑based behaviors against an adaptive attacker.
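
As a rough, self-contained illustration of the critic-to-policy pipeline sketched above, the Python snippet below builds a payoff matrix from hypothetical joint Q-values, solves the resulting zero-sum matrix game by linear programming, and forms the cross-entropy target a decentralized policy would be trained against. The matrix sizes, the use of scipy's linprog, and the placeholder policy head are illustrative assumptions, not the paper's implementation.

# Minimal sketch (not the paper's code): given a Blue-vs-Red payoff matrix built
# from joint Q-values, compute a Nash equilibrium of the zero-sum matrix game
# and the cross-entropy target used to train a decentralized policy.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    """Maximin mixed strategy for the row player of a zero-sum matrix game."""
    m, n = payoff.shape
    # Variables z = [x_1..x_m, v]; maximize v  <=>  minimize -v.
    c = np.zeros(m + 1); c[-1] = -1.0
    # For every column j: v - sum_i payoff[i, j] * x_i <= 0
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)   # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], -res.fun                          # strategy, game value

# Hypothetical joint Q-values Q(s, a_blue, a_red) for one state (3 Blue x 4 Red actions).
q_joint = np.random.randn(3, 4)
blue_pi, value = solve_zero_sum(q_joint)        # Blue maximizes Q
red_pi, _ = solve_zero_sum(-q_joint.T)          # Red minimizes Q (maximizes -Q)

# Cross-entropy of a policy head's output against the equilibrium mixture.
policy_logits = np.random.randn(3)              # placeholder for the Blue policy network
policy_probs = np.exp(policy_logits) / np.exp(policy_logits).sum()
ce_loss = -np.sum(blue_pi * np.log(policy_probs + 1e-12))
print(blue_pi, red_pi, value, ce_loss)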

Nash Q-Network for Multi-agent Cybersecurity Simulation

Cite

@InProceedings{10.1007/978-3-032-08067-7_3,
author="Xie, Qintong
and Koh, Edward
and Cadet, Xavier
and Chin, Peter",
editor="Baras, John S.
and Papavassiliou, Symeon
and Tsiropoulou, Eirini Eleni
and Sayin, Muhammed O.",
title="Nash Q-Network for Multi-agent Cybersecurity Simulation",
booktitle="Game Theory and AI for Security",
year="2026",
publisher="Springer Nature Switzerland",
address="Cham",
pages="43–60",
abstract="Cybersecurity defense involves interactions between adversarial parties (namely defenders and hackers), making multi-agent reinforcement learning (MARL) an ideal approach for modeling and learning strategies for these scenarios. This paper addresses the challenge of simultaneous multi-agent training in complex environments and introduces a Nash Q-Network that facilitates learning in partially observed settings. We demonstrate the successful implementation of this algorithm in a notably complex cyber defense simulation treated as a two-player zero-sum Markov game. We propose the Nash Q-Network, which aims to learn Nash-optimal strategies that translate to robust defenses in cybersecurity settings. Our approach incorporates aspects of proximal policy optimization (PPO), deep Q-network (DQN), and the Nash-Q algorithm, addressing common challenges like non-stationarity and instability in multi-agent learning. The training process employs distributed data collection and carefully designed neural architectures for both agents and critics.",
isbn="978-3-032-08067-7"
}

Explore Reinforced: Equilibrium Approximation with Reinforcement Learning

Mateusz Nowak, Qintong Xie, Emma Graham, Ryan Yu, Michelle Yilin Feng, Roy Leibovitz, Xavier Cadet, Sang (Peter) Chin

Cite

@InProceedings{10.1007/978-3-032-08064-6_3,
author="Nowak, Mateusz
and Xie, Qintong
and Graham, Emma
and Yu, Ryan
and Feng, Michelle Yilin
and Leibovitz, Roy
and Cadet, Xavier
and Chin, Peter",
editor="Baras, John S.
and Papavassiliou, Symeon
and Tsiropoulou, Eirini Eleni
and Sayin, Muhammed O.",
title="Explore Reinforced: Equilibrium Approximation with Reinforcement Learning",
booktitle="Game Theory and AI for Security",
year="2026",
publisher="Springer Nature Switzerland",
address="Cham",
pages="42–60",
abstract="Current approximate Coarse Correlated Equilibria (CCE) algorithms struggle with equilibrium approximation for games in large stochastic environments. While these game-theoretic methods are theoretically guaranteed to converge to a strong solution concept, reinforcement learning (RL) algorithms have shown increasing capability in such environments but lack the equilibrium guarantees provided by game-theoretic approaches. In this paper, we introduce Exp3-IXRL – an equilibrium approximator that utilizes RL, specifically leveraging the agent's action selection, to update equilibrium approximations while preserving the integrity of both learning processes. We therefore extend the Exp3 algorithms beyond the stateless, non-stochastic settings. Empirically, we demonstrate improved performance in classic non-stochastic multi-armed bandit settings, capability in stochastic multi-armed bandits, and strong results in a complex and adversarial cybersecurity network environment.",
isbn="978-3-032-08064-6"
}
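
For background on the bandit machinery this work extends, here is a minimal sketch of the classic Exp3-IX update (exponential weights with implicit-exploration loss estimates). The horizon, learning rate, and Bernoulli losses are illustrative assumptions, and the RL-driven action selection that distinguishes Exp3-IXRL is not shown.

# Minimal sketch of the classic Exp3-IX update that Exp3-IXRL builds on
# (adversarial multi-armed bandit with implicit exploration); all constants
# and the Bernoulli losses here are illustrative assumptions.
import numpy as np

K, T = 5, 10_000
rng = np.random.default_rng(0)
eta = np.sqrt(2 * np.log(K) / (K * T))   # learning rate (a standard choice)
gamma = eta / 2                          # implicit-exploration parameter
cum_loss_est = np.zeros(K)               # cumulative estimated losses per arm

true_means = rng.uniform(0.2, 0.8, size=K)   # hypothetical stochastic losses
for t in range(T):
    # Exponential weights over estimated cumulative losses.
    w = np.exp(-eta * (cum_loss_est - cum_loss_est.min()))
    p = w / w.sum()
    arm = rng.choice(K, p=p)
    loss = float(rng.random() < true_means[arm])
    # Implicit-exploration (IX) loss estimate: only the played arm is updated,
    # with the extra gamma in the denominator keeping the estimate's variance bounded.
    cum_loss_est[arm] += loss / (p[arm] + gamma)

print("final play distribution:", np.round(p, 3), "| lowest-loss arm:", int(true_means.argmin()))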

Tree Search for Simultaneous Move Games via Equilibrium Approximation

Ryan Yu, Alex Olshevsky, Sang (Peter) Chin

Cite

@InProceedings{10.1007/978-3-032-08064-6_1,
author="Yu, Ryan
and Olshevsky, Alex
and Chin, Peter",
editor="Baras, John S.
and Papavassiliou, Symeon
and Tsiropoulou, Eirini Eleni
and Sayin, Muhammed O.",
title="Tree Search for Simultaneous Move Games via Equilibrium Approximation",
booktitle="Game Theory and AI for Security",
year="2026",
publisher="Springer Nature Switzerland",
address="Cham",
pages="3–22",
abstract="Neural network supported tree-search has shown strong results in a variety of perfect information multi-agent tasks. However, the performance of these methods on imperfect information games has generally been below competing approaches. Here we study the class of simultaneous-move games, which are a subclass of imperfect information games which are most similar to perfect information games: both agents know the game state with the exception of the opponent's move, which is revealed only after each agent makes its own move. Simultaneous move games include popular benchmarks such as Google Research Football and Starcraft Multi Agent Challenge. Our goal in this paper is to take tree search algorithms trained through self-play and adapt them to simultaneous move games without significant loss of performance. While naive ways to do this fail, we are able to achieve this by deriving a practical method that attempts to approximate a coarse correlated equilibrium as a subroutine within a tree search. Our algorithm, Neural Network-Coarse Correlated Equilibrium (NN-CCE), works on cooperative, competitive, and mixed tasks and our results are better than the current best MARL algorithms on a wide range of accepted baselines.",
isbn="978-3-032-08064-6"
}
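
As a pointer to the solution concept involved, the toy sketch below runs unconditional regret matching for both players of a small matrix game; the time-averaged joint play of such no-external-regret dynamics approaches the set of coarse correlated equilibria. The game and iteration count are arbitrary assumptions, and this is independent no-regret play, not the NN-CCE tree-search subroutine.

# Toy illustration (not NN-CCE): independent no-regret play whose empirical
# joint distribution approximates a coarse correlated equilibrium (CCE).
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2x2 general-sum game: payoff_a[i, j] for player A, payoff_b[i, j] for player B.
payoff_a = np.array([[3.0, 0.0], [5.0, 1.0]])
payoff_b = np.array([[3.0, 5.0], [0.0, 1.0]])

def regret_matching(regrets):
    pos = np.maximum(regrets, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(len(regrets), 1.0 / len(regrets))

reg_a, reg_b = np.zeros(2), np.zeros(2)
joint_counts = np.zeros((2, 2))
for t in range(20_000):
    pa, pb = regret_matching(reg_a), regret_matching(reg_b)
    i, j = rng.choice(2, p=pa), rng.choice(2, p=pb)
    joint_counts[i, j] += 1
    # Unconditional (external) regrets: payoff of always playing a fixed action
    # against the opponent's realized action, minus the realized payoff.
    reg_a += payoff_a[:, j] - payoff_a[i, j]
    reg_b += payoff_b[i, :] - payoff_b[i, j]

print("empirical joint distribution (approximate CCE):")
print(joint_counts / joint_counts.sum())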

nFlip: Deep Reinforcement Learning in Multiplayer FlipIt

Reinforcement learning has shown much success in games such as chess, backgammon, and Go. However, in most of these games, agents have full knowledge of the environment at all times. We describe a deep learning model that successfully maximizes its score using reinforcement learning in a game with incomplete and imperfect information. We apply our model to FlipIt, a two-player game in which both players, the attacker and the defender, compete for ownership of a shared resource and only receive information on the current state upon making a move. Our model is a deep neural network combined with Q-learning and is trained to maximize the defender's time of ownership of the resource. We extend FlipIt to a game with a larger action space by introducing a new lower-cost move, and we generalize the model to multiplayer FlipIt.
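
A heavily simplified, tabular sketch of the Q-learning loop described above follows; the published model uses a deep network on the full game, and the periodic attacker, move cost, and wait-time state here are illustrative assumptions only.

# Heavily simplified tabular sketch of the Q-learning idea described above;
# the opponent, costs, and state encoding are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
MAX_WAIT, MOVE_COST = 10, 0.3            # state: steps since our last flip (capped)
Q = np.zeros((MAX_WAIT + 1, 2))          # actions: 0 = wait, 1 = flip
alpha, discount, eps = 0.1, 0.95, 0.1
ATTACKER_PERIOD = 6                      # hypothetical periodic attacker

for episode in range(3000):
    wait, we_own = 0, True
    for t in range(1, 101):
        s = min(wait, MAX_WAIT)
        a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
        if t % ATTACKER_PERIOD == 0:     # attacker flips on its schedule
            we_own = False
        if a == 1:                       # we flip: regain ownership, pay a cost
            we_own, wait = True, 0
        else:
            wait += 1
        reward = (1.0 if we_own else 0.0) - (MOVE_COST if a == 1 else 0.0)
        s_next = min(wait, MAX_WAIT)
        Q[s, a] += alpha * (reward + discount * Q[s_next].max() - Q[s, a])

print(np.round(Q, 2))                    # learned flip-vs-wait values per waiting time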

Referring Expression Problem

The referring expression problem is a more domain-specific variant of image captioning, with the goal of describing a sub-region of a given image. The Rational Speech Act (RSA) framework is a probabilistic reasoning approach that generates sentences from a game-theoretic model of a speaker and a listener. The advantage of RSA is its explainability: it answers the question of why a speaking agent chooses a specific word or phrase over another. Can RSA be applied to the referring expression problem to generate better, more explainable descriptions?
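
For concreteness, here is a minimal sketch of the standard RSA recursion (literal listener, pragmatic speaker, pragmatic listener) on a toy reference game; the objects, lexicon, and rationality parameter are illustrative assumptions rather than a referring-expression system.

# Minimal sketch of the standard RSA recursion on a toy reference game;
# the objects, utterances, and truth-value lexicon below are illustrative.
import numpy as np

objects = ["blue square", "blue circle", "green square"]
utterances = ["blue", "green", "square", "circle"]
# lexicon[u, o] = 1 if utterance u is literally true of object o.
lexicon = np.array([
    [1, 1, 0],   # "blue"
    [0, 0, 1],   # "green"
    [1, 0, 1],   # "square"
    [0, 1, 0],   # "circle"
], dtype=float)
alpha = 1.0      # speaker rationality (assumed)

def normalize(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

L0 = normalize(lexicon, axis=1)                 # literal listener P(o | u)
S1 = normalize((L0 ** alpha).T, axis=1)         # pragmatic speaker P(u | o)
L1 = normalize(S1.T, axis=1)                    # pragmatic listener P(o | u), uniform prior

# The pragmatic speaker prefers "circle" over "blue" for the blue circle,
# because "circle" uniquely identifies it -- the kind of explanation RSA offers.
print("S1(u | blue circle):", dict(zip(utterances, np.round(S1[1], 2))))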

Using Game Theory and Reinforcement Learning to Predict the Future

Baseball is a well-known, repeated, finite, adversarial, stochastic game with a massive amount of available data. Reinforcement learning (RL) models, on the other hand, take significant time and resources to train. By fusing game theory and RL, we answer questions such as: given a video of a pitch, can we compute the utility of that pitch from its intended location, resulting location, and game situation?

Collusion Detection in Team-Based Multiplayer Games

Publication:

Cite

@misc{greige_collusion_2022,
abstract = {In the context of competitive multiplayer games, collusion happens when two or more teams decide to collaborate towards a common goal, with the intention of gaining an unfair advantage from this cooperation. The task of identifying colluders from the player population is however infeasible to game designers due to the sheer size of the player population. In this paper, we propose a system that detects colluding behaviors in team-based multiplayer games and highlights the players that most likely exhibit colluding behaviors. The game designers then proceed to analyze a smaller subset of players and decide what action to take. For this reason, it is important and necessary to be extremely careful with false positives when automating the detection. The proposed method analyzes the players’ social relationships paired with their in-game behavioral patterns and, using tools from graph theory, infers a feature set that allows us to detect and measure the degree of collusion exhibited by each pair of players from opposing teams. We then automate the detection using Isolation Forest, an unsupervised learning technique specialized in highlighting outliers, and show the performance and efficiency of our approach on two real datasets, each with over 170,000 unique players and over 100,000 different matches.},
annote = {Comment: 14 pages, 4 figures},
author = {Greige, Laura and Silva, Fernando De Mesentier and Trotter, Meredith and Lawrence, Chris and Chin, Peter and Varadarajan, Dilip},
keywords = {Computer Science – Machine Learning, Computer Science – Computer Science and Game Theory},
month = {March},
note = {arXiv:2203.05121 [cs]},
publisher = {arXiv},
title = {Collusion Detection in Team-Based Multiplayer Games},
url = {http://arxiv.org/abs/2203.05121},
urldate = {2022-08-06},
year = {2022}
}
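
To make the automated detection step concrete, the snippet below applies scikit-learn's IsolationForest to a few hypothetical per-pair features; the feature names and the synthetic data are stand-ins for the graph-derived feature set described in the paper.

# Illustrative sketch of the outlier-detection step; the per-pair features
# below are hypothetical stand-ins for the paper's graph-derived feature set.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
# One row per pair of opposing-team players:
# [shared matches, friend-graph link, mutual-benefit actions, head-to-head win skew]
normal_pairs = rng.normal(loc=[5, 0.05, 1, 0.0], scale=[3, 0.1, 1, 0.1], size=(5000, 4))
colluding_pairs = rng.normal(loc=[40, 0.9, 15, 0.6], scale=[5, 0.1, 3, 0.1], size=(20, 4))
X = np.vstack([normal_pairs, colluding_pairs])

# Low contamination keeps the flagged subset small, reflecting the paper's
# emphasis on avoiding false positives before human review.
clf = IsolationForest(n_estimators=200, contamination=0.005, random_state=0).fit(X)
scores = clf.decision_function(X)          # lower score = more anomalous
flagged = np.argsort(scores)[:20]          # hand the most suspicious pairs to designers
print("flagged pair indices:", flagged)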

Application of Game Theory to Sensor Resource Management

Cite

@article{chin_application_2012,
author = {Chin, Sang},
journal = {Johns Hopkins APL Technical Digest},
note = {Publisher: Johns Hopkins University Applied Physics Laboratory},
number = {2},
pages = {107–114},
title = {Application of Game Theory to Sensor Resource Management},
volume = {31},
year = {2012}
}

Game-theoretic homological sensor resource management for SSA

Cite

@inproceedings{chin_game-theoretic_2009,
abstract = {We present a game-theoretic approach to Level 2/3/4 fusion for the purpose of Space Situational Awareness (SSA), along with a prototypical software implementation of this approach to demonstrate its effectiveness for possible future space operations. Our approach is based upon innovative techniques that we are developing to solve dynamic games and N-person cooperative/non-cooperative games, as well as newly emerging homological sensing algorithms, which we apply to control a disparate network of space sensors in order to gain better SSA.},
author = {Chin, Sang Peter},
booktitle = {Sensors and Systems for Space Applications III},
doi = {10.1117/12.818191},
editor = {Cox, Joseph L. and Motaghedi, Pejmun},
keywords = {Game Theory, High-level data fusion, Homology},
note = {Backup Publisher: International Society for Optics and Photonics},
pages = {188–198},
publisher = {SPIE},
title = {Game-theoretic homological sensor resource management for SSA},
url = {https://doi.org/10.1117/12.818191},
volume = {7330},
year = {2009}
}