Deviations From the Nash Equilibrium and Emergence of Tacit Collusion in a Two-player Optimal Execution Game With Reinforcement Learning
A theoretical problem with real-world implications. In a two-player optimal execution game, reinforcement-learning agents deviate from the Nash equilibrium, exhibiting tacit collusion and aligning with the Pareto-optimal solution.
Paper Metadata
Publication Date: 2024-09-20
Source: SSRN
Link: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4932871
Authors:
Andrea Macrì, Scuola Normale Superiore
Fabrizio Lillo, Università di Bologna
Related reading: https://oxford-man.ox.ac.uk/alvaros-podcast-interview-with-risk-net-collusion-within-trading-algos/
Keywords
Game Theory
Optimal Execution
Algo Collusion
Reinforcement Learning
Nash Equilibrium
Notes for Review
Recommendation: 95%
Reinforcement Learning in Financial Trading: A Study on Tacit Collusion. This paper explores the emergence of tacit collusion in a market where two autonomous agents, driven by deep RL, engage in an optimal execution task. The authors use the Almgren-Chriss framework to model market impact and Double Deep Q-Learning to train the agents. The results show that the learned strategies deviate significantly from the Nash equilibrium, exhibiting tacit collusion and aligning with the Pareto-optimal solution. The study also examines the robustness of these strategies under different volatility regimes, finding that they remain collusive even when the volatility at test time differs from that seen during training. Key findings include: (1) RL agents can converge to a collusive equilibrium without being explicitly trained to collude, (2) the learned strategies are robust across different volatility regimes, and (3) the emergence of tacit collusion can lead to unintended market manipulation. The implications for market stability and fairness are significant, highlighting the need for further research on the interaction between RL agents and market dynamics.
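To make the Almgren-Chriss framework mentioned above concrete, here is a minimal single-trader simulation of its discrete dynamics: permanent impact (`gamma_perm`) shifts the mid-price for later periods, while temporary impact (`eta`) penalizes each trade's execution price. This is an illustrative sketch, not the paper's two-agent code, and all parameter values are made-up assumptions.

```python
import numpy as np

def simulate_ac_cost(schedule, X0=1_000.0, S0=100.0, sigma=0.5,
                     eta=0.05, gamma_perm=0.01, tau=1.0, seed=0):
    """Implementation-shortfall cost of a liquidation schedule under
    discrete Almgren-Chriss dynamics. Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    v = np.asarray(schedule, dtype=float)   # shares sold per period
    assert np.isclose(v.sum(), X0), "schedule must liquidate the full inventory"
    S, cash = S0, 0.0
    for v_k in v:
        # temporary impact: execute at a price depressed by eta * v_k / tau
        cash += v_k * (S - eta * v_k / tau)
        # diffusion plus permanent impact move the mid-price for later periods
        S += sigma * np.sqrt(tau) * rng.standard_normal() - gamma_perm * v_k
    return X0 * S0 - cash                   # implementation shortfall

N = 10
twap = np.full(N, 1_000.0 / N)                    # spread evenly
front = np.array([1_000.0] + [0.0] * (N - 1))     # dump everything at once
costs_twap = np.mean([simulate_ac_cost(twap, seed=s) for s in range(2000)])
costs_front = np.mean([simulate_ac_cost(front, seed=s) for s in range(2000)])
```

Under these dynamics the front-loaded schedule pays the full quadratic temporary-impact cost up front, while spreading the trades (TWAP) is substantially cheaper on average, which is what makes the speed of liquidation the strategic variable in the two-player game.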
This research on market impact games has several practical implications for real trading and helps us understand market dynamics in a more nuanced way. By analyzing the interactions between multiple agents and assets, we can develop more effective strategies for executing trades, minimizing market impact, and reducing costs. This knowledge can also inform the design of more efficient and resilient market structures, and it gives regulators valuable information on how to mitigate the risks associated with algorithmic trading. Furthermore, the paper's findings shed light on the complex interactions between market participants, demonstrating that market behavior is not driven solely by fundamental factors but also by the intricate web of relationships between agents (see the recommended reading link: algo collusion may already have taken place unintentionally). By studying market impact games, we can gain insight into how market phenomena such as price instabilities and market manipulation emerge from the interactions between individual agents, ultimately promoting a deeper understanding of the complex and dynamic nature of financial markets.
Abstract
The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modeled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, using the Almgren-Chriss (2000) framework. Our results show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit tacit collusion, closely aligning with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
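The Double Deep Q-Learning used to train the agents differs from vanilla DQN in how bootstrap targets are computed: the online network selects the greedy next action and the target network evaluates it, which reduces overestimation bias. The sketch below shows only that target computation on a toy batch; array shapes and the discount factor are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def double_dqn_targets(rewards, q_online_next, q_target_next, dones, gamma=0.99):
    """Double DQN targets: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)),
    with the bootstrap term zeroed out on terminal transitions."""
    best_actions = np.argmax(q_online_next, axis=1)                  # selection
    eval_q = q_target_next[np.arange(len(dones)), best_actions]      # evaluation
    return rewards + gamma * eval_q * (1.0 - dones)

# toy batch of two transitions (values are arbitrary)
r = np.array([1.0, 0.5])
q_online = np.array([[0.2, 0.8],   # online net prefers action 1 in state 0
                     [0.6, 0.1]])  # and action 0 in state 1
q_target = np.array([[0.3, 0.5],
                     [0.4, 0.2]])
d = np.array([0.0, 1.0])           # second transition is terminal
y = double_dqn_targets(r, q_online, q_target, d)
```

Here the first target bootstraps from the target network's value of the online network's chosen action (0.5), while the second transition is terminal, so its target is just the reward.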