Multi-agent reinforcement learning has received increasing attention in cooperative games, but research on non-cooperative games lags behind. Independent value-based learning algorithms have demonstrated simplicity and versatility in various contexts. In this paper, we study the behavior of these algorithms in non-cooperative settings. We characterize the conditions a game must satisfy for the algorithms to work. We further test the algorithms in our proposed game, Food Chain, which simulates an ecosystem. Our results show that independent value-based learning algorithms can converge to a Nash equilibrium only when the equilibrium consists of uniformly random policies over the feasible actions.
Source: https://www.tandfonline.com/doi/abs/10.1080/17445760.2025.2601000
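The abstract's finding can be illustrated on a toy example. The sketch below is not the paper's Food Chain environment or its algorithms; it is a minimal, hypothetical setup of two independent epsilon-greedy value learners playing matching pennies, a zero-sum game whose unique Nash equilibrium is exactly the uniformly random policy the abstract describes. Each agent treats the other as part of the environment and updates only its own action values.

```python
import random

# Matching pennies payoffs for player 0; player 1 receives the negative
# (zero-sum). The unique Nash equilibrium is uniform random play by both
# players -- the class of equilibria the abstract says independent
# value-based learners can converge to.
PAYOFF = [[1, -1], [-1, 1]]  # rows: player 0's action, cols: player 1's action

def train(episodes=20000, alpha=0.05, eps=0.1, seed=0):
    """Two independent epsilon-greedy learners in a single-state game.

    Each agent keeps its own action-value table and never models the
    joint action -- the 'independent learning' setting studied in the paper.
    (Hyperparameters here are illustrative, not taken from the paper.)
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[player][action]
    for _ in range(episodes):
        acts = []
        for p in range(2):
            if rng.random() < eps:
                acts.append(rng.randrange(2))  # explore
            else:
                acts.append(0 if q[p][0] >= q[p][1] else 1)  # exploit
        r0 = PAYOFF[acts[0]][acts[1]]
        rewards = (r0, -r0)
        for p in range(2):
            # Stateless (bandit-style) value update toward the observed reward.
            q[p][acts[p]] += alpha * (rewards[p] - q[p][acts[p]])
    return q

q_tables = train()
```

Because the game is adversarial, neither agent's greedy action stays profitable for long, which pushes play toward the mixed (uniform) equilibrium; games whose equilibria require non-uniform mixing are where, per the abstract, such independent learners fail to converge.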