General Formulation and PCL-Analysis for Restless Bandits with Limited Observability
arXiv:2307.03034v4 Announce Type: replace-cross
Abstract: In this paper, we consider a general observation model for restless multi-armed bandit problems. The player operates on a past observation history that is limited (partial) and error-prone due to resource constraints or to environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of the observation process, we formulate the problem as a restless bandit with an infinite, high-dimensional belief state space. We apply the achievable region method with partial conservation laws (PCL) to the infinite-state problem and analyze its indexability and priority (Whittle) index. Finally, we propose an approximation procedure that transforms the problem into one to which the AG algorithm of Niño-Mora (2001) for finite-state problems can be applied. Numerical experiments show that our algorithm performs excellently.
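The abstract describes maintaining a Bayesian belief per arm under limited, error-prone observations. As a rough illustrative sketch only, not the paper's PCL analysis or the AG algorithm, the code below updates the belief of a two-state Markov arm observed through a symmetric error channel and senses arms greedily by belief as a myopic stand-in for a Whittle-type priority index; the function name `belief_update`, the parameter `obs_error`, and all numerical values are assumptions for illustration.

```python
import numpy as np

def belief_update(b, P, obs=None, obs_error=0.0):
    """One-step belief update for a two-state arm (illustrative assumption).

    b         : P(state = 1) before the transition.
    P         : 2x2 row-stochastic transition matrix.
    obs       : observed state in {0, 1}, or None if the arm was not sensed.
    obs_error : probability the observation flips the true state.
    """
    # Predict: push the belief through the Markov transition.
    b_pred = b * P[1, 1] + (1.0 - b) * P[0, 1]
    if obs is None:
        return b_pred  # unobserved arm: prediction only
    # Correct: Bayes' rule under a symmetric observation-error channel.
    like1 = (1.0 - obs_error) if obs == 1 else obs_error
    like0 = obs_error if obs == 1 else (1.0 - obs_error)
    return like1 * b_pred / (like1 * b_pred + like0 * (1.0 - b_pred))

rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3], [0.2, 0.8]])   # illustrative transition matrix
beliefs = [0.5, 0.5, 0.5]                # three arms, uninformative priors
states = [int(rng.integers(2)) for _ in beliefs]

for t in range(5):
    # Myopic stand-in for a priority-index policy: sense the arm whose
    # belief of being in the good state is highest. (The paper instead
    # computes a Whittle/PCL priority index on the belief state.)
    chosen = int(np.argmax(beliefs))
    for i in range(len(beliefs)):
        states[i] = int(rng.random() < P[states[i], 1])  # true dynamics
        if i == chosen:
            # Sensed arm: observation flips with probability 0.1.
            obs = states[i] if rng.random() > 0.1 else 1 - states[i]
            beliefs[i] = belief_update(beliefs[i], P, obs, obs_error=0.1)
        else:
            beliefs[i] = belief_update(beliefs[i], P)  # no observation
```

Because unobserved arms evolve by prediction alone, beliefs drift toward the chain's stationary distribution between sensing epochs, which is why the belief state space is infinite-dimensional and motivates the finite-state approximation the abstract mentions.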