| Download | View the final version: Constrained restless bandits for dynamic scheduling in cyber-physical systems (PDF, 3.9 MiB) |
|---|
| DOI | https://doi.org/10.1109/ACCESS.2024.3510558 |
|---|
| Author | Kaza, Kesav Ram (ORCID: https://orcid.org/0000-0002-9051-4624); Meshram, Rahul H.; Mehta, Varunkumar (ORCID: https://orcid.org/0000-0002-5087-8175); Merchant, Shabbir N. (ORCID: https://orcid.org/0000-0002-9119-6795) |
|---|
| Affiliation | National Research Council Canada. Aerospace |
|---|
| Funder | Indian Institute of Technology Madras; Science and Engineering Research Board |
|---|
| Format | Text, Article |
|---|
| Subject | Markov decision processes; restless multi-armed bandits; dynamic scheduling; cyber-physical systems; reinforcement learning; sequential decision models; stochastic control; indexes; relays; maintenance; stochastic processes; sensor systems; process control; monitoring; decision making |
|---|
| Abstract | This paper develops a sequential decision-making framework called constrained restless multi-armed bandits (CRMABs) to model resource allocation problems under uncertainty and dynamic availability constraints. The decision-maker’s objective is to maximize the long-term cumulative reward, which requires accounting for the impact of current actions on the future evolution of states. Uncertainty about the future availability of arms and partial state information make this objective challenging. CRMABs can be applied to resource allocation problems in cyber-physical systems, including sensor and relay scheduling. Whittle’s index policy, online rollout, and myopic policies are studied as solutions for CRMABs. First, the conditions for the applicability of Whittle’s index policy are studied, and indexability is claimed under certain structural assumptions. An algorithm for index computation is presented. The online rollout policy for partially observable CRMABs is proposed as a low-complexity alternative to the index policy, and the complexity of these schemes is analyzed. An upper bound on the optimal value function is derived, which helps assess the sub-optimality of the various solutions. A simulation study compares the performance of these policies and shows that the rollout policy performs best; in some settings, it achieves a gain of about 14% relative to Whittle’s index and myopic policies. Finally, an application to wildfire management is presented, in which decision-making with CRMABs is analyzed from the perspective of a central agency tasked with fighting wildfires in multiple regions under logistical constraints. |
|---|
| Publication date | 2024-12-02 |
|---|
| Publisher | IEEE |
|---|
| License | |
|---|
| In | |
|---|
| Language | English |
|---|
| Peer reviewed | Yes |
|---|
| Record identifier | 2ff6f915-4c05-4073-8dfa-e17a5e464c43 |
|---|
| Record created | 2025-09-09 |
|---|
| Record modified | 2025-09-10 |
|---|