Algorithms, Vol. 17, No. 1 (2024)

Distributed Data-Driven Learning-Based Optimal Dynamic Resource Allocation for Multi-RIS-Assisted Multi-User Ad-Hoc Network

SUMMARY

This study investigates decentralized dynamic resource allocation optimization for ad-hoc network communication supported by reconfigurable intelligent surfaces (RISs), using a reinforcement learning framework. In current cellular networks, device-to-device (D2D) communication stands out as a promising technique for improving spectrum efficiency. At the same time, RISs have attracted considerable attention because they can enhance the quality of dynamic wireless networks, maximizing spectrum efficiency without increasing power consumption. However, prevalent centralized D2D transmission schemes require global information, which incurs significant signaling overhead. Conversely, existing distributed schemes avoid the need for global information but often require frequent information exchange among D2D users and still fall short of global optimization. This paper introduces a framework comprising an outer loop and an inner loop. In the outer loop, decentralized dynamic resource allocation optimization is developed for RIS-aided self-organizing network communication by applying a multi-player multi-armed bandit approach, which determines the strategies for RIS and resource block selection. Notably, these strategies require no signaling interaction during execution. In the inner loop, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is adopted for cooperative learning with neural networks (NNs) to obtain the optimal transmit power control and RIS phase shift control for multiple users, given the RIS and resource block selection policy specified by the outer loop. Drawing on optimization theory, the framework can attain distributed optimal resource allocation as the outer- and inner-loop reinforcement learning algorithms converge over time.
Finally, numerical simulations are presented to validate the proposed scheme and illustrate its effectiveness.
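The outer-loop idea, in which each user picks an RIS and a resource block using only its own observed rewards, can be illustrated with a generic multi-player bandit in a collision model. This is a minimal sketch, not the paper's algorithm: it assumes each arm is a hypothetical (RIS index, resource block index) pair, that users choosing the same arm both receive zero reward, and that every user runs an independent UCB1 learner with random tie-breaking. The names `UCB1`, `make_arms`, and `simulate`, and the Bernoulli success probabilities, are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple


class UCB1:
    """Independent UCB1 learner run locally by one user (no message exchange)."""

    def __init__(self, n_arms: int, rng: random.Random) -> None:
        self.counts = [0] * n_arms   # times each arm was played
        self.sums = [0.0] * n_arms   # cumulative reward per arm
        self.t = 0                   # local round counter
        self.rng = rng               # private RNG used only for tie-breaking

    def select(self) -> int:
        self.t += 1
        unplayed = [a for a, c in enumerate(self.counts) if c == 0]
        if unplayed:
            # Initial exploration: try each arm once, in random order.
            return self.rng.choice(unplayed)

        def score(a: int) -> Tuple[float, float]:
            mean = self.sums[a] / self.counts[a]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[a])
            # The second key breaks exact ties at random.
            return (mean + bonus, self.rng.random())

        return max(range(len(self.counts)), key=score)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.sums[arm] += reward


def make_arms(n_ris: int, n_rb: int) -> List[Tuple[int, int]]:
    """Hypothetical arm set: every (RIS index, resource block index) pair."""
    return [(r, b) for r in range(n_ris) for b in range(n_rb)]


def simulate(means: Sequence[Sequence[float]], horizon: int, seed: int = 0) -> List[UCB1]:
    """means[u][a] is a hypothetical Bernoulli success probability of arm a for user u."""
    env = random.Random(seed)
    n_arms = len(means[0])
    learners = [UCB1(n_arms, random.Random(seed + 1 + u)) for u in range(len(means))]
    for _ in range(horizon):
        choices = [ln.select() for ln in learners]
        for u, arm in enumerate(choices):
            collided = choices.count(arm) > 1  # collision model: a shared arm yields 0
            success = env.random() < means[u][arm]
            learners[u].update(arm, 0.0 if collided else float(success))
    return learners


if __name__ == "__main__":
    arms = make_arms(n_ris=2, n_rb=2)
    means = [[0.9, 0.2, 0.3, 0.1], [0.2, 0.8, 0.1, 0.3]]
    for u, ln in enumerate(simulate(means, horizon=2000, seed=1)):
        favourite = max(range(len(arms)), key=lambda a: ln.counts[a])
        print(f"user {u}: most-played (RIS, RB) = {arms[favourite]}")
```

No learner reads another learner's state, so the loop involves no signaling between users. Independent UCB1 is only a baseline here and does not guarantee a collision-free allocation.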

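For the inner loop, the abstract names TD3. The sketch below shows, in plain Python with toy callables standing in for neural networks, the three mechanisms that distinguish TD3 from DDPG (clipped double-Q targets, target policy smoothing, and delayed policy updates) together with the usual soft target update. It assumes, hypothetically, that the action vector concatenates per-user transmit powers and RIS phase shifts inside box bounds; the function names, toy actor and critics, and bound values are illustrative and not taken from the paper.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Actor = Callable[[Vector], Vector]
Critic = Callable[[Vector, Vector], float]


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def smoothed_target_action(actor_target: Actor, next_state: Vector,
                           low: Sequence[float], high: Sequence[float],
                           noise_std: float, noise_clip: float,
                           rng: random.Random) -> Vector:
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action box."""
    action = actor_target(next_state)
    return [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip), lo, hi)
        for a, lo, hi in zip(action, low, high)
    ]


def td3_target(reward: float, done: bool, next_state: Vector,
               actor_target: Actor, critic1_target: Critic, critic2_target: Critic,
               low: Sequence[float], high: Sequence[float],
               gamma: float = 0.99, noise_std: float = 0.2, noise_clip: float = 0.5,
               rng: Optional[random.Random] = None) -> float:
    """Clipped double-Q target: y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a'))."""
    rng = rng if rng is not None else random.Random(0)
    a_next = smoothed_target_action(actor_target, next_state, low, high,
                                    noise_std, noise_clip, rng)
    q_min = min(critic1_target(next_state, a_next), critic2_target(next_state, a_next))
    return reward + gamma * (1.0 - float(done)) * q_min


def soft_update(target: List[float], online: Sequence[float], tau: float) -> None:
    """Polyak averaging of target parameters: theta' <- tau * theta + (1 - tau) * theta'."""
    for i, w in enumerate(online):
        target[i] = tau * w + (1.0 - tau) * target[i]


def is_policy_step(step: int, policy_delay: int = 2) -> bool:
    """Delayed updates: actor and targets change once every `policy_delay` critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Hypothetical action: two transmit powers and two RIS phase shifts (rad).
    # Clipping (rather than wrapping) the phases is a simplification.
    low, high = [0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 2 * math.pi, 2 * math.pi]
    actor = lambda s: [0.5, 1.2, 3.0, 7.0]   # toy target actor
    q1 = lambda s, a: sum(a)                 # toy target critics
    q2 = lambda s, a: sum(a) - 0.5
    y = td3_target(1.0, False, [0.0], actor, q1, q2, low, high, rng=random.Random(42))
    print(f"TD3 critic target y = {y:.3f}")
```

In a full implementation the actor and critics would be neural networks: each online critic would regress toward `y`, and the actor would be updated (by maximizing the first critic's estimate) only on steps where `is_policy_step` returns true.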
Related articles

Jayme L Waldron, Shane M Welch, John Holloway

Negative human wildlife interactions are often contingent on a species’ spatial ecology and habitat use. Interaction outcomes can contribute to species imperilment and are exacerbated when the wildlife species suffers human persecution. Wildlife species ...


O. Pokorná, D. Mocková

A typical feature of transport infrastructure projects is that they are expensive and take a long time to construct. Transport infrastructure financing has traditionally lain in the public domain. A tightening of many countries' budgets in recent times h...


Risna Erni Yati Adu, Marselina Theresia Djue Tea, Gebhardus Djugian Gelyaman, Jefry Presson

The lack of knowledge and skills of the weaver community in Amol Village, North Central Timor Regency to prepare natural dye from natural resources that are difficult to be found in dry season is the main reason to conduct this service activity. This tra...


Francesca Gambetti, Salvatore Magazù

Is it useful for a physicist to read Parmenides’s Poem and more generally philosophy works? In teaching physics, which kind of benefits could come from the study of Eleatic ontology? And vice versa, which kind of usefulness could a historian of...


Armen Armen    

The purpose of this research is to tilapia fish farming options to address the population's dependence on natural resources Kerinci National Park (TNKS) at Nagari Limes Tower Lumpo South Coastal District. The study involved people with the characteristic...