Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV
based Random Access IoT Networks with NOMA
by
Sami Khairy, Prasanna Balaprakash, Lin X. Cai, Yu Cheng
2020
Abstract
In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique
to improve massive channel access in a wireless IoT network where
solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to
remote servers. Specifically, IoT devices contend for access to the shared
wireless channel using an adaptive p-persistent slotted Aloha protocol, and
the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to
decode multiple simultaneous transmissions from IoT devices, improving access efficiency. To
enable an energy-sustainable capacity-optimal network, we study the joint
problem of dynamic multi-UAV altitude control and multi-cell wireless channel
access management of IoT devices as a stochastic control problem with multiple
energy constraints. To learn an optimal control policy, we first formulate this
problem as a Constrained Markov Decision Process (CMDP), and then propose an online
model-free Constrained Deep Reinforcement Learning (CDRL) algorithm based on
Lagrangian primal-dual policy optimization to solve the CMDP. Extensive
simulations demonstrate that our proposed algorithm learns a cooperative policy
among UAVs in which the altitude of UAVs and channel access probability of IoT
devices are dynamically and jointly controlled to attain the maximal long-term
network capacity while maintaining energy sustainability of UAVs. The proposed
algorithm outperforms Deep RL-based solutions that use reward shaping to account
for energy costs. It achieves a temporal average system capacity that is
82.4% higher than that of a feasible DRL-based solution, and only 6.47%
lower than that of the energy-constraint-free system.
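
To illustrate why the channel access probability is worth controlling, here is a minimal Python sketch of p-persistent slotted Aloha under a deliberately idealized capture model. It assumes, purely for illustration, that SIC decodes every packet in a slot when at most `max_decodable` devices transmit and decodes nothing otherwise; the paper's actual NOMA/SIC model is not specified in the abstract. The function names and the `max_decodable` parameter are hypothetical and are not taken from the paper.

```python
from math import comb


def expected_decoded(n_devices, p, max_decodable):
    """Expected packets decoded per slot when each of n_devices transmits
    independently with probability p.

    Assumption (not from the paper): a slot with k transmitters yields k
    decoded packets if 1 <= k <= max_decodable, and 0 otherwise.
    """
    total = 0.0
    for k in range(1, min(max_decodable, n_devices) + 1):
        # Binomial probability that exactly k devices transmit in the slot.
        prob_k = comb(n_devices, k) * p**k * (1 - p) ** (n_devices - k)
        total += k * prob_k
    return total


def best_access_probability(n_devices, max_decodable, grid_size=100):
    """Grid search over p in (0, 1) for the highest expected throughput."""
    candidates = [i / grid_size for i in range(1, grid_size)]
    return max(
        candidates,
        key=lambda p: expected_decoded(n_devices, p, max_decodable),
    )


if __name__ == "__main__":
    n = 10
    for m in (1, 2):
        p_star = best_access_probability(n, m)
        print(m, p_star, expected_decoded(n, p_star, m))
```

With `max_decodable=1` this reduces to classical slotted Aloha, whose throughput peaks at p = 1/N. Allowing more than one decodable packet per slot raises the achievable throughput at a given p and shifts the best p upward, which is the kind of trade-off an adaptive access policy has to track.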
arXiv: 2002.00073v1