Deep Reinforcement Learning for Wireless Communications and Networking
Comprehensive guide to Deep Reinforcement Learning (DRL) as applied to wireless communication systems
Deep Reinforcement Learning for Wireless Communications and Networking presents an overview of the development of DRL and provides fundamental knowledge of the theory, formulation, design, learning models, algorithms, and implementation of DRL, together with a detailed case study for hands-on practice. The book also covers diverse applications of DRL to problems in wireless networks, such as caching, offloading, resource sharing, and security. The authors discuss open issues and introduce advanced DRL approaches to address emerging problems in wireless communications and networking.
Covering new advanced DRL models, e.g., the deep dueling architecture and generative adversarial networks, as well as emerging problems in wireless networks, e.g., ambient backscatter communication, intelligent reflecting surfaces, and edge intelligence, this is the first comprehensive book on applications of DRL for wireless networks to present state-of-the-art research in architecture, protocol, and application design.
Deep Reinforcement Learning for Wireless Communications and Networking covers specific topics such as:
- Deep reinforcement learning models, covering deep learning, deep reinforcement learning, and models of deep reinforcement learning
- Physical-layer applications, covering signal detection, decoding, and beamforming; power and rate control; and physical-layer security
- Medium access control (MAC) layer applications, covering resource allocation, channel access, and user/cell association
- Network-layer applications, covering traffic routing, network classification, and network slicing
With comprehensive coverage of an exciting and noteworthy new technology, Deep Reinforcement Learning for Wireless Communications and Networking is an essential learning resource for researchers and communications engineers, along with developers and entrepreneurs in autonomous systems, who wish to harness this technology in practical applications.
Notes on Contributors xiii
Foreword xiv
Preface xv
Acknowledgments xviii
Acronyms xix
Introduction xxii
Part I Fundamentals of Deep Reinforcement Learning 1
1 Deep Reinforcement Learning and Its Applications 3
1.1 Wireless Networks and Emerging Challenges 3
1.2 Machine Learning Techniques and Development of DRL 4
1.2.1 Machine Learning 4
1.2.2 Artificial Neural Network 7
1.2.3 Convolutional Neural Network 8
1.2.4 Recurrent Neural Network 9
1.2.5 Development of Deep Reinforcement Learning 10
1.3 Potentials and Applications of DRL 11
1.3.1 Benefits of DRL in Human Lives 11
1.3.2 Features and Advantages of DRL Techniques 12
1.3.3 Academic Research Activities 12
1.3.4 Applications of DRL Techniques 13
1.3.5 Applications of DRL Techniques in Wireless Networks 15
1.4 Structure of this Book and Target Readership 16
1.4.1 Motivations and Structure of this Book 16
1.4.2 Target Readership 19
1.5 Chapter Summary 20
References 21
2 Markov Decision Process and Reinforcement Learning 25
2.1 Markov Decision Process 25
2.2 Partially Observable Markov Decision Process 26
2.3 Policy and Value Functions 29
2.4 Bellman Equations 30
2.5 Solutions of MDP Problems 31
2.5.1 Dynamic Programming 31
2.5.1.1 Policy Evaluation 31
2.5.1.2 Policy Improvement 31
2.5.1.3 Policy Iteration 31
2.5.2 Monte Carlo Sampling 32
2.6 Reinforcement Learning 33
2.7 Chapter Summary 35
References 35
3 Deep Reinforcement Learning Models and Techniques 37
3.1 Value-Based DRL Methods 37
3.1.1 Deep Q-Network 38
3.1.2 Double DQN 41
3.1.3 Prioritized Experience Replay 42
3.1.4 Dueling Network 44
3.2 Policy-Gradient Methods 45
3.2.1 REINFORCE Algorithm 46
3.2.1.1 Policy Gradient Estimation 46
3.2.1.2 Reducing the Variance 48
3.2.1.3 Policy Gradient Theorem 50
3.2.2 Actor-Critic Methods 51
3.2.3 Advantage Actor-Critic Methods 52
3.2.3.1 Advantage Actor-Critic (A2C) 53
3.2.3.2 Asynchronous Advantage Actor-Critic (A3C) 55
3.2.3.3 Generalized Advantage Estimation (GAE) 57
3.3 Deterministic Policy Gradient (DPG) 59
3.3.1 Deterministic Policy Gradient Theorem 59
3.3.2 Deep Deterministic Policy Gradient (DDPG) 61
3.3.3 Distributed Distributional DDPG (D4PG) 63
3.4 Natural Gradients 63
3.4.1 Principle of Natural Gradients 64
3.4.2 Trust Region Policy Optimization (TRPO) 67
3.4.2.1 Trust Region 69
3.4.2.2 Sample-Based Formulation 70
3.4.2.3 Practical Implementation 70
3.4.3 Proximal Policy Optimization (PPO) 72
3.5 Model-Based RL 74
3.5.1 Vanilla Model-Based RL 75
3.5.2 Robust Model-Based RL: Model-Ensemble TRPO (ME-TRPO) 76
3.5.3 Adaptive Model-Based RL: Model-Based Meta-Policy Optimization (MB-MPO) 77
3.6 Chapter Summary 78
References 79
4 A Case Study and Detailed Implementation 83
4.1 System Model and Problem Formulation 83
4.1.1 System Model and Assumptions 84
4.1.1.1 Jamming Model 84
4.1.1.2 System Operation 85
4.1.2 Problem Formulation 86
4.1.2.1 State Space 86
4.1.2.2 Action Space 87
4.1.2.3 Immediate Reward 88
4.1.2.4 Optimization Formulation 88
4.2 Implementation and Environment Settings 89
4.2.1 Install TensorFlow with Anaconda 89
4.2.2 Q-Learning 90
4.2.2.1 Code for the Environment 91
4.2.2.2 Code for the Agent 96
4.2.3 Deep Q-Learning 97
4.3 Simulation Results and Performance Analysis 102
4.4 Chapter Summary 106
References 106
Part II Applications of DRL in Wireless Communications and Networking 109
5 DRL at the Physical Layer 111
5.1 Beamforming, Signal Detection, and Decoding 111
5.1.1 Beamforming 111
5.1.1.1 Beamforming Optimization Problem 111
5.1.1.2 DRL-Based Beamforming 113
5.1.2 Signal Detection and Channel Estimation 118
5.1.2.1 Signal Detection and Channel Estimation Problem 118
5.1.2.2 RL-Based Approaches 120
5.1.3 Channel Decoding 122
5.2 Power and Rate Control 123
5.2.1 Power and Rate Control Problem 123
5.2.2 DRL-Based Power and Rate Control 124
5.3 Physical-Layer Security 128
5.4 Chapter Summary 129
References 131
6 DRL at the MAC Layer 137
6.1 Resource Management and Optimization 137
6.2 Channel Access Control 139
6.2.1 DRL in the IEEE 802.11 MAC 141
6.2.2 MAC for Massive Access in IoT 143
6.2.3 MAC for 5G and B5G Cellular Systems 147
6.3 Heterogeneous MAC Protocols 155
6.4 Chapter Summary 158
References 158
7 DRL at the Network Layer 163
7.1 Traffic Routing 163
7.2 Network Slicing 166
7.2.1 Network Slicing-Based Architecture 166
7.2.2 Applications of DRL in Network Slicing 168
7.3 Network Intrusion Detection 179
7.3.1 Host-Based IDS 180
7.3.2 Network-Based IDS 181
7.4 Chapter Summary 183
References 183
8 DRL at the Application and Service Layer 187
8.1 Content Caching 187
8.1.1 QoS-Aware Caching 187
8.1.2 Joint Caching and Transmission Control 189
8.1.3 Joint Caching, Networking, and Computation 191
8.2 Data and Computation Offloading 193
8.3 Data Processing and Analytics 198
8.3.1 Data Organization 198
8.3.1.1 Data Partitioning 198
8.3.1.2 Data Compression 199
8.3.2 Data Scheduling 200
8.3.3 Tuning of Data Processing Systems 201
8.3.4 Data Indexing 202
8.3.4.1 Database Index Selection 202
8.3.4.2 Index Structure Construction 203
8.3.5 Query Optimization 205
8.4 Chapter Summary 206
References 207
Part III Challenges, Approaches, Open Issues, and Emerging Research Topics 213
9 DRL Challenges in Wireless Networks 215
9.1 Adversarial Attacks on DRL 215
9.1.1 Attacks Perturbing the State Space 215
9.1.1.1 Manipulation of Observations 216
9.1.1.2 Manipulation of Training Data 218
9.1.2 Attacks Perturbing the Reward Function 220
9.1.3 Attacks Perturbing the Action Space 222
9.2 Multiagent DRL in Dynamic Environments 223
9.2.1 Motivations 223
9.2.2 Multiagent Reinforcement Learning Models 224
9.2.2.1 Markov/Stochastic Games 225
9.2.2.2 Decentralized Partially Observable Markov Decision Process (DPOMDP) 226
9.2.3 Applications of Multiagent DRL in Wireless Networks 227
9.2.4 Challenges of Using Multiagent DRL in Wireless Networks 229
9.2.4.1 Nonstationarity Issue 229
9.2.4.2 Partial Observability Issue 229
9.3 Other Challenges 230
9.3.1 Inherent Problems of Using RL in Real-World Systems 230
9.3.1.1 Limited Learning Samples 230
9.3.1.2 System Delays 230
9.3.1.3 High-Dimensional State and Action Spaces 231
9.3.1.4 System and Environment Constraints 231
9.3.1.5 Partial Observability and Nonstationarity 231
9.3.1.6 Multiobjective Reward Functions 232
9.3.2 Inherent Problems of DL and Beyond 232
9.3.2.1 Inherent Problems of DL 232
9.3.2.2 Challenges of DRL Beyond Deep Learning 233
9.3.3 Implementation of DL Models in Wireless Devices 236
9.4 Chapter Summary 237
References 237
10 DRL and Emerging Topics in Wireless Networks 241
10.1 DRL for Emerging Problems in Future Wireless Networks 241
10.1.1 Joint Radar and Data Communications 241
10.1.2 Ambient Backscatter Communications 244
10.1.3 Reconfigurable Intelligent Surface-Aided Communications 247
10.1.4 Rate Splitting Communications 249
10.2 Advanced DRL Models 252
10.2.1 Deep Reinforcement Transfer Learning 252
10.2.1.1 Reward Shaping 253
10.2.1.2 Intertask Mapping 254
10.2.1.3 Learning from Demonstrations 255
10.2.1.4 Policy Transfer 255
10.2.1.5 Reusing Representations 256
10.2.2 Generative Adversarial Network (GAN) for DRL 257
10.2.3 Meta Reinforcement Learning 258
10.3 Chapter Summary 259
References 259
Index 263