[1] Biswas S, Morris R. Opportunistic routing in multi-hop wireless networks[J]. ACM SIGCOMM Computer Communication Review, 2004, 34(1): 69-74. DOI:10.1145/972374.972387.
[2] Chachulski S, Jennings M, Katti S, et al. Trading structure for randomness in wireless opportunistic routing[J]. ACM SIGCOMM Computer Communication Review, 2007, 37(4): 169-180. DOI:10.1145/1282427.1282400.
[3] Zorzi M, Rao R R. Geographic random forwarding (GeRaF) for ad hoc and sensor networks: energy and latency performance[J]. IEEE Transactions on Mobile Computing, 2003, 2(4): 349-365. DOI:10.1109/TMC.2003.1255650.
[4] Chu M, Li H, Liao X, et al. Reinforcement learning-based multiaccess control and battery prediction with energy harvesting in IoT systems[J]. IEEE Internet of Things Journal, 2019, 6(2): 2009-2020. DOI:10.1109/JIOT.2018.2872440.
[5] Sutton R S, Barto A G. Reinforcement learning: an introduction[M]. Cambridge, MA, USA: MIT Press, 2018.
[6] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489. DOI:10.1038/nature16961.
[7] Mirowski P, Pascanu R, Viola F, et al. Learning to navigate in complex environments[EB/OL]. ArXiv Preprint, 2016: 1611.03673. (2017-01-13)[2020-04-18]. http://arxiv.org/abs/1611.03673.
[8] He D, Xia Y C, Qin T, et al. Dual learning for machine translation[C]//Advances in Neural Information Processing Systems, 2016: 820-828.
[9] Mammeri Z. Reinforcement learning based routing in networks: review and classification of approaches[J]. IEEE Access, 2019, 7: 55916-55950. DOI:10.1109/ACCESS.2019.2913776.
[10] Perkins C E, Bhagwat P. Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers[J]. ACM SIGCOMM Computer Communication Review, 1994, 24(4): 234-244. DOI:10.1145/190809.190336.
[11] Jacquet P, Muhlethaler P, Clausen T, et al. Optimized link state routing protocol for ad hoc networks[C]//Proceedings of IEEE International Multi Topic Conference (IEEE INMIC 2001): Technology for the 21st Century. December 30, 2001, Lahore, Pakistan. IEEE, 2001: 62-68. DOI:10.1109/INMIC.2001.995315.
[12] Perkins C E, Royer E M. Ad-hoc on-demand distance vector routing[C]//Proceedings WMCSA'99: Second IEEE Workshop on Mobile Computing Systems and Applications. February 25-26, 1999, New Orleans, LA, USA. IEEE, 1999: 90-100. DOI:10.1109/MCSA.1999.749281.
[13] Park V D, Corson M S. A highly adaptive distributed routing algorithm for mobile wireless networks[C]//Proceedings of INFOCOM'97. April 7-11, 1997, Kobe, Japan. IEEE, 1997, 3: 1405-1413. DOI:10.1109/INFCOM.1997.631180.
[14] Youssef M, Ibrahim M, Abdelatif M, et al. Routing metrics of cognitive radio networks: a survey[J]. IEEE Communications Surveys & Tutorials, 2014, 16(1): 92-109. DOI:10.1109/SURV.2013.082713.00184.
[15] Boyan J, Littman M. Packet routing in dynamically changing networks: a reinforcement learning approach[C]//Advances in Neural Information Processing Systems, 1994: 671-678.
[16] Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3/4): 279-292. DOI:10.1007/BF00992698.
[17] Choi S P M, Yeung D Y. Predictive Q-routing: a memory-based reinforcement learning approach to adaptive traffic control[C]//Advances in Neural Information Processing Systems, 1996: 945-951.
[18] Kumar S, Miikkulainen R. Dual reinforcement Q-routing: an on-line adaptive routing algorithm[C]//Proceedings of the Artificial Neural Networks in Engineering Conference, 1997: 231-238.
[19] Tang K X, Li C L, Xiong H K, et al. Reinforcement learning-based opportunistic routing for live video streaming over multi-hop wireless networks[C]//2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP). October 16-18, 2017, Luton, UK. IEEE, 2017: 1-6. DOI:10.1109/MMSP.2017.8122255.
[20] Liu Y, Tong K F, Wong K K. Reinforcement learning based routing for energy sensitive wireless mesh IoT networks[J]. Electronics Letters, 2019, 55(17): 966-968. DOI:10.1049/el.2019.1864.
[21] Zhao X D, Yang H J, Zong G D. Adaptive neural hierarchical sliding mode control of nonstrict-feedback nonlinear systems and an application to electronic circuits[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(7): 1394-1404. DOI:10.1109/TSMC.2016.2613885.
[22] Luong N C, Hoang D T, Gong S M, et al. Applications of deep reinforcement learning in communications and networking: a survey[J]. IEEE Communications Surveys & Tutorials, 2019, 21(4): 3133-3174. DOI:10.1109/COMST.2019.2916583.
[23] Mukhutdinov D, Filchenkov A, Shalyto A, et al. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system[J]. Future Generation Computer Systems, 2019, 94: 587-600. DOI:10.1016/j.future.2018.12.037.
[24] Valadarsky A, Schapira M, Shahaf D, et al. A machine learning approach to routing[EB/OL]. ArXiv Preprint, 2017: 1708.03074. (2017-11-11)[2020-04-18]. http://arxiv.org/abs/1708.03074.
[25] Stampa G, Arias M, Sánchez-Charles D, et al. A deep-reinforcement learning approach for software-defined networking routing optimization[EB/OL]. ArXiv Preprint, 2017: 1709.07080. (2017-09-20)[2020-04-18]. http://arxiv.org/abs/1709.07080.
[26] de Couto D S J, Aguayo D, Bicket J, et al. A high-throughput path metric for multi-hop wireless routing[C]//Proceedings of the 9th Annual International Conference on Mobile Computing and Networking, 2003: 134-146. DOI:10.1145/938985.939000.
[27] Cormen T H, Leiserson C E, Rivest R L, et al. Introduction to algorithms[M]. Cambridge, MA, USA: MIT Press, 2009.
[28] Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems[J]. Pattern Recognition, 2005, 38(12): 2270-2285. DOI:10.1016/j.patcog.2005.01.012.
[29] Wang Z, Crowcroft J. Quality-of-service routing for supporting multimedia applications[J]. IEEE Journal on Selected Areas in Communications, 1996, 14(7): 1228-1234. DOI:10.1109/49.536364.
[30] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. DOI:10.1038/nature14236.
[31] Hester T, Vecerik M, Pietquin O, et al. Deep Q-learning from demonstrations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 3223-3230.
[32] Hagberg A, Schult D, Swart P. Exploring network structure, dynamics, and function using NetworkX[C]//Proceedings of the 7th Python in Science Conference (SciPy), 2008: 11-15.
[33] Du Y H, Xu Y, Xue L, et al. An energy-efficient cross-layer routing protocol for cognitive radio networks using apprenticeship deep reinforcement learning[J]. Energies, 2019, 12(14): 2829. DOI:10.3390/en12142829.