[1] Biswas S, Morris R. Opportunistic routing in multi-hop wireless networks[J]. ACM SIGCOMM Computer Communication Review, 2004, 34(1): 69-74. DOI:10.1145/972374.972387.
[2] Chachulski S, Jennings M, Katti S, et al. Trading structure for randomness in wireless opportunistic routing[J]. ACM SIGCOMM Computer Communication Review, 2007, 37(4): 169-180. DOI:10.1145/1282427.1282400.
[3] Zorzi M, Rao R R. Geographic random forwarding (GeRaF) for ad hoc and sensor networks: energy and latency performance[J]. IEEE Transactions on Mobile Computing, 2003, 2(4): 349-365. DOI:10.1109/TMC.2003.1255650.
[4] Chu M, Li H, Liao X, et al. Reinforcement learning-based multiaccess control and battery prediction with energy harvesting in IoT systems[J]. IEEE Internet of Things Journal, 2019, 6(2): 2009-2020. DOI:10.1109/JIOT.2018.2872440.
[5] Sutton R S, Barto A G. Reinforcement learning: an introduction[M]. Cambridge, MA, USA: MIT Press, 2018.
[6] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489. DOI:10.1038/nature16961.
[7] Mirowski P, Pascanu R, Viola F, et al. Learning to navigate in complex environments[EB/OL]. ArXiv Preprint, 2016: 1611.03673. (2017-01-13)[2020-04-18]. http://arxiv.org/abs/1611.03673.
[8] He D, Xia Y C, Qin T, et al. Dual learning for machine translation[C]//Advances in Neural Information Processing Systems, 2016: 820-828.
[9] Mammeri Z. Reinforcement learning based routing in networks: review and classification of approaches[J]. IEEE Access, 2019, 7: 55916-55950. DOI:10.1109/ACCESS.2019.2913776.
[10] Perkins C E, Bhagwat P. Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers[J]. ACM SIGCOMM Computer Communication Review, 1994, 24(4): 234-244. DOI:10.1145/190809.190336.
[11] Jacquet P, Muhlethaler P, Clausen T, et al. Optimized link state routing protocol for ad hoc networks[C]//Proceedings of IEEE International Multi Topic Conference (IEEE INMIC 2001): Technology for the 21st Century. December 30, 2001, Lahore, Pakistan. IEEE, 2001: 62-68. DOI:10.1109/INMIC.2001.995315.
[12] Perkins C E, Royer E M. Ad-hoc on-demand distance vector routing[C]//Proceedings WMCSA'99: Second IEEE Workshop on Mobile Computing Systems and Applications. February 25-26, 1999, New Orleans, LA, USA. IEEE, 1999: 90-100. DOI:10.1109/MCSA.1999.749281.
[13] Park V D, Corson M S. A highly adaptive distributed routing algorithm for mobile wireless networks[C]//Proceedings of INFOCOM'97. April 7-11, 1997, Kobe, Japan. IEEE, 1997, 3: 1405-1413. DOI:10.1109/INFCOM.1997.631180.
[14] Youssef M, Ibrahim M, Abdelatif M, et al. Routing metrics of cognitive radio networks: a survey[J]. IEEE Communications Surveys & Tutorials, 2014, 16(1): 92-109. DOI:10.1109/SURV.2013.082713.00184.
[15] Boyan J, Littman M. Packet routing in dynamically changing networks: a reinforcement learning approach[C]//Advances in Neural Information Processing Systems, 1994: 671-678.
[16] Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3/4): 279-292. DOI:10.1007/BF00992698.
[17] Choi S P M, Yeung D Y. Predictive Q-routing: a memory-based reinforcement learning approach to adaptive traffic control[C]//Advances in Neural Information Processing Systems, 1996: 945-951.
[18] Kumar S, Miikkulainen R. Dual reinforcement Q-routing: an on-line adaptive routing algorithm[C]//Proceedings of the Artificial Neural Networks in Engineering Conference, 1997: 231-238.
[19] Tang K X, Li C L, Xiong H K, et al. Reinforcement learning-based opportunistic routing for live video streaming over multi-hop wireless networks[C]//2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP). October 16-18, 2017, Luton, UK. IEEE, 2017: 1-6. DOI:10.1109/MMSP.2017.8122255.
[20] Liu Y, Tong K F, Wong K K. Reinforcement learning based routing for energy sensitive wireless mesh IoT networks[J]. Electronics Letters, 2019, 55(17): 966-968. DOI:10.1049/el.2019.1864.
[21] Zhao X D, Yang H J, Zong G D. Adaptive neural hierarchical sliding mode control of nonstrict-feedback nonlinear systems and an application to electronic circuits[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017, 47(7): 1394-1404. DOI:10.1109/TSMC.2016.2613885.
[22] Luong N C, Hoang D T, Gong S M, et al. Applications of deep reinforcement learning in communications and networking: a survey[J]. IEEE Communications Surveys & Tutorials, 2019, 21(4): 3133-3174. DOI:10.1109/COMST.2019.2916583.
[23] Mukhutdinov D, Filchenkov A, Shalyto A, et al. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system[J]. Future Generation Computer Systems, 2019, 94: 587-600. DOI:10.1016/j.future.2018.12.037.
[24] Valadarsky A, Schapira M, Shahaf D, et al. A machine learning approach to routing[EB/OL]. ArXiv Preprint, 2017: 1708.03074. (2017-11-11)[2020-04-18]. http://arxiv.org/abs/1708.03074.
[25] Stampa G, Arias M, Sánchez-Charles D, et al. A deep-reinforcement learning approach for software-defined networking routing optimization[EB/OL]. ArXiv Preprint, 2017: 1709.07080. (2017-09-20)[2020-04-18]. http://arxiv.org/abs/1709.07080.
[26] de Couto D S J, Aguayo D, Bicket J, et al. A high-throughput path metric for multi-hop wireless routing[C]//Proceedings of the 9th Annual International Conference on Mobile Computing and Networking, 2003: 134-146. DOI:10.1145/938985.939000.
[27] Cormen T H, Leiserson C E, Rivest R L, et al. Introduction to algorithms[M]. Cambridge, MA, USA: MIT Press, 2009.
[28] Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems[J]. Pattern Recognition, 2005, 38(12): 2270-2285. DOI:10.1016/j.patcog.2005.01.012.
[29] Wang Z, Crowcroft J. Quality-of-service routing for supporting multimedia applications[J]. IEEE Journal on Selected Areas in Communications, 1996, 14(7): 1228-1234. DOI:10.1109/49.536364.
[30] Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. DOI:10.1038/nature14236.
[31] Hester T, Vecerik M, Pietquin O, et al. Deep Q-learning from demonstrations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018, 32(1): 3223-3230.
[32] Hagberg A, Schult D, Swart P. Exploring network structure, dynamics, and function using NetworkX[C]//Proceedings of the 7th Python in Science Conference (SciPy), 2008: 11-15.
[33] Du Y H, Xu Y, Xue L, et al. An energy-efficient cross-layer routing protocol for cognitive radio networks using apprenticeship deep reinforcement learning[J]. Energies, 2019, 12(14): 2829. DOI:10.3390/en12142829.