Solving quadratic assignment problem based on actor-critic framework

doi:10.7523/j.ucas.2022.031

Abstract

The quadratic assignment problem (QAP) is an NP-hard combinatorial optimization problem with diverse real-world applications. Mature heuristic algorithms for it are usually tailored to specific problems and therefore transfer and generalize poorly. To provide a unified solution strategy, this paper abstracts the flow matrix and the distance matrix of a QAP instance as two undirected complete graphs and constructs the corresponding association graph, which transforms the assignment of facilities to locations into a node selection task on the association graph. Based on the actor-critic framework, the paper proposes a new algorithm, ACQAP (actor-critic for QAP). First, the model builds a policy network with a multi-head attention mechanism to process the node representation vectors produced by a graph convolutional neural network. Then, the actor-critic algorithm predicts, for each node, the probability of being selected as the optimal node. Finally, the model outputs, within a feasible time, a sequence of action decisions that satisfies the objective reward function. The algorithm requires no manual design and, because it applies to inputs of different sizes, is more flexible and reliable. Experimental results on QAPLIB instances show that the algorithm transfers and generalizes better than traditional heuristic algorithms while achieving comparable accuracy, that it obtains lower assignment costs than recent learning-based algorithms such as NGM, and that its deviation is less than 20% in most instances.
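As an illustration of the reformulation described in the abstract, the following minimal sketch evaluates a QAP assignment in two equivalent ways: directly from the flow and distance matrices, and as a selection of nodes on an association graph whose node (i, a) means "facility i is placed at location a". It assumes the standard Koopmans-Beckmann objective and the usual relative-gap definition of deviation; the function names and the toy matrices are hypothetical and are not taken from the paper, and the sketch does not reproduce ACQAP itself.

```python
import numpy as np


def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann cost when facility i is placed at location perm[i]."""
    perm = np.asarray(perm)
    # dist[np.ix_(perm, perm)][i, j] == dist[perm[i], perm[j]]
    return float(np.sum(flow * dist[np.ix_(perm, perm)]))


def association_weights(flow, dist):
    """Edge weights of the association graph (illustrative construction).

    Node (i, a) has index i * n + a. The weight between nodes (i, a) and
    (j, b) is flow[i, j] * dist[a, b], which is the Kronecker product.
    """
    return np.kron(flow, dist)


def cost_from_selection(flow, dist, perm):
    """Same cost, computed from the n nodes selected on the association graph."""
    n = len(perm)
    x = np.zeros(n * n)
    x[np.arange(n) * n + np.asarray(perm)] = 1.0  # one node per facility
    return float(x @ association_weights(flow, dist) @ x)


def deviation_percent(cost, best_known):
    """Relative gap to a best-known cost, in percent (assumed definition)."""
    return 100.0 * (cost - best_known) / best_known


# Toy 3x3 instance (hypothetical data, for illustration only).
flow = np.array([[0, 3, 1], [3, 0, 2], [1, 2, 0]], dtype=float)
dist = np.array([[0, 1, 4], [1, 0, 2], [4, 2, 0]], dtype=float)

for perm in ([0, 1, 2], [2, 0, 1]):
    print(perm, qap_cost(flow, dist, perm), cost_from_selection(flow, dist, perm))
```

Both functions return the same value for any permutation, which is why choosing one node per facility (and one per location) on the association graph is equivalent to choosing an assignment.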

Key words: quadratic assignment problem, graph convolutional neural network, deep reinforcement learning, multi-head attention mechanism, actor-critic algorithm

CLC Number: TP391

LI Xueyuan, HAN Congying. Solving quadratic assignment problem based on actor-critic framework[J]. Journal of University of Chinese Academy of Sciences, 2024, 41(2): 275-284.
