Journal of University of Chinese Academy of Sciences, 2026, Vol. 43, Issue (2): 230-239. DOI: 10.7523/j.ucas.2025.014
• Electronics & Computer Science •
Jinhao LIU, Pei QUAN, Wen ZHANG
Received: 2024-08-20
Revised: 2025-04-01
Online: 2026-03-15
Contact: Wen ZHANG
Jinhao LIU, Pei QUAN, Wen ZHANG. Fake review identification for online products based on clustering fine-tuning[J]. Journal of University of Chinese Academy of Sciences, 2026, 43(2): 230-239.
| Model | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|
| LSTM | 82.21 | 83.77 | 82.91 |
| CNN | 81.56 | 80.87 | 81.19 |
| LSTM-CNN | 84.20 | 82.97 | 83.58 |
Table 1 Experimental results based on deep learning
| Sampling | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|
| Random 5% | 85.13 | 83.67 | 84.39 |
| Random 10% | 88.23 | 85.85 | 87.03 |
| Random 20% | 90.26 | 88.97 | 89.61 |
| All data | 92.91 | 90.39 | 91.38 |
Table 2 Experimental results of random sampling fine-tuning
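| Sampling | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|
| Highest silhouette coefficient, 5% | 87.05 | 84.03 | 85.51 |
| Highest silhouette coefficient, 10% | 90.78 | 89.02 | 89.89 |
| Highest silhouette coefficient, 20% | 91.27 | 89.42 | 90.34 |
| Highest and lowest silhouette coefficients, 5% total | 89.22 | 86.68 | 87.94 |
| Highest and lowest silhouette coefficients, 10% total | 91.91 | 89.30 | 90.59 |
| Highest and lowest silhouette coefficients, 20% total | 92.29 | 90.03 | 91.15 |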
Table 3 Experimental results of fine-tuning based on silhouette coefficient sampling
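The row labels in Table 3 point to two selection rules: keep the samples with the highest silhouette coefficients, or keep a mix of the highest and lowest. This page does not give the exact procedure, so the following is only a minimal sketch under assumptions: Euclidean distance, at least two clusters, and an even split between highest and lowest in the mixed rule. The function names `silhouette_scores` and `select_by_score` are hypothetical and not taken from the paper.

```python
import math


def silhouette_scores(points, labels):
    """Per-sample silhouette coefficient s = (b - a) / max(a, b).

    a: mean distance to the other members of the sample's own cluster.
    b: smallest mean distance to the members of any other cluster.
    Assumes Euclidean distance and at least two clusters.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: the coefficient is conventionally 0.
            scores.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def select_by_score(scores, fraction, include_lowest=False):
    """Return the sorted indices of the selected samples.

    include_lowest=False: top `fraction` by score.
    include_lowest=True: half of the budget from the top, half from the
    bottom (the even split is an assumption of this sketch).
    """
    n_select = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if not include_lowest:
        return sorted(order[:n_select])
    n_high = (n_select + 1) // 2
    n_low = n_select - n_high
    low = order[-n_low:] if n_low else []
    return sorted(order[:n_high] + low)
```

In a pipeline like the one the table describes, the returned indices would pick the reviews used for fine-tuning; the clustering step and the embedding model are outside this sketch.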
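| Sampling | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|
| Nearest distance, 5% | 86.71 | 85.12 | 85.92 |
| Nearest distance, 10% | 90.62 | 87.98 | 89.31 |
| Nearest distance, 20% | 91.28 | 89.01 | 90.14 |
| Farthest and nearest distances, 5% total | 88.11 | 85.32 | 86.71 |
| Farthest and nearest distances, 10% total | 91.31 | 88.85 | 90.06 |
| Farthest and nearest distances, 20% total | 91.89 | 89.80 | 90.83 |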
Table 4 Experimental results of fine-tuning based on Euclidean distance sampling
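Table 4 ranks samples by Euclidean distance, but this page does not say what the distance is measured to. The sketch below assumes it is the distance from each sample to the centroid of its own cluster, and that the mixed rule splits the budget evenly between nearest and farthest samples. Both are assumptions, and the names `centroid_distances` and `select_by_distance` are hypothetical.

```python
import math


def centroid_distances(points, labels):
    """Euclidean distance from each point to the centroid of its own cluster."""
    groups = {}
    for point, lab in zip(points, labels):
        groups.setdefault(lab, []).append(point)
    # Centroid = coordinate-wise mean of the cluster members.
    centroids = {
        lab: tuple(sum(coord) / len(members) for coord in zip(*members))
        for lab, members in groups.items()
    }
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]


def select_by_distance(distances, fraction, include_farthest=False):
    """Return the sorted indices of the selected samples.

    include_farthest=False: the nearest `fraction` of samples.
    include_farthest=True: half of the budget from the nearest samples,
    half from the farthest (the even split is an assumption).
    """
    n_select = max(1, round(len(distances) * fraction))
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    if not include_farthest:
        return sorted(order[:n_select])
    n_near = (n_select + 1) // 2
    n_far = n_select - n_near
    far = order[-n_far:] if n_far else []
    return sorted(order[:n_near] + far)
```

Samples near a centroid are typical of their cluster, while far ones sit close to cluster boundaries; the mixed rule keeps both kinds in the fine-tuning subset.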