
Journal of University of Chinese Academy of Sciences (中国科学院大学学报) ›› 2026, Vol. 43 ›› Issue (2): 230-239. DOI: 10.7523/j.ucas.2025.014

• Electronics, Information and Computer Science •

Fake review identification for online products based on clustering fine-tuning

Jinhao LIU, Pei QUAN, Wen ZHANG

  1. College of Economics and Management, Beijing University of Technology, Beijing 100124, China
  • Received: 2024-08-20 Revised: 2025-04-01 Published: 2026-03-17
  • Corresponding author: Wen ZHANG
  • Funding:
    National Natural Science Foundation of China (72174018); National Natural Science Foundation of China (71932002); Beijing Natural Science Foundation (9222001); Beijing Natural Science Foundation (9244021); Beijing Municipal Education Commission (SZ2021110005001)

Abstract:

Fake reviews distort online consumers' purchasing decisions, and identifying them efficiently is a pressing problem for e-commerce. Traditional fake review identification methods are sensitive to variations in review text style, syntax and vocabulary, and context, which lowers their accuracy. Large language models (LLMs) can address this accuracy problem, but training them is typically time-consuming. To address this, we propose CF-DRI (cluster-based fine-tuning for deceptive review identification), a method that fine-tunes the pre-trained knowledge of an LLM on selected clustered review samples and thereby significantly improves the training efficiency of LLM-based fake review identification. Compared with traditional methods, CF-DRI achieves strong identification performance with only a small number of fine-tuning samples. Experiments on the Yelp.com dataset show that, using only 20% of the clustered samples, CF-DRI achieves a precision of 92.29% and a recall of 90.03% in fake review identification. This research offers a new perspective and solution for managing fake reviews on e-commerce platforms and may support the healthy development of the industry.

Key words: fake review identification, large language models, fine-tuning, clustering
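
The abstract does not specify which clustering algorithm, text representation, or selection rule CF-DRI uses, so the following is only a minimal sketch of the general idea of choosing a 20% fine-tuning subset from clustered samples. It assumes plain k-means, toy 2-D vectors standing in for review embeddings, and a nearest-to-centroid selection rule; the names `kmeans` and `select_per_cluster` are hypothetical and not taken from the paper.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (an assumption; the paper's algorithm may differ).

    Centroids are initialised deterministically from evenly spaced points.
    Returns (centroids, labels).
    """
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]

    def assign():
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()


def select_per_cluster(points, centroids, labels, fraction=0.2):
    """Pick the `fraction` of each cluster closest to its centroid.

    Returns the sorted indices of the selected samples.
    """
    selected = []
    for c in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        idx.sort(key=lambda i: math.dist(points[i], centroids[c]))
        keep = math.ceil(fraction * len(idx))
        selected.extend(idx[:keep])
    return sorted(selected)


# Toy stand-ins for review embeddings: two well-separated groups of 10 points.
points = [[float(i), 0.0] for i in range(10)] + \
         [[100.0 + i, 100.0] for i in range(10)]
centroids, labels = kmeans(points, k=2)
selected = select_per_cluster(points, centroids, labels, fraction=0.2)
print(selected)  # [4, 5, 14, 15]
```

In a real pipeline the selected indices would point at the reviews passed to LLM fine-tuning; the embedding model, the number of clusters, and the selection criterion would all need to follow the paper's full text rather than this sketch.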
