Paper Title
A Critical Study on Data Leakage in Recommender System Offline Evaluation
Authors
Abstract
Recommender models are hard to evaluate, particularly in an offline setting. In this paper, we provide a comprehensive and critical analysis of the data leakage issue in recommender system offline evaluation. Data leakage is caused by not observing the global timeline when evaluating recommenders, e.g., when the train/test data split does not follow the global timeline. As a result, a model learns from user-item interactions that are not expected to be available at prediction time. We first show the temporal dynamics of user-item interactions along the global timeline, then explain why data leakage exists for collaborative filtering models. Through carefully designed experiments, we show that all models indeed recommend future items that are not available at the time point of a test instance, as a result of data leakage. The experiments are conducted with four widely used baseline models (BPR, NeuMF, SASRec, and LightGCN) on four popular offline datasets (MovieLens-25M, Yelp, Amazon-music, and Amazon-electronic), adopting the leave-last-one-out data split. We further show that data leakage does impact models' recommendation accuracy. Their relative performance orders thus become unpredictable with different amounts of leaked future data in training. To evaluate recommender systems realistically in an offline setting, we propose a timeline scheme, which calls for a revisit of recommendation model design.
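To make the leakage mechanism concrete, here is a minimal sketch, assuming a toy log of (user, item, timestamp) triples. The data and the function names leave_last_one_out, leaked_interactions, and future_items are hypothetical illustrations, not code from the paper. A leave-last-one-out split holds out each user's latest interaction, but because users are split independently, other users' later interactions can remain in the training set. The sketch counts those "future" training interactions and the items that first appear after a test instance's time point.

```python
from collections import defaultdict

# Hypothetical toy interaction log: (user, item, timestamp).
interactions = [
    ("u1", "i1", 1), ("u1", "i2", 2), ("u1", "i3", 3),
    ("u2", "i2", 4), ("u2", "i4", 5), ("u2", "i5", 9),
    ("u3", "i1", 6), ("u3", "i6", 7), ("u3", "i7", 8),
]

def leave_last_one_out(log):
    """Hold out each user's most recent interaction as that user's test instance."""
    by_user = defaultdict(list)
    for user, item, ts in log:
        by_user[user].append((item, ts))
    train, test = [], {}
    for user, events in by_user.items():
        events.sort(key=lambda e: e[1])  # order this user's events by time
        *history, last = events
        train.extend((user, item, ts) for item, ts in history)
        test[user] = last  # (item, timestamp) of the held-out interaction
    return train, test

def leaked_interactions(train, test):
    """Per test user, count training interactions (from any user) that happen
    strictly after that user's test timestamp, i.e. data from the 'future'."""
    return {
        user: sum(1 for _, _, ts in train if ts > test_ts)
        for user, (_, test_ts) in test.items()
    }

def future_items(train, test):
    """Per test user, list training items whose first appearance is strictly
    after the test timestamp, i.e. items not yet available at prediction time."""
    first_seen = {}
    for _, item, ts in train:
        first_seen[item] = min(ts, first_seen.get(item, ts))
    return {
        user: sorted(item for item, t in first_seen.items() if t > test_ts)
        for user, (_, test_ts) in test.items()
    }

train, test = leave_last_one_out(interactions)
print(leaked_interactions(train, test))  # {'u1': 4, 'u2': 0, 'u3': 0}
print(future_items(train, test))         # {'u1': ['i4', 'i6'], 'u2': [], 'u3': []}
```

In this toy log, user u1's test instance is at time 3, yet four training interactions (times 4 to 7) and two items (i4, i6) come from after that point, so a model trained on this split can learn from, and recommend, items that did not yet exist for u1. Splitting on a single global cutoff time instead removes such interactions by construction; this is only a simple contrast and is not claimed to be the paper's timeline scheme.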
