Paper title
MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models
Paper authors
Paper abstract
In recent years, the proliferation of so-called "fake news" has caused much disruption in society and weakened the news ecosystem. To mitigate such problems, researchers have developed state-of-the-art models that automatically detect fake news on social media using sophisticated data science and machine learning techniques. In this work, we ask "what if adversaries attempt to attack such detection models?" and investigate related issues by (i) proposing a novel threat model against fake news detectors, in which adversaries can post malicious comments on news articles to mislead fake news detectors, and (ii) developing MALCOM, an end-to-end adversarial comment generation framework to carry out such an attack. Through a comprehensive evaluation, we demonstrate that MALCOM can, on average, successfully mislead five of the latest neural detection models into always outputting targeted real and fake news labels about 94% and 93.5% of the time, respectively. Furthermore, MALCOM can also fool black-box fake news detectors into always outputting real news labels 90% of the time on average. We also compare our attack model with four baselines across two real-world datasets, not only on attack performance but also on generation quality, coherence, transferability, and robustness.
