Paper title
Post-breach Recovery: Protection against White-box Adversarial Examples for Leaked DNN Models
Paper authors
Abstract
Server breaches are an unfortunate reality on today's Internet. In the context of deep neural network (DNN) models, they are particularly harmful, because a leaked model gives an attacker "white-box" access to generate adversarial examples, a threat model for which no practical robust defense exists. For practitioners who have invested years and millions of dollars in proprietary DNNs, e.g., for medical imaging, this looks like an inevitable disaster. In this paper, we consider the problem of post-breach recovery for DNN models. We propose Neo, a new system that creates new versions of a leaked model, together with an inference-time filter that detects and removes adversarial examples generated on previously leaked versions. The classification surfaces of different model versions are slightly offset (by introducing hidden distributions), and Neo detects the overfitting of an attack to the leaked model used to generate it. We show that, across a variety of tasks and attack methods, Neo filters out attacks from leaked models with very high accuracy and provides strong protection (7 to 10 recoveries) against attackers who repeatedly breach the server. Neo performs well against a variety of strong adaptive attacks, with only a slight drop in the number of breaches it can recover from, and demonstrates potential as a complement to DNN defenses in the wild.
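
The filtering idea in the abstract, that an attack overfits to the leaked model used to generate it, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the defender keeps the leaked version, runs each input through both the leaked and the currently deployed version, and flags the input when its loss on the deployed version exceeds its loss on the leaked version by more than a threshold. The function names, the use of the deployed model's predicted label, and the threshold value of 0.5 are all hypothetical choices made for illustration.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy loss of a logit vector with respect to one class label."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def loss_gap(current_logits, leaked_logits):
    """Loss on the deployed version minus loss on the leaked version.

    Assumption: both losses are taken with respect to the label predicted
    by the deployed version.
    """
    label = max(range(len(current_logits)), key=current_logits.__getitem__)
    return cross_entropy(current_logits, label) - cross_entropy(leaked_logits, label)


def looks_overfit_to_leak(current_logits, leaked_logits, threshold=0.5):
    """Flag an input whose loss is much lower on the leaked version.

    The threshold is a hypothetical value; in practice it would have to be
    calibrated on benign inputs.
    """
    return loss_gap(current_logits, leaked_logits) > threshold


# Toy logits, made up for illustration (not measured results).
# Benign input: both versions agree with similar confidence.
benign_flagged = looks_overfit_to_leak([2.0, 0.1, -1.0], [2.1, 0.0, -1.1])

# Attack crafted on the leaked version: extremely confident there,
# only marginally successful on the deployed version.
attack_flagged = looks_overfit_to_leak([1.0, 0.8, -1.0], [8.0, -2.0, -3.0])
```

In this toy setting the benign input produces a near-zero gap and passes, while the crafted input produces a large gap and is flagged. A real filter would obtain the logits from the actual model versions rather than from hand-written lists.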
