Paper title
Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control
Paper authors
Paper abstract
Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is highly time-consuming. Recent deep learning-based solutions require significant computational resources in addition to long training phases, and differential privacy-based solutions may undermine data utility. In this paper, we propose $ε$-PrivateSMOTE, a technique designed to safeguard against re-identification and linkage attacks, particularly in cases with a high re-identification risk. Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases. We demonstrate that $ε$-PrivateSMOTE achieves competitive results in privacy risk and better predictive performance when compared with multiple traditional and state-of-the-art privacy-preservation methods, including generative adversarial networks, variational autoencoders, and differential privacy baselines. We also show that our method reduces time requirements by at least a factor of 9 and is a resource-efficient solution that ensures high performance without specialised hardware.
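
The abstract describes the core idea only at a high level: high-risk cases are replaced by synthetic ones obtained through noise-induced interpolation. The sketch below is one possible reading of that sentence, not the authors' algorithm. It assumes a SMOTE-style interpolation between a high-risk row and one of its nearest neighbours, with the interpolation step drawn from a Laplace distribution of scale 1/ε. The function names, the choice of `k`, the Euclidean distance, and the way the set of high-risk rows is supplied are all hypothetical, and the sketch makes no claim to a formal differential privacy guarantee.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest_neighbours(idx, rows, k):
    # Indices of the k rows closest (Euclidean) to rows[idx], excluding itself.
    others = [j for j in range(len(rows)) if j != idx]
    others.sort(key=lambda j: math.dist(rows[idx], rows[j]))
    return others[:k]


def noisy_interpolate(rows, high_risk, epsilon, k=2, seed=0):
    # Assumption: each high-risk row is replaced by one synthetic row that
    # lies on the line through the row and a randomly chosen neighbour.
    # The step along that line is Laplace noise with scale 1/epsilon,
    # so a smaller epsilon moves the synthetic row further away.
    rng = random.Random(seed)
    synthetic = []
    for i in high_risk:
        j = rng.choice(nearest_neighbours(i, rows, k))
        step = laplace(1.0 / epsilon, rng)
        synthetic.append([a + step * (b - a) for a, b in zip(rows[i], rows[j])])
    return synthetic


# Toy numeric data; rows 0 and 3 are treated as high-risk for illustration.
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
out = noisy_interpolate(rows, high_risk=[0, 3], epsilon=1.0)
```

In this reading only the rows flagged as high-risk are regenerated, which is consistent with the abstract's emphasis on low time and hardware requirements: the sketch needs nothing beyond a nearest-neighbour search and a noise draw per flagged row.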
