论文标题
私人随机优化,具有大型最差的Lipschitz参数
Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter
论文作者
论文摘要
我们研究了差异私有(DP)随机优化(SO)具有损失函数,其最差的Lipschitz参数在所有数据上可能非常大或无限。迄今为止,大多数在DP上的工作,因此假设损失在数据上是统一的Lipschitz连续(即随机梯度统一的)。尽管此假设很方便,但通常会导致悲观的风险界限。在许多实际问题中,由于异常值和/或重尾数据,所有数据的最差(统一)Lipschitz参数可能很大。在这种情况下,DP SO的风险范围(以最差的Lipschitz参数尺寸)是空的。为了解决这些局限性,我们提供了改进的风险范围,不取决于统一的Lipschitz参数。在最近的工作线[WXDX20,KLZ22]之后,我们假设随机梯度已经限制了$ k $ th的订单矩矩,以$ k \ geq 2 $。与Lipschitz DP上的作品相比,我们的风险范围与$ k $ - them时刻而不是损失的统一Lipschitz参数相比,在存在异常值和/或重型数据的情况下,损失的Lipschitz参数可以明显更快。 对于平滑的凸损耗功能,我们提供了最新的多余风险线性时间算法。我们通过新颖的下限来补充多余的风险上限。在某些参数制度中,我们的线性时间过剩风险范围是最小的。其次,我们提供了第一种算法来处理非平滑凸丢失函数。为此,我们开发了新颖的算法和稳定性证明技术,我们认为这对于未来的工作将在获得最佳的超额风险方面有用。最后,我们的工作是第一个解决非凸面的lipchitz损失功能,使近端不平等的损失功能。这涵盖了一些实用的机器学习模型。我们的近端PL算法具有几乎最佳的多余风险。
We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data may be extremely large or infinite. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous (i.e. stochastic gradients are uniformly bounded) over data. While this assumption is convenient, it often leads to pessimistic risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data may be huge due to outliers and/or heavy-tailed data. In such cases, the risk bounds for DP SO, which scale with the worst-case Lipschitz parameter, are vacuous. To address these limitations, we provide improved risk bounds that do not depend on the uniform Lipschitz parameter. Following a recent line of work [WXDX20, KLZ22], we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our risk bounds scale with the $k$-th moment instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For smooth convex loss functions, we provide linear-time algorithms with state-of-the-art excess risk. We complement our excess risk upper bounds with novel lower bounds. In certain parameter regimes, our linear-time excess risk bounds are minimax optimal. Second, we provide the first algorithm to handle non-smooth convex loss functions. To do so, we develop novel algorithmic and stability-based proof techniques, which we believe will be useful for future work in obtaining optimal excess risk. Finally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
