Paper title
Distributed Methods with Absolute Compression and Error Compensation
Paper authors
Paper abstract
Distributed optimization methods are often applied to huge-scale problems such as training neural networks with millions or even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients or iterates, is prohibitively expensive, especially when the number of workers is large. Communication compression is a powerful approach to alleviating this issue, and, in particular, methods with biased compression and error compensation are extremely popular due to their practical efficiency. Sahu et al. (2021) proposed a new analysis of Error Compensated SGD (EC-SGD) for the class of absolute compression operators, showing that, in a certain sense, this class contains optimal compressors for EC-SGD. However, that analysis was conducted only under the so-called $(M,\sigma^2)$-bounded noise assumption. In this paper, we generalize the analysis of EC-SGD with absolute compression to arbitrary sampling strategies and propose the first analysis of the Error Compensated Loopless Stochastic Variance Reduced Gradient method (EC-LSVRG) with absolute compression for (strongly) convex problems. Our rates improve upon the previously known ones in this setting. Numerical experiments corroborate our theoretical findings.
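To make the terms in the abstract concrete, the following is a minimal sketch of error-compensated gradient descent with an absolute compressor. It is not the paper's implementation: the hard-threshold sparsifier is used here as one assumed example of an absolute compressor (its error is bounded by a constant that does not depend on the input), full local gradients stand in for stochastic ones, and the names `hard_threshold`, `ec_sgd`, `gamma`, and `lam` are hypothetical.

```python
import numpy as np


def hard_threshold(v, lam):
    """Assumed example of an absolute compressor: keep entries with
    |v_i| >= lam and zero the rest, so every coordinate of the
    compression error v - C(v) is smaller than lam in magnitude."""
    return np.where(np.abs(v) >= lam, v, 0.0)


def ec_sgd(grads, x0, gamma, lam, steps):
    """Illustrative error-compensated method (full gradients for simplicity).

    grads : list of callables, one per worker, each returning a local gradient
    x0    : starting point
    gamma : step size
    lam   : threshold of the hard-threshold compressor
    """
    n = len(grads)
    x = np.array(x0, dtype=float)
    # Each worker keeps the part of its update that was not transmitted.
    errors = [np.zeros_like(x) for _ in range(n)]
    for _ in range(steps):
        messages = []
        for i, grad in enumerate(grads):
            # Add the accumulated error back before compressing.
            proposal = errors[i] + gamma * grad(x)
            message = hard_threshold(proposal, lam)
            # Store what was dropped so it can be sent in a later round.
            errors[i] = proposal - message
            messages.append(message)
        # The server averages the compressed messages and takes the step.
        x = x - np.mean(messages, axis=0)
    return x, errors


# Toy problem (an assumption for illustration): worker i minimizes
# 0.5 * ||x - b_i||^2, so the global minimizer is the mean of the b_i.
targets = [np.array([1.0, -2.0, 0.5]), np.array([3.0, 0.0, -0.5])]
worker_grads = [lambda x, b=b: x - b for b in targets]
x_final, final_errors = ec_sgd(worker_grads, np.zeros(3), gamma=0.1, lam=0.01, steps=500)
```

In this toy setting the accumulated errors stay below the threshold `lam` in every coordinate, which is the property that distinguishes absolute compressors from relative (contractive) ones such as Top-k.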
