Title
Step-Ahead Error Feedback for Distributed Training with Compressed Gradient
Authors
Abstract
Although distributed machine learning methods can speed up the training of large deep neural networks, communication cost has become a non-negligible bottleneck that constrains performance. To address this challenge, gradient-compression-based, communication-efficient distributed learning methods were designed to reduce communication cost, and more recently, local error feedback was incorporated to compensate for the resulting performance loss. In this paper, however, we show that local error feedback in centralized distributed training raises a new "gradient mismatch" problem, which can degrade performance compared with full-precision training. To solve this critical problem, we propose two novel techniques, 1) step ahead and 2) error averaging, backed by rigorous theoretical analysis. Both our theoretical and empirical results show that the new methods handle the "gradient mismatch" problem. The experimental results show that, with common gradient compression schemes, our methods can train even faster than both full-precision training and local error feedback in terms of training epochs, without performance loss.
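The abstract builds on local error feedback without spelling it out. The following is a minimal sketch of the standard form in a centralized (parameter-server) setting, assuming top-k sparsification as the compressor. It is not the paper's step-ahead or error-averaging method, whose details are not given here. Each worker adds its stored residual to the new gradient, compresses the sum, transmits the compressed part, and keeps the remainder locally for the next step. The server averages what it receives and applies an SGD update. The names `top_k`, `ErrorFeedbackWorker`, and `server_step` are hypothetical and chosen for illustration only.

```python
from typing import List

def top_k(vec: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries and zero the rest (a common sparsifier)."""
    out = [0.0] * len(vec)
    if k <= 0:
        return out
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    for i in keep:
        out[i] = vec[i]
    return out

class ErrorFeedbackWorker:
    """One worker that applies local error feedback before compressing (hypothetical helper)."""

    def __init__(self, dim: int, k: int):
        self.k = k
        self.error = [0.0] * dim  # residual dropped by earlier compressions, kept locally

    def compress(self, grad: List[float]) -> List[float]:
        # Add back what previous compression steps failed to transmit.
        corrected = [g + e for g, e in zip(grad, self.error)]
        sent = top_k(corrected, self.k)
        # Whatever was not sent becomes the new local residual.
        self.error = [c - s for c, s in zip(corrected, sent)]
        return sent

def server_step(params: List[float], workers: List[ErrorFeedbackWorker],
                grads: List[List[float]], lr: float) -> List[float]:
    """Average the compressed messages from all workers and take one SGD step."""
    msgs = [w.compress(g) for w, g in zip(workers, grads)]
    n = len(msgs)
    avg = [sum(col) / n for col in zip(*msgs)]
    return [p - lr * a for p, a in zip(params, avg)]
```

A useful property of this scheme is that nothing is permanently lost. For a single worker, the sum of everything transmitted plus the current residual equals the sum of all gradients it has seen. Each worker's residual, however, lives only on that worker, and this local state is where the abstract locates the "gradient mismatch" issue in centralized training.

```python
w = ErrorFeedbackWorker(dim=3, k=1)
grads = [[1.0, 0.5, 0.2], [0.1, 0.6, 0.3]]
sent = [w.compress(g) for g in grads]
# sum(sent) + w.error matches sum(grads) up to floating-point rounding
```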
