Paper Title

SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

Authors

Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, Furong Huang

Abstract

The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models where the speed of synchronization depends upon the slowest client. In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard iteration convergence rate $\mathcal{O}(1/\sqrt{T})$ of parallel stochastic gradient descent for convex and non-convex smooth optimization (total iterations $T$). Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption for slow clients which is required by other asynchronous decentralized FL algorithms. Although SWIFT achieves the same iteration convergence rate with respect to $T$ as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to run-time due to its wait-free structure. Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms.
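To make the wait-free idea concrete, the following is a minimal toy sketch, not the authors' algorithm. It assumes scalar quadratic local losses, a ring communication topology, uniform neighbor averaging, and a hypothetical `comm_every` parameter for how often a client averages; none of these details come from the abstract. At each global step a single client is active (faster clients are active more often), it averages with whatever neighbor models are currently available, and it takes a local stochastic gradient step without waiting for anyone.

```python
import random


def wait_free_toy(targets, speeds, steps=20000, lr=0.05, comm_every=2, seed=0):
    """Toy wait-free decentralized SGD (illustrative assumption, not SWIFT itself).

    Client i minimizes f_i(x) = 0.5 * (x - targets[i]) ** 2 with noisy gradients.
    """
    rng = random.Random(seed)
    n = len(targets)
    models = [0.0] * n       # each client's current local model
    local_steps = [0] * n    # number of local updates each client has made

    for _ in range(steps):
        # Pick the active client; faster clients are chosen more often,
        # so no client ever blocks on a slow one.
        i = rng.choices(range(n), weights=speeds)[0]

        if local_steps[i] % comm_every == 0:
            # Average with the ring neighbors' latest available models,
            # which may be stale if those neighbors are slow.
            left = models[(i - 1) % n]
            right = models[(i + 1) % n]
            models[i] = (left + models[i] + right) / 3.0

        # Local stochastic gradient step on the client's own loss.
        grad = (models[i] - targets[i]) + rng.gauss(0.0, 0.1)
        models[i] -= lr * grad
        local_steps[i] += 1

    return models


if __name__ == "__main__":
    # Four clients with identical objectives; the last one is ten times slower.
    print(wait_free_toy([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 0.1]))
```

In this toy setting the slow client simply contributes fewer updates, while the fast clients keep training. A synchronous scheme would instead pace every round by the slowest participant.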
