Paper Title
Scheduled DropHead: A Regularization Method for Transformer Models
Authors
Abstract
In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, a key component of the Transformer, which is a state-of-the-art model for various NLP tasks. In contrast to conventional dropout mechanisms, which randomly drop individual units or connections, DropHead drops entire attention heads during training. This prevents the multi-head attention model from being dominated by a small portion of its attention heads while also reducing the risk of overfitting the training data, thus making more efficient use of the multi-head attention mechanism. Motivated by recent studies of the learning dynamics of the multi-head attention mechanism, we propose a specific dropout rate schedule that adaptively adjusts the dropout rate of DropHead and achieves a better regularization effect. Experimental results on both machine translation and text classification benchmark datasets demonstrate the effectiveness of the proposed approach.
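The head-level dropping described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation. The function names `drop_head` and `linear_schedule`, the choice to always keep at least one head, and the rescaling by the fraction of kept heads are all assumptions made for illustration. The abstract also does not state the shape of the proposed dropout rate schedule, so the linear ramp below is only a placeholder showing how a step-dependent rate could be plugged in.

```python
import random


def drop_head(head_outputs, p, rng, training=True):
    """Zero out whole attention heads with probability p (illustrative sketch).

    head_outputs: list of per-head output vectors (lists of floats) for one example.
    p: probability of dropping each head.
    rng: a random.Random instance, passed in for reproducibility.
    """
    if not training or p <= 0.0:
        return [list(h) for h in head_outputs]

    num_heads = len(head_outputs)
    # One Bernoulli draw per head: the entire head is kept or dropped together.
    keep = [rng.random() >= p for _ in range(num_heads)]

    # Assumption: never drop every head, so the layer always has some output.
    if not any(keep):
        keep[rng.randrange(num_heads)] = True

    # Assumption: rescale surviving heads so the summed magnitude over heads
    # matches the no-dropout case.
    scale = num_heads / sum(keep)
    return [
        [v * scale if kept else 0.0 for v in h]
        for h, kept in zip(head_outputs, keep)
    ]


def linear_schedule(step, total_steps, p_start, p_end):
    """Hypothetical placeholder schedule: linear ramp from p_start to p_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


if __name__ == "__main__":
    rng = random.Random(0)
    heads = [[1.0, 1.0] for _ in range(4)]  # 4 heads, 2-dimensional outputs
    for step in (0, 500, 1000):
        p = linear_schedule(step, 1000, 0.0, 0.3)
        print(step, round(p, 3), drop_head(heads, p, rng))
```

In a real Transformer the same per-head mask would be applied to the attention output tensor before the heads are concatenated and projected; the list-of-vectors representation here is used only to keep the sketch free of dependencies.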
