Paper title
Towards Transparent and Explainable Attention Models
Paper authors
Paper abstract
Recent studies on the interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for the model's predictions. In this work, we first explain why current attention mechanisms in LSTM-based encoders provide neither a faithful nor a plausible explanation of the model's predictions. We observe that in LSTM-based encoders the hidden representations at different time steps are very similar to each other (high conicity), and attention weights in these situations do not carry much meaning because even a random permutation of the attention weights does not affect the model's predictions. Based on experiments on a wide variety of tasks and datasets, we observe that attention distributions often attribute the model's predictions to unimportant words such as punctuation and fail to offer a plausible explanation for the predictions. To make attention mechanisms more faithful and plausible, we propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse. We show that the resulting attention distributions offer more transparency as they (i) provide a more precise importance ranking of the hidden states, (ii) are better indicative of words important for the model's predictions, and (iii) correlate better with gradient-based attribution methods. Human evaluations indicate that the attention distributions learned by our model offer a plausible explanation of the model's predictions. Our code has been made publicly available at https://github.com/akashkm99/Interpretable-Attention
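The abstract uses conicity as a measure of how similar the hidden representations are, without defining it. The sketch below assumes one common definition: the mean cosine similarity between each hidden state and the mean of all hidden states. The function names `cosine` and `conicity` and the toy vectors are illustrative and are not taken from the authors' repository.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (neither may be all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def conicity(hidden_states):
    """Assumed definition: average cosine similarity between each hidden
    state and the mean hidden state. Values near 1 mean the states all
    point in nearly the same direction."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    mean = [sum(h[i] for h in hidden_states) / n for i in range(dim)]
    return sum(cosine(h, mean) for h in hidden_states) / n

# Toy hidden states: three nearly parallel vectors vs. two orthogonal ones.
similar = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1]]
diverse = [[1.0, 0.0], [0.0, 1.0]]

print(conicity(similar) > conicity(diverse))
```

Under this assumed definition, nearly parallel hidden states score close to 1. In that regime, a weighted sum of the states changes little when the attention weights are shuffled, which is the situation the abstract describes.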
