Paper title
Diagonal State Spaces are as Effective as Structured State Spaces
Paper authors
Paper abstract
Modeling long-range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice for modeling short-range interactions, their performance on tasks requiring long-range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICLR 2022) proposed the $\textit{Structured State Space}$ (S4) architecture, delivering large gains over state-of-the-art models on several long-range tasks across various modalities. The core proposition of S4 is the parameterization of state matrices via a diagonal plus low-rank structure, allowing efficient computation. In this work, we show that one can match the performance of S4 even without the low-rank correction, thus assuming the state matrices to be diagonal. Our $\textit{Diagonal State Space}$ (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification on the Speech Commands dataset, while being conceptually simpler and straightforward to implement.
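
To illustrate why a diagonal state matrix is simple to work with, here is a minimal sketch of a generic discrete-time diagonal linear state space layer. It is not the DSS parameterization from the paper: the recurrence form, the function names, and the example values of `lam`, `b`, `c` and `u` are all assumptions chosen for illustration, and the discretization step is skipped by assuming the system is already discrete. The sketch shows that, with a diagonal state matrix, the convolution kernel reduces to a sum of scalar powers, and that stepping the recurrence and convolving with that kernel produce the same outputs.

```python
# Illustrative sketch only: a generic discrete-time diagonal linear SSM,
#   x_k = diag(lam) x_{k-1} + b u_k,   y_k = c . x_k,
# NOT the parameterization used in the DSS paper. All names are hypothetical.

def diagonal_ssm_kernel(lam, b, c, length):
    """Return K[k] = sum_i c[i] * b[i] * lam[i]**k for k = 0..length-1."""
    kernel = []
    powers = [1.0 + 0.0j] * len(lam)  # holds lam[i]**k, starting at k = 0
    for _ in range(length):
        kernel.append(sum(ci * bi * p for ci, bi, p in zip(c, b, powers)))
        powers = [p * l for p, l in zip(powers, lam)]
    return kernel

def run_recurrence(lam, b, c, u):
    """Step the state one input at a time, starting from a zero state."""
    state = [0.0 + 0.0j] * len(lam)
    outputs = []
    for u_k in u:
        # Diagonal state matrix: each state coordinate evolves independently.
        state = [l * s + bi * u_k for l, s, bi in zip(lam, state, b)]
        outputs.append(sum(ci * s for ci, s in zip(c, state)))
    return outputs

def run_convolution(kernel, u):
    """Causal convolution y_k = sum_{j <= k} K[j] * u[k - j]."""
    return [sum(kernel[j] * u[k - j] for j in range(k + 1))
            for k in range(len(u))]

# Assumed example values: one real mode plus a complex-conjugate pair.
lam = [0.9, 0.5 + 0.3j, 0.5 - 0.3j]
b = [1.0, 1.0, 1.0]
c = [0.5, 0.2 - 0.1j, 0.2 + 0.1j]
u = [1.0, 0.0, -2.0, 0.5]

kernel = diagonal_ssm_kernel(lam, b, c, len(u))
y_recurrent = run_recurrence(lam, b, c, u)
y_convolved = run_convolution(kernel, u)

# Largest disagreement between the two views; expected to be at rounding level.
max_gap = max(abs(p - q) for p, q in zip(y_recurrent, y_convolved))
print(max_gap)
```

The direct convolution here costs quadratic time in the sequence length and is kept only for readability; a practical implementation would typically use an FFT-based convolution instead.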
