Paper Title
SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
Paper Authors
Abstract
Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating demands from new operators. Sparse tensor compilers simplify the development of operators, but efficient sparse compilation for deep learning remains challenging because a single sparse format cannot maximize hardware efficiency, and single-shot compilers cannot keep up with the latest hardware and system advances. In this paper, we observe that the key to addressing both of these challenges is to leverage composable formats and composable transformations. We propose SparseTIR, a sparse tensor compilation abstraction that offers composable formats and composable transformations for deep learning workloads. SparseTIR constructs a search space over these composable components for performance tuning. With these improvements, SparseTIR obtains consistent performance speedups over vendor libraries on GPUs for single operators: 1.20-2.34x for GNN operators, 1.05-2.98x for sparse attention operators, and 0.56-7.45x for sparse convolution operators. SparseTIR also accelerates end-to-end GNNs by 1.08-1.52x for GraphSAGE training and 4.20-40.18x for RGCN inference.
