1. Autoscheduling for sparse tensor algebra with an asymptotic cost model
2. Xuhao Chen. 2018. Escoin: Efficient sparse convolutional neural network inference on GPUs. arXiv preprint arXiv:1802.10280 (2018).
3. Kazem Cheshmi, Michelle Mills Strout, and Maryam Mehri Dehnavi. 2023. Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence. In SC23: International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 1–14.
4. Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759 (2014).
5. Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. 2019. Generating long sequences with sparse transformers. arXiv preprint arXiv:1904.10509 (2019).