NEM-GNN: DAC/ADC-less, Scalable, Reconfigurable, Graph and Sparsity-Aware Near-Memory Accelerator for Graph Neural Networks

Published: 2024-05-21 Issue: 2 Volume: 21 Page: 1-26
ISSN: 1544-3566
Container-title: ACM Transactions on Architecture and Code Optimization
Language: en
Short-container-title: ACM Trans. Archit. Code Optim.

Author:

Sundara Raman Siddhartha Raman¹, John Lizy¹, Kulkarni Jaydeep P.¹

Affiliation:

1. The University of Texas at Austin, Austin, USA

Abstract

Graph neural networks (GNNs) are of great interest in real-life applications such as citation networks and drug discovery owing to their ability to apply machine learning techniques to graphs. GNNs use a two-step approach to classify the nodes of a graph into pre-defined categories. The first step uses a combination kernel to perform data-intensive convolution operations with regular memory access patterns. The second step uses an aggregation kernel that operates on sparse data with irregular access patterns. These mixed data patterns render CPU/GPU-based compute energy-inefficient. Von Neumann-based accelerators such as AWB-GCN [7] suffer from increased data movement, as the data-intensive combination step requires large transfers to and from memory to perform computations. ReFLIP [8] performs resistive random access memory (ReRAM) based processing-in-memory (PIM) compute to overcome data movement costs. However, ReFLIP suffers from increased area requirements due to its dedicated accelerator arrangement, reduced performance due to limited parallelism, and higher energy due to fundamental issues in ReRAM-based compute. This article presents a scalable (non-exponential storage requirement), DAC/ADC-less PIM-based combination, with (i) early compute termination and (ii) pre-compute by reconfiguring SoC components. Graph and sparsity-aware near-memory aggregation using the proposed compute-as-soon-as-ready (CAR) broadcast approach further improves performance and energy. NEM-GNN achieves ∼80–230x, ∼80–300x, ∼850–1,134x, and ∼7–8x improvement over ReFLIP in terms of performance, throughput, energy efficiency, and compute density, respectively.
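
The two-step structure described in the abstract can be illustrated with a minimal software sketch, assuming a GCN-style layer in which combination is a dense feature-by-weight product and aggregation is a sum over each node's neighbours stored in compressed sparse row (CSR) form. The function names `combine` and `aggregate`, the toy graph, and the values are illustrative assumptions and do not come from the article; the sketch does not model the PIM hardware, early compute termination, or the CAR broadcast.

```python
from typing import List

Matrix = List[List[float]]


def combine(features: Matrix, weights: Matrix) -> Matrix:
    """Dense step: multiply each node's feature vector by a shared weight matrix.

    Every row is processed the same way, so memory accesses are regular.
    """
    n_out = len(weights[0])
    return [
        [sum(row[k] * weights[k][j] for k in range(len(row))) for j in range(n_out)]
        for row in features
    ]


def aggregate(indptr: List[int], indices: List[int], combined: Matrix) -> Matrix:
    """Sparse step: each node sums the combined vectors of its neighbours.

    The adjacency is given in CSR form (indptr, indices), so only existing
    edges are visited and the rows read depend on the graph structure.
    """
    width = len(combined[0])
    out = []
    for node in range(len(indptr) - 1):
        acc = [0.0] * width
        for pos in range(indptr[node], indptr[node + 1]):
            neighbour = indices[pos]  # irregular, data-dependent access
            for j in range(width):
                acc[j] += combined[neighbour][j]
        out.append(acc)
    return out


# Hypothetical 3-node path graph 0-1-2 with self-loops.
features = [[1, 0], [0, 1], [1, 1]]
weights = [[1, 2], [3, 4]]
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]

combined = combine(features, weights)
aggregated = aggregate(indptr, indices, combined)
```

Here the dense step touches every feature row in order, while the sparse step follows the edge list, which is the contrast in access patterns the abstract refers to.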

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3652607

References: 29 articles.

1. Pt/Cu:ZnO/Nb:STO memristive dual port for cache memory applications

2. Geometric Deep Learning: Going beyond Euclidean data

3. Crossbar based Processing in Memory Accelerator Architecture for Graph Convolutional Networks

4. GCIM: Toward Efficient Processing of Graph Convolutional Networks in 3D-Stacked Memory

5. PEDAL: A Power Efficient GCN Accelerator with Multiple DAtafLows

Cited by 1 article.

1. SPARK: Sparsity Aware, Low Area, Energy-Efficient, Near-memory Architecture for Accelerating Linear Programming Problems. 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2025-03-01