Hsinyu Tsai
IBM Almaden Research Center
Analog non-volatile memory (NVM)-based accelerators for Deep Neural Networks (DNNs) can achieve high throughput and energy efficiency by computing multiply-accumulate (MAC) operations using Ohm's law and Kirchhoff's current law on arrays of resistive memory devices [1]. In recent years, energy-efficient, weight-stationary MAC operations in analog NVM memory-array "Tiles" were demonstrated in hardware with Phase Change Memory (PCM) devices integrated in the backend of 14-nm CMOS [2, 3]. Competitive end-to-end DNN accuracies can be obtained with the help of hardware-aware training, accurate weight programming, and sufficiently linear MAC operations in the analog domain [4].

In this paper, I describe architectural and circuit advances for such analog NVM-based accelerators and their specialized digital compute units, designed to accelerate Transformer, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNNs). I present a highly heterogeneous and programmable accelerator architecture that exploits a dense and efficient circuit-switched 2D mesh to exchange vectors of neuron activations over short distances in a massively parallel fashion [5]. Based on a 14-nm inference chip consisting of multiple arrays of PCM devices, the impact of memory materials on the accuracy and performance of these systems will be discussed.

The author would like to thank colleagues at IBM Research Almaden, Yorktown, Albany, Zurich and Tokyo for their contributions to this work, and the IBM Research AI HW Center.

[1] G. W. Burr et al. "Ohm's Law + Kirchhoff's Current Law = Better AI: Neural-Network Processing Done in Memory with Analog Circuits will Save Energy". In: IEEE Spectrum 58.12 (2021), pp. 44–49.
[2] P. Narayanan et al. "Fully on-chip MAC at 14nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration format". In: Symposium on VLSI Technology. 2021.
[3] M. Le Gallo et al. "A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference". In: arXiv:2212.02872 (2022).
[4] M. J. Rasch et al. "Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators". In: arXiv:2302.08469 (2023).
[5] S. Jain et al. "A Heterogeneous and Programmable Compute-In-Memory Accelerator Architecture for Analog-AI Using Dense 2-D Mesh". In: IEEE Trans. VLSI 31.1 (2023), pp. 114–127.
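The in-memory MAC principle from [1] can be sketched numerically: weights are stored as device conductances, Ohm's law produces a per-device current proportional to conductance times applied voltage, and Kirchhoff's current law sums those currents along a wire. The array sizes, conductance range, and voltage range below are illustrative assumptions, not parameters of the chip described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values: a 4x8 tile of conductances G (siemens) holding weights,
# and read voltages V encoding the input activations.
G = rng.uniform(1e-6, 1e-5, size=(4, 8))  # weight-stationary conductance array
V = rng.uniform(0.0, 0.2, size=8)         # input activation voltages

# Analog MAC: each output current is the Kirchhoff sum of Ohm's-law
# currents G[i, j] * V[j] along one wire -- a matrix-vector product.
I = G @ V

# The same result written out element-wise, making the physics explicit.
I_explicit = np.array([sum(G[i, j] * V[j] for j in range(8)) for i in range(4)])
assert np.allclose(I, I_explicit)
```

Because every device contributes its current simultaneously, the whole matrix-vector product completes in one analog read step rather than O(rows × cols) digital multiply-accumulates.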