Dual-Mode 2T1C DRAM Process-In-Memory Architecture for Boolean and MAC Operations
Yerim An, Honggu Kim, Derac Son, Hao Yu, Yong Shim
IF 3.6
IEEE Access
With the increasing demand for intelligent memory, conventional memory systems are increasingly equipped with computational logic to support the simple arithmetic and Boolean operations required by many real-world applications. This trend, called the ‘Process-In-Memory’ (PIM) architecture, draws on most memory candidates, ranging from conventional charge-based memories such as SRAM, DRAM, and Flash to emerging memory devices such as RRAM, PRAM, and MRAM. From the application perspective, many researchers are working to develop efficient CMOS memory peripherals that support one of two operations: 1) Boolean functions, and 2) simple arithmetic operations such as multiplication and accumulation (addition), typically referred to as the MAC operation. However, few previous works support both operations in a single system. In this article, we propose a 2T1C DRAM-based PIM architecture that supports dual-mode operation with minimal additional hardware. The proposed architecture has been fabricated in a commercial 65 nm CMOS technology, and the fabricated chip demonstrates that the target operations are performed with reasonably good accuracy.
All Stochastic-Spiking Neural Network (AS-SNN): Noise Induced Spike Pulse Generator for Input and Output Neurons With Resistive Synaptic Array
Honggu Kim, Yoshimori An, Min-Chul Kim, Gyeong-Chan Heo, Yong Shim
IF 4.9
IEEE Transactions on Circuits & Systems II Express Briefs
Spiking neural network (SNN) based mixed-signal neuromorphic hardware offers substantial benefits in speed and energy efficiency over conventional computing platforms, thanks to its energy-efficient data processing. However, on-chip realization of the Poisson spike train used to represent spike-encoded data has not yet been fully achieved. Furthermore, the analog circuit components in mixed-signal neuromorphic hardware are prone to variations, which can lead to an accuracy drop in SNN applications. In this brief, we demonstrate a robust noise-induced spike pulse generator for on-chip realization of the Poisson spike train. The stochastic sigmoid neuron developed in our work exhibits better robustness than LIF neurons to diverse RRAM device variation factors: 1) Random Telegraph Noise (RTN), 2) Stuck-At-Faults (SAFs), and 3) endurance failures, supporting robust SNN applications.
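As a software illustration of the Poisson spike train mentioned above, the sketch below approximates one with independent Bernoulli draws per time step, where the firing probability is a sigmoid of the neuron input. The function names, step count, input values, and seed are assumptions made for illustration; this is not a model of the circuit described in the brief.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping any real input to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def spike_train(drive: float, n_steps: int, rng: random.Random) -> list[int]:
    """Bernoulli approximation of a Poisson spike train.

    Each time step fires independently with probability sigmoid(drive),
    so the expected spike count is n_steps * sigmoid(drive).
    """
    p = sigmoid(drive)
    return [1 if rng.random() < p else 0 for _ in range(n_steps)]


rng = random.Random(0)  # fixed seed so runs are repeatable
weak = spike_train(-2.0, 1000, rng)    # low drive: sparse spikes
strong = spike_train(2.0, 1000, rng)   # high drive: dense spikes
print(sum(weak), sum(strong))
```

A stronger input raises the per-step firing probability, so the spike count over a fixed window encodes the input magnitude as a rate.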
IEEE Transactions on Circuits & Systems II Express Briefs
This brief presents the on-chip background offset and timing-skew calibration of the 1-then-2b/cycle time-interleaved successive-approximation-register analog-to-digital converter (TI SAR ADC). For timing-skew between sub-ADC’s sampling clocks, a comparator offset-based window detector (WD) is used to adjust the clock edge misalignment. In addition, comparator offset calibration is considered both in terms of 1) global offset (between the offset-free reference comparator and the local reference comparator in each sub-ADC) and 2) local offset (between the local reference comparator and the rest of the comparators in the same sub-ADC). The proposed calibration sufficiently suppresses noise floor and spurs, and all calibrations are performed in the background without interfering with normal ADC operation. The prototype 5-way TI SAR ADC is fabricated in a 28 nm CMOS process and occupies a 0.03 mm² area including on-chip calibration. With the proposed calibration, the prototype achieves SNDR of 40 dB at Nyquist input and consumes 7.57 mW, leading to the Walden figure of merit (FoM_W) of 37.2 fJ/conversion-step.
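The Walden figure of merit quoted above follows the standard definitions ENOB = (SNDR - 1.76) / 6.02 and FoM_W = P / (2^ENOB · f_s). The sketch below applies them to the reported SNDR, power, and FoM_W. The sampling rate is not stated in the abstract, so the value the script derives (roughly 2.5 GS/s) is an inference from the other three numbers, not a reported specification. Function names are illustrative.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB (standard sine-wave formula)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, sndr_db: float, fs_hz: float) -> float:
    """Walden figure of merit in joules per conversion-step."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)


def implied_sample_rate(power_w: float, sndr_db: float, fom_j: float) -> float:
    """Sampling rate (Hz) implied by a given power, SNDR, and Walden FoM."""
    return power_w / (2 ** enob(sndr_db) * fom_j)


# Values reported in the abstract: 40 dB SNDR, 7.57 mW, 37.2 fJ/conversion-step.
bits = enob(40.0)
fs = implied_sample_rate(7.57e-3, 40.0, 37.2e-15)  # inferred, not reported
print(f"ENOB = {bits:.2f} bits, implied fs = {fs / 1e9:.2f} GS/s")
```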
Low Power CMOS Stochastic Bit Based Ising Machine and Its Application to Graph Coloring Problem
Honggu Kim, Derac Son, Yerim An, Yong Shim
Electronics Letters
The Ising spin model is an efficient method for solving combinatorial optimization problems (COPs) but faces challenges on conventional von Neumann architectures due to high computational costs, especially with the growing data volume of the IoT era. To address this problem, we propose a low-power CMOS stochastic-bit-based Ising machine that computes COPs efficiently. By adopting a compute-in-memory (CIM) approach for parallel spin computation, we achieve energy-efficient spin computing. Furthermore, we harness the inherent randomness of the CMOS stochastic bit to keep the Ising computing process from getting stuck in local minima, effectively mitigating the power penalty associated with the random number generators (RNGs) in conventional CMOS-based Ising machines. We demonstrate the feasibility of our design by solving an NP-complete graph coloring problem with four vertices and three colors using the TSMC 65 nm GP process. Moreover, the proposed CMOS stochastic-bit-based spin unit consumes the lowest power/spin among state-of-the-art Ising machines, with power/spin of 1.07 and energy/spin of 107 fJ.
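To illustrate how randomness helps a spin system escape local minima on a graph coloring problem of this size, the sketch below runs a software Gibbs-style sampler on four vertices and three colors. Several things here are assumptions: the edge list is a hypothetical graph (the abstract does not give the one used), the energy is a simple conflict count with one color variable per vertex in place of an explicit Ising spin encoding, and the `beta` and `max_sweeps` parameters are illustrative. It is not a model of the CMOS stochastic bit circuit.

```python
import math
import random

# Hypothetical 4-vertex graph: a 4-cycle plus one chord (3-colorable).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N_VERTICES = 4
N_COLORS = 3


def conflicts(coloring):
    """Energy: number of edges whose two endpoints share a color."""
    return sum(1 for u, v in EDGES if coloring[u] == coloring[v])


def local_conflicts(coloring, vertex, color):
    """Conflicts `vertex` would have with its neighbors if given `color`."""
    count = 0
    for u, v in EDGES:
        if u == vertex and coloring[v] == color:
            count += 1
        elif v == vertex and coloring[u] == color:
            count += 1
    return count


def solve(rng, beta=2.0, max_sweeps=1000):
    """Stochastic search: resample each vertex color, favoring few conflicts."""
    coloring = [rng.randrange(N_COLORS) for _ in range(N_VERTICES)]
    for _ in range(max_sweeps):
        if conflicts(coloring) == 0:
            break  # valid coloring found
        for vertex in range(N_VERTICES):
            # Lower-conflict colors get exponentially larger weight, but every
            # color keeps a nonzero chance, which lets the search leave minima.
            weights = [
                math.exp(-beta * local_conflicts(coloring, vertex, c))
                for c in range(N_COLORS)
            ]
            coloring[vertex] = rng.choices(range(N_COLORS), weights=weights)[0]
    return coloring


result = solve(random.Random(1))
print(result, "conflicts:", conflicts(result))
```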
FINEX Process: Coal Briquette Quality Prediction and Automatic Control using CNN-LSTM and YOLOv11
Indong Kim, Woo-Il Park, Yong Shim, Taehyeok Kim, Soohee Han
IF 1.8
ISIJ International
Following the global consensus on responding to climate change, reducing coal usage has become a critical task, given that coal is a primary contributor to greenhouse gas (GHG) emissions in the steel industry. In this context, the development of graphics processing units has led to increasing research interest in applying various deep-learning technologies to enhance process efficiency in steel manufacturing operations. To this end, this study proposes an automatic control logic for supplying coal to rollers during briquette production. The proposed system employs a convolutional neural network-long short-term memory (CNN-LSTM) model for time-series prediction and You Only Look Once (YOLO) for object detection to stabilize coal briquette quality in the FINEX process, replacing the existing manually controlled task in the hot metal production process. The models were verified in the actual process by using them to detect the roller currents, which serve as key control factors, and the briquettes produced below the roller, and by establishing a correlation with compressive strength, the main quality indicator of briquettes. Based on these models, an automatic control logic was constructed and tested in actual processes, and the results confirmed a quality improvement of ~30 %. The results of this study are expected to contribute to improving coal usage efficiency and reducing GHG emissions in the FINEX process.
Union SRAM: PVT Variation Auto-Compensated, Bit Precision Configurable Current Mode 8T SRAM in Memory MAC Macro
Honggu Kim, Yerim An, Ryunyeong Kim, Sun Young Kim, Yong Shim
IF 3.6
IEEE Access
SRAM-based Compute-In-Memory (CIM) has two main paradigms, digital-domain and analog-domain, both of which have been extensively explored to overcome the von Neumann bottleneck and enhance energy efficiency. Digital CIM offers robustness and dynamic bit-precision through bit-wise and bit-serial computing, but suffers from limited throughput due to multi-cycle operations and degraded area density due to large hardware footprints. In contrast, analog CIM offers significantly improved throughput and area density owing to its analog computing nature and simple logic structure. However, its weight and input data are constrained to a fixed bit-precision, limiting flexibility in DNN applications. Additionally, analog CIM is susceptible to process, voltage, and temperature (PVT) variations, resulting in potential accuracy degradation. We present a solution to the limitations of analog-domain SRAM CIM in dynamic bit-precision configurability and PVT variation vulnerability. Our proposed 4b/8b bit-precision configurable analog current-mode 8T SRAM CIM architecture enhances DNN application flexibility. We also introduce a PVT variation auto-compensation scheme that effectively maintains the computing accuracy of analog-domain CIM. Post-layout simulations confirm the architecture’s efficacy, achieving a throughput of 170 to 793.4 GOPs, area efficiency of 0.227 to 1.06 TOPs/mm², and energy efficiency of 5.1 to 23.76 TOPs/W. Additionally, software-level simulation on the CIFAR-10 dataset demonstrates 95.02 percent classification accuracy.
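One common way to obtain an 8-bit MAC from 4-bit hardware is to split each weight into two 4-bit halves, run a 4-bit MAC on each half, and combine the partial sums with a shift-and-add. The abstract does not describe how the 4b/8b configurability is implemented, so the sketch below is only an arithmetic illustration under that assumption, using unsigned weights and hypothetical function names.

```python
def dot(inputs, weights):
    """Plain multiply-and-accumulate over two equal-length sequences."""
    return sum(x * w for x, w in zip(inputs, weights))


def split_nibbles(weight):
    """Split an unsigned 8-bit weight into (high, low) 4-bit halves."""
    if not 0 <= weight <= 255:
        raise ValueError("weight must fit in 8 unsigned bits")
    return weight >> 4, weight & 0xF


def mac_8b_from_4b(inputs, weights):
    """8-bit-weight MAC built from two 4-bit-weight MACs (non-empty inputs)."""
    highs, lows = zip(*(split_nibbles(w) for w in weights))
    # The high-half partial sum carries a weight of 2**4 = 16.
    return (dot(inputs, highs) << 4) + dot(inputs, lows)


inputs = [3, 15, 0, 7]        # illustrative 4-bit inputs
weights = [200, 17, 255, 64]  # illustrative 8-bit weights
print(mac_8b_from_4b(inputs, weights), dot(inputs, weights))
```

Both printed values agree because each weight satisfies w = 16·w_high + w_low, and the dot product is linear in the weights.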
Coupled 7T 1C SRAM based in-memory computing architecture with gain/offset error auto-compensated SAR ADC
Honggu Kim, Yerim An, Yong Shim
Charge-domain SRAM-based Process-In-Memory (PIM) has been researched extensively in recent years to improve throughput and energy efficiency in data-intensive neural network applications. However, two main issues need to be resolved in charge-domain SRAM-based PIM architectures: 1) minimizing area overhead while supporting multi-bit Multiply-and-Accumulate (MAC) operation, and 2) addressing the MAC operation gain error and offset error caused by PVT-varied input Digital-to-Analog Converters (DACs) and in-array coupling noise, respectively.
8T-SRAM Based Process-In-Memory (PIM) System With Current Mirror for Accurate MAC Operation
Keun-Yong Chung, Honggu Kim, Yerim An, Kiho Seong, Dong-Hyun Shin, Kwang‐Hyun Baek, Yong Shim
IF 3.6
IEEE Access
Process-in-memory (PIM) is an emerging computing paradigm for overcoming the energy bottleneck inherent in conventional computing platforms. While PIM utilizes several types of memory elements, SRAM-based PIM has been researched extensively for its high scalability and its feasibility in CMOS process technology. In this work, we propose an 8T SRAM-based process-in-memory system with a current mirror for accurate MAC operation. To resolve the nonlinearity inherent in current-based SRAM MAC macros, we use a cascaded current mirror. Furthermore, to enable more precise MAC operation, we place the RWL timing pulses that drive the 8T SRAM bitcells in the middle of the timing duration. Our PIM architecture realizes 4-bit input, 4-bit weight, and 4-bit output precision. A prototype chip has been fabricated in the TSMC 65 nm GP process technology. Measurement results confirm the MAC operation linearity of the designed 8T SRAM-based PIM system, with a throughput of 0.29 GOPs and an energy efficiency of 1.05 TOPs/W. Software-level analysis shows that our design can achieve up to 98.02 percent accuracy for MNIST dataset classification and 94.93 percent accuracy for CIFAR-10 dataset classification.
Recent healthcare systems based on human body communication (HBC) require human interaction sensors. Because the human body is conductive, capacitive sensors are the most widely known and are used in many electronic gadgets for communication. Capacitance fluctuations caused by human interaction are typically converted to voltage levels by analog circuits, and analog-to-digital converters (ADCs) then convert the analog voltages into digital codes for further processing. However, signals detected from human touch are inherently noisy, so a power-hungry active analog filter is required. In addition, including ADCs increases the area and power consumption of the system. The proposed structure adopts a digital moving average filter (MAF) that effectively operates as a low-pass filter (LPF) in place of a large-area, high-power analog filter. In addition, the proposed ∆C detection algorithm can distinguish between human interaction and object interaction. As a result, two individual digital signals, touch/release and movement, can be generated, and the type and strength of the touch can be effectively expressed without the help of an ADC. The prototype chip of the proposed capacitive sensing circuit was fabricated in a commercial 65 nm CMOS process technology, and its functionality was fully verified through testing and measurement. The prototype core occupies an active area of 0.0067 mm², consumes 7.5 µW of power, and has a conversion time of 105 ms.
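A moving average filter acts as a low-pass filter because each output averages the most recent samples, which attenuates short noise spikes while following slow changes. The sketch below is a generic software version with a hypothetical window length and sample values; it does not reproduce the chip's filter or its ∆C detection algorithm, which the abstract does not specify.

```python
from collections import deque


def moving_average(samples, window):
    """Causal moving average over the last `window` samples.

    During start-up, before `window` samples have arrived, the average is
    taken over the samples seen so far.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)
    total = 0.0  # running sum of the samples currently in the window
    filtered = []
    for sample in samples:
        if len(recent) == window:
            total -= recent[0]  # oldest sample is about to be evicted
        recent.append(sample)
        total += sample
        filtered.append(total / len(recent))
    return filtered


# Illustrative capacitance counts: one isolated spike, then a sustained step.
raw = [0, 0, 8, 0, 0, 8, 8, 8]
print(moving_average(raw, 4))
```

With a window of 4, the isolated spike of 8 is reduced to 2, while the sustained step pulls the output steadily toward 8.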
Distance computation between two input vectors is a widely used computing unit in several pattern recognition, signal processing, and neuromorphic applications. However, implementing this functionality in conventional CMOS design requires expensive hardware and involves significant power consumption. Even power-efficient current-mode analog designs have proved to be slower and vulnerable to variations. In this paper, we propose an approximate mixed-signal design for the distance computing core, noting that the vast majority of signal processing applications involving this operation are resilient to small approximations in the distance computation. The proposed mixed-signal design is able to interface with external digital CMOS logic while exhibiting fast operating speeds. Another important feature of the proposed design is that the computing core can compute two variants of the distance metric, namely (i) the squared Euclidean distance (L2² norm) and (ii) the Manhattan distance (L1 norm). The performance of the proposed design was evaluated with a standard K-means clustering algorithm on the “Iris flower dataset”. The results indicate a throughput of 6 ns per classification and ∼2.3× lower energy consumption in comparison to a synthesized digital CMOS design in commercial 45 nm CMOS technology.
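For reference, the two distance metrics named above, and the nearest-centroid assignment step of K-means that uses them, can be written exactly in software as follows. This is the exact arithmetic, not the approximate mixed-signal computation the paper proposes, and the sample points and centroids are made-up values, not Iris data.

```python
def manhattan(a, b):
    """L1 norm of (a - b): sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Squared L2 norm of (a - b): sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids, metric):
    """K-means assignment step: index of the closest centroid under `metric`."""
    return min(range(len(centroids)), key=lambda i: metric(point, centroids[i]))


# Hypothetical 2-D centroids and query point.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
point = (4.0, 6.0)
print(nearest_centroid(point, centroids, manhattan))
print(nearest_centroid(point, centroids, squared_euclidean))
```

Skipping the square root in the Euclidean case does not change which centroid is nearest, because squaring preserves the ordering of non-negative distances.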