KAIST BREAKTHROUGHS

Research Webzine of the KAIST College of Engineering since 2014

Electronics

TensorDIMM: An AI accelerator for personalized recommendation algorithms based on processing-in-memory

July 27, 2023

TensorDIMM is the first architectural solution tackling the sparse embedding layers of personalized recommendation systems. Our solution is based on a practical processing-in-memory architecture that fundamentally addresses the memory capacity and bandwidth challenges of the sparse embedding layers widely employed in recommendation systems, natural language processing, and speech recognition.

Article | Fall 2020

Personalized recommendation systems are becoming increasingly popular in today’s datacenters, as they power numerous application domains such as online advertisement, movie/music recommendation, e-commerce, and news feeds. As such, accelerating recommendation systems for high performance and high energy efficiency is becoming increasingly important for hyperscalers such as Google, Facebook, Amazon, and Microsoft. This is because the performance of recommendation systems is directly correlated with a hyperscaler’s revenue, rendering the overall quality-of-service (QoS) provided to end consumers vital. A key challenge in deploying recommendation systems, however, is their excessive memory footprint and the limited memory bandwidth available for executing recommendation algorithms: the several large “embedding tables” these algorithms employ incur memory usage of several tens to thousands of gigabytes.

An “embedding” is a learned continuous vector representation in a low-dimensional latent space, projected from a personalized categorical feature (e.g., a preferred movie genre or food). A recommendation system stores the embeddings in the form of a table, called an embedding table, and ultimately predicts the probability of a certain event by combining the embedding features [Figure 1].
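As a concrete illustration, the short Python sketch below (our own simplified example, not the authors’ code; the table size, embedding dimension, and feature IDs are hypothetical) shows how sparse categorical feature IDs index rows of an embedding table and how the gathered vectors are pooled into one dense feature:

import numpy as np

NUM_CATEGORIES = 100_000      # hypothetical: one row per categorical value
EMBEDDING_DIM = 64            # hypothetical dimensionality of the latent space

# The embedding table: one learned vector per categorical value.
embedding_table = np.random.randn(NUM_CATEGORIES, EMBEDDING_DIM).astype(np.float32)

def embed(feature_ids):
    # Sparse gather: fetch only the rows named by the feature IDs,
    # then combine them (here, an element-wise sum) into a dense vector.
    vectors = embedding_table[feature_ids]
    return vectors.sum(axis=0)

# One user's sparse features become one dense vector that is fed,
# together with other features, into the event-probability predictor.
user_feature = embed([12, 4_517, 98_244])
print(user_feature.shape)     # (64,)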

Figure 1. The usage of embeddings in a recommendation system

For example, YouTube or Netflix predicts the probability of a user watching a recommended video clip. Because there are countless categorical features and users, a recommendation system can contain many embedding tables, which can amount to several tens to thousands of gigabytes. This is why datacenters suffer from the excessive memory footprint of recommendation systems.

Prof. Minsoo Rhu and his research team at the School of Electrical Engineering at KAIST have developed TensorDIMM, the first architectural solution tackling the memory capacity and bandwidth challenges of embedding layers employed in personalized recommendation systems [Figure 2].

The team observed that the tensor operations applied to the contents fetched from the embedding tables are simple, element-wise vector operations. Because of the way today’s computer systems are organized, existing computing platforms must fetch all the embedding vectors from external, off-chip memory into the processor to execute these operations, so they easily become bottlenecked by the limited off-chip data transfer rate and memory bandwidth.
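A back-of-the-envelope Python sketch (our illustration; the vector count and embedding dimension are assumptions, not figures from the paper) makes the bottleneck concrete: a gather-and-reduce over embeddings performs so little arithmetic per byte moved that the off-chip memory bus, not the processor, sets the pace:

BYTES_PER_ELEMENT = 4                  # fp32 embeddings
EMBEDDING_DIM = 64                     # assumed embedding width
VECTORS_GATHERED = 80                  # assumed rows gathered per inference

# Every gathered vector must cross the off-chip memory bus in full...
bytes_moved = VECTORS_GATHERED * EMBEDDING_DIM * BYTES_PER_ELEMENT
# ...yet the element-wise reduction needs only one add per element.
flops = (VECTORS_GATHERED - 1) * EMBEDDING_DIM

print(flops / bytes_moved)             # ~0.25 FLOP per byte: memory-bound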

TensorDIMM takes a completely different approach: it employs a processing-in-memory (PIM) solution in which the tensor operations targeting embedding vectors are conducted directly “near/inside” the memory modules, rather than fetching the vectors all the way to the main processor die.

This drastically improves the effective memory bandwidth available for conducting tensor operations over embeddings, fundamentally addressing the memory bandwidth obstacles of sparse embedding layers. Additionally, TensorDIMM employs a disaggregated-memory-based system architecture, which drastically enhances the available memory capacity.
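The conceptual Python sketch below (our own approximation of the idea; the module count, table size, and slicing scheme are assumptions for illustration) shows why this helps: if each memory module reduces its own slice of the gathered vectors locally, all modules work in parallel and only the small pooled results ever cross the memory channel:

import numpy as np

NUM_MODULES = 4                # hypothetical memory modules with near-memory compute
EMBEDDING_DIM = 64
SLICE = EMBEDDING_DIM // NUM_MODULES

# Each module holds one slice of every row of the embedding table.
table_slices = [np.random.randn(1_000, SLICE).astype(np.float32)
                for _ in range(NUM_MODULES)]

def pim_gather_reduce(feature_ids):
    # Each module gathers and sums its slice locally, in parallel;
    # the host merely concatenates the tiny per-module partial results.
    partials = [s[feature_ids].sum(axis=0) for s in table_slices]
    return np.concatenate(partials)

print(pim_gather_reduce([3, 141, 592]).shape)   # (64,)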

Figure 2. High-level overview of the proposed system

This research was published in the proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO-52) under the title “TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning”.

The work was subsequently selected as an IEEE Micro Top Picks Honorable Mention (“IEEE Micro – Special Issue on Top Picks from the 2019 Computer Architecture Conferences”), a distinction that acknowledges research papers with high novelty and potential for long-term impact; it is the first such recognition for KAIST.