KAIST
BREAKTHROUGHS

Research Webzine of the KAIST College of Engineering since 2014

Fall 2025 Vol. 25

Computing

GFlux: Trillion-edge scale graph processing with SSD-GPU acceleration

August 26, 2025

Processing large-scale graphs is increasingly vital in fields such as business and AI (e.g., vector database indexing). GFlux enables trillion-edge-scale graph processing on a single machine using SSDs and GPUs, supporting complex analytics such as triangle counting.

Overview of the GFlux architecture, consisting of a task scheduling layer, a task execution layer, and compressed graph storage

Technology for processing large-scale graph data has become increasingly critical in business and AI. Companies such as Palantir model enterprise data as graphs to enhance operational intelligence and decision-making, while AI applications rely on vector databases with graph-based indexing to reduce hallucinations in LLMs and achieve high recall. Traditionally, large-scale graphs are handled by partitioning the data across multiple computers that coordinate over a network. However, hardware and network costs rise sharply under this approach as graphs grow.

Here, we introduce GFlux, a framework that addresses these cost challenges by running graph computations entirely within a single machine using SSD storage and GPU acceleration. GFlux comprises three layers: a compressed graph storage layer, a task execution layer, and a task scheduling layer, each engineered for efficiently processing trillion-edge-scale graphs.

The compressed graph storage layer employs a novel high-density hierarchical graph format (HGF), which represents a graph in a two-dimensional space defined by source and destination vertices. HGF logically partitions this space into a grid of blocks, each spanning 2²⁴ × 2²⁴ vertex pairs, and further divides each block into smaller shards of up to approximately 3GB, taking graph skewness into account. Because a vertex's position within a block fits in 24 bits, this two-level partitioning lets vertex IDs be stored with a compact 3-byte addressing scheme instead of the traditional 8-byte one, which drastically reduces the data size. To address the memory alignment challenges that 3-byte addresses pose on GPUs, GFlux introduces a 3-byte addressing scheme called Flip24.
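The idea of block-local 3-byte IDs can be illustrated with a minimal Python sketch. It assumes each block covers 2²⁴ consecutive source IDs and 2²⁴ consecutive destination IDs, so that a global ID splits into a block coordinate and a 24-bit local ID. The function names and the little-endian byte layout are hypothetical choices made for illustration; this is not the actual HGF layout or the Flip24 scheme, whose details the article does not describe.

```python
BLOCK_BITS = 24                 # assumed: local IDs within a block use 24 bits
BLOCK_SIZE = 1 << BLOCK_BITS    # 16,777,216 vertices per block side


def to_block_local(src, dst):
    """Split a global edge into block coordinates and 24-bit local IDs."""
    block = (src >> BLOCK_BITS, dst >> BLOCK_BITS)
    local = (src & (BLOCK_SIZE - 1), dst & (BLOCK_SIZE - 1))
    return block, local


def to_global(block_coord, local_id):
    """Rebuild a global vertex ID from a block coordinate and a local ID."""
    return (block_coord << BLOCK_BITS) | local_id


def pack24(local_id):
    """Store a local ID in 3 bytes (little-endian is an arbitrary choice here)."""
    if not 0 <= local_id < BLOCK_SIZE:
        raise ValueError("local ID does not fit in 24 bits")
    return local_id.to_bytes(3, "little")


def unpack24(buf, index):
    """Read the index-th 3-byte local ID from a packed byte string."""
    start = 3 * index
    return int.from_bytes(buf[start:start + 3], "little")


# Usage: an edge whose endpoints need more than 24 bits globally
src, dst = 5 * BLOCK_SIZE + 123, 7 * BLOCK_SIZE + 16_777_215
block, local = to_block_local(src, dst)
packed = pack24(local[0]) + pack24(local[1])   # 6 bytes instead of 16
```

Note that reading `packed` at offsets 0, 3, 6, and so on gives accesses that are not aligned to 4-byte boundaries, which is the kind of alignment problem the article says Flip24 is meant to address.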

The task execution layer defines graph computations such as PageRank and triangle counting as GPU-executable tasks called GTasks. Each GTask includes GPU kernel functions, input shards, vertex attributes (inputs and outputs), and intermediate data. Triangle counting, widely used in data analytics and AI, finds and counts every set of three vertices that are all connected to one another. The corresponding GTask computes triangles from shard triples via GPU kernels, supported by a memory management system that GFlux provides to optimize data transfer between main memory and GPU memory.
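To make the triangle counting problem concrete, here is a minimal single-threaded Python sketch on a small in-memory graph. It uses the common approach of orienting each edge from the smaller to the larger vertex ID and intersecting neighbor sets; this is an illustrative assumption, not the GPU kernel that a GTask runs, and `count_triangles` is a hypothetical name.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    # Orient every edge from the smaller ID to the larger one so that
    # each triangle u < v < w is discovered exactly once, at edge (u, v).
    out = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        out.setdefault(a, set()).add(b)

    total = 0
    for u, neighbors in out.items():
        for v in neighbors:
            # Any w that follows both u and v closes a triangle (u, v, w).
            total += len(neighbors & out.get(v, set()))
    return total


# Usage: a complete graph on 4 vertices, and a square with one diagonal
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
square_with_diagonal = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
k4_triangles = count_triangles(k4)
square_triangles = count_triangles(square_with_diagonal)
```

In a sharded setting, the three edges of one triangle can live in three different shards, which is why the GTask described above works on shard triples.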

Figure 1. Example of triangle counting in an HGF-format graph

The task scheduling layer efficiently schedules and optimizes GTasks for large-scale graph computations. For example, the PageRank algorithm benefits from the column-wise scheduling of block grids to aggregate PageRank values optimally. Conversely, the triangle counting task uses an innovative scheduling method called '3-hop,' which preemptively eliminates shard triples lacking triangles and strategically orders remaining triples to minimize SSD access. Crucially, GFlux provides a theoretical guarantee that this optimization entirely excludes non-triangular shard triples from scheduling.
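The pruning idea can be sketched in Python under simplifying assumptions. Suppose every edge is oriented from the smaller to the larger vertex ID and each shard is identified by a hypothetical (row, column) grid coordinate. A triangle u < v < w then needs its three edges to lie in shards (i, j), (j, k), and (i, k), so any triple in which one of those shards is empty can be skipped. This check is only a necessary condition: a triple that passes it may still contain no triangle. It is therefore weaker than the guarantee described for '3-hop', and it is not the GFlux algorithm itself.

```python
from collections import defaultdict


def candidate_triples(nonempty):
    """List shard triples (i, j, k) whose three shards all hold edges.

    nonempty: set of (row, col) coordinates of shards with at least one edge.
    A triple stands for the shards (i, j), (j, k) and (i, k).
    """
    cols_by_row = defaultdict(set)
    for row, col in nonempty:
        cols_by_row[row].add(col)

    triples = []
    for i in sorted(cols_by_row):
        for j in sorted(cols_by_row[i]):
            # k must be reachable from both row i and row j.
            shared = cols_by_row[i] & cols_by_row.get(j, set())
            for k in sorted(shared):
                triples.append((i, j, k))
    return triples


# Usage: four non-empty shards; only one combination can hold triangles
shards = {(0, 1), (1, 2), (0, 2), (2, 3)}
schedule = candidate_triples(shards)
```

Emitting the triples in sorted order keeps triples that share shard (i, j) next to each other, so that shard can stay loaded across consecutive tasks. This is one simple way to reduce repeated SSD reads; the article does not specify the ordering GFlux uses.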

In benchmarks, GFlux delivers substantial gains. It compresses a trillion-edge-scale graph to approximately 4.6TB in the HGF format, nearly halving the 9TB required by the standard CSR format. GFlux also completed PageRank on a trillion-edge-scale graph in approximately 1,300 seconds and finished triangle counting on a graph of about 70 billion edges in 1,184 seconds, the largest known scale for triangle counting on a single machine. By comparison, previous state-of-the-art systems required approximately 2,000 seconds using 25 CPU servers interconnected via high-speed networks, so GFlux runs roughly 1.7 times faster with just a single machine.
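The ratios behind these figures can be checked with a few lines of Python. The inputs are the approximate values quoted above. The server-seconds comparison is an added back-of-the-envelope measure that assumes both timings refer to the same graph and workload, and it ignores differences in hardware cost between a GPU machine and CPU servers.

```python
hgf_tb = 4.6            # HGF size of the trillion-edge-scale graph
csr_tb = 9.0            # CSR size of the same graph
gflux_seconds = 1184    # triangle counting on one machine
cluster_seconds = 2000  # prior systems, approximate
cluster_servers = 25

size_ratio = hgf_tb / csr_tb
speedup = cluster_seconds / gflux_seconds
# Total machine time consumed, counting every server in the cluster
server_seconds_ratio = (cluster_seconds * cluster_servers) / gflux_seconds

print(f"HGF size relative to CSR: {size_ratio:.2f}")
print(f"Wall-clock speedup: {speedup:.2f}x")
print(f"Server-seconds ratio: {server_seconds_ratio:.1f}x")
```

The wall-clock speedup comes out below 2x, while the machine time consumed differs by a much larger factor because the cluster occupies 25 servers for the duration of the run.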
