Research Webzine of the KAIST College of Engineering since 2014
Fall 2025 Vol. 25
Processing large-scale graphs is increasingly vital in fields such as business and AI (e.g., vector database indexing). GFlux enables trillion-edge-scale graph processing on a single machine using SSDs and GPUs, supporting complex analytics such as triangle counting.

Technology for processing large-scale graph data has become increasingly critical in business and AI. Companies such as Palantir model enterprise data as graphs to enhance operational intelligence and decision-making, while AI applications rely on vector databases with graph-based indexing to reduce hallucinations in LLMs and achieve high recall. Traditionally, large-scale graphs are handled by partitioning the data across multiple computers and processing it via network communication. However, hardware and network costs for this approach rise sharply as graphs grow.
Here, we introduce GFlux, a groundbreaking framework that fundamentally addresses these cost challenges by enabling graph computations entirely within a single machine using SSD storage and GPU acceleration. GFlux comprises three innovative layers: a compressed graph storage layer, a task execution layer, and a task scheduling layer, each specifically engineered for efficiently processing trillion-edge-scale graphs.
The compressed graph storage layer employs a novel high-density hierarchical graph format (HGF), representing graphs in a two-dimensional space defined by source and destination vertices. HGF logically partitions the graph into a 2^24 x 2^24-sized grid, further dividing each block into smaller shards of up to approximately 3GB, taking graph skewness into account. This two-level partitioning drastically compresses the data size by allowing vertex IDs to be represented with a compact 3-byte addressing scheme, as opposed to the traditional 8-byte system. To address memory alignment challenges posed by 3-byte addresses on GPUs, GFlux introduces Flip24, an innovative 3-byte addressing scheme.
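The idea behind block-local 3-byte IDs can be illustrated with a minimal Python sketch. It assumes that each block covers 2^24 source IDs by 2^24 destination IDs, which is the reading suggested by the 3-byte scheme. All names here (`split_edge`, `pack_local_ids`, and so on) are hypothetical, and the byte layout is plain little-endian packing, not the Flip24 scheme, whose details the article does not give.

```python
BLOCK_BITS = 24                  # assumption: a block spans 2**24 vertex IDs per axis
BLOCK_SIZE = 1 << BLOCK_BITS
LOCAL_MASK = BLOCK_SIZE - 1


def split_edge(src: int, dst: int):
    """Map a global edge to its block coordinates and block-local 24-bit IDs."""
    block = (src >> BLOCK_BITS, dst >> BLOCK_BITS)
    local = (src & LOCAL_MASK, dst & LOCAL_MASK)
    return block, local


def join_vertex(block_index: int, local_id: int) -> int:
    """Rebuild a global vertex ID from a block index and a local ID."""
    return (block_index << BLOCK_BITS) | local_id


def pack_local_ids(ids) -> bytes:
    """Store each block-local ID in exactly 3 bytes (little-endian)."""
    out = bytearray()
    for v in ids:
        if not 0 <= v < BLOCK_SIZE:
            raise ValueError(f"{v} does not fit in 24 bits")
        out += v.to_bytes(3, "little")
    return bytes(out)


def unpack_local_ids(buf: bytes):
    """Inverse of pack_local_ids."""
    if len(buf) % 3:
        raise ValueError("buffer length must be a multiple of 3")
    return [int.from_bytes(buf[i:i + 3], "little") for i in range(0, len(buf), 3)]
```

Because the block coordinates are stored once per block, only the low 24 bits of each ID need to be written per edge. Values packed this way start at offsets that are multiples of 3 rather than 4, which is the kind of alignment problem the article says Flip24 is designed to handle on GPUs.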
The task execution layer defines graph computations such as PageRank and triangle counting as GPU-executable tasks called GTasks. Each GTask includes GPU kernel functions, input shards, vertex attributes (inputs and outputs), and intermediate data. Triangle counting, widely used in data analytics and AI, involves identifying and enumerating all triangles formed by triples of mutually connected vertices. The corresponding GTask computes triangles from shard triples via GPU kernels, supported by an advanced memory management system developed for GFlux to optimize data transfer and management between main memory and GPU memory.
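For readers unfamiliar with triangle counting, the following is a minimal single-threaded Python sketch of the underlying computation. It is not GFlux's GPU kernel; it uses a standard technique (orienting each edge from the lower to the higher vertex ID and intersecting neighbor sets), and the function name `count_triangles` is hypothetical.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    # Orient every edge from the smaller to the larger vertex ID so that each
    # triangle u < v < w is found exactly once, via the oriented edge (u, v).
    out = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        out[a].add(b)

    total = 0
    for u, nbrs in list(out.items()):
        for v in nbrs:
            # Every common out-neighbor w of u and v closes a triangle (u, v, w).
            total += len(nbrs & out.get(v, set()))
    return total
```

For example, `count_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])` counts the four triangles of a complete graph on four vertices. In the shard-based setting described above, the three edges of one triangle may live in three different shards, which is why the GTask operates on shard triples.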

The task scheduling layer efficiently schedules and optimizes GTasks for large-scale graph computations. For example, the PageRank algorithm benefits from the column-wise scheduling of block grids to aggregate PageRank values optimally. Conversely, the triangle counting task uses an innovative scheduling method called '3-hop,' which preemptively eliminates shard triples lacking triangles and strategically orders remaining triples to minimize SSD access. Crucially, GFlux provides a theoretical guarantee that this optimization entirely excludes non-triangular shard triples from scheduling.
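The article does not describe how 3-hop scheduling works internally, so the following Python sketch only illustrates the general idea of pruning shard triples before any data is read. It assumes edges are oriented from lower to higher vertex ID, so a triangle whose vertices fall in grid ranges i <= j <= k needs at least one edge in each of the blocks (i, j), (j, k), and (i, k). The function name `candidate_triples` and the input representation are hypothetical.

```python
def candidate_triples(nonempty):
    """Return block-index triples (i, j, k) that could contain a triangle.

    nonempty: iterable of (row, col) grid coordinates that hold at least one
    edge, with edges oriented from lower to higher vertex ID.
    """
    by_row = {}
    for i, j in nonempty:
        by_row.setdefault(i, set()).add(j)

    triples = []
    for i, cols in by_row.items():
        for j in cols:                    # block (i, j) is non-empty
            for k in by_row.get(j, ()):   # block (j, k) is non-empty
                if k in cols:             # block (i, k) is non-empty
                    triples.append((i, j, k))
    # Lexicographic order keeps triples that share block (i, j) adjacent,
    # a crude stand-in for ordering work to reduce repeated SSD reads.
    return sorted(triples)
```

This check is only a necessary condition: a triple that passes may still contain no triangle. The article states that GFlux's method comes with a stronger theoretical guarantee, excluding non-triangular shard triples entirely.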
In practical benchmarks, GFlux delivers substantial gains. It compresses a trillion-edge-scale graph to approximately 4.6TB using the HGF format, nearly halving the 9TB required by the standard CSR format. Moreover, GFlux completed the PageRank algorithm on a trillion-edge-scale graph in approximately 1,300 seconds and processed triangle counting on a graph containing about 70 billion edges in just 1,184 seconds, the largest known scale for triangle counting accomplished on a single machine. By comparison, previous state-of-the-art systems required approximately 2,000 seconds using 25 CPU servers interconnected via high-speed networks, so GFlux nearly doubles the performance with just a single machine.
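The reported ratios can be checked with simple arithmetic. The short Python sketch below uses only the approximate figures quoted in this article. The article does not say which workload the 2,000-second cluster figure refers to, so the speedup is computed against both reported GFlux runtimes. The variable names are illustrative.

```python
# Approximate figures quoted in the article.
hgf_tb, csr_tb = 4.6, 9.0      # storage for a trillion-edge-scale graph
pagerank_s = 1300              # GFlux PageRank runtime (seconds)
triangle_s = 1184              # GFlux triangle-counting runtime (seconds)
cluster_s = 2000               # prior system on 25 CPU servers (seconds)

storage_ratio = csr_tb / hgf_tb              # CSR size relative to HGF
speedup_vs_triangle = cluster_s / triangle_s
speedup_vs_pagerank = cluster_s / pagerank_s

print(round(storage_ratio, 2))         # 1.96
print(round(speedup_vs_triangle, 2))   # 1.69
print(round(speedup_vs_pagerank, 2))   # 1.54
```

Under these figures the storage saving is close to a factor of two, while the runtime advantage over the 25-server system is between about 1.5 and 1.7 times, depending on which workload the comparison applies to.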