About Past Issues Editorial Board

KAIST
BREAKTHROUGHS

Research Webzine of the KAIST College of Engineering since 2014

Spring 2025 Vol. 24
Computing

A fast n-ary join query processing technology for modern complex queries

July 27, 2023   hit 113

A fast n-ary join query processing technology for modern complex queries

 

Existing commercial DBMSs tend to fail in processing modern complex queries due to memory shortage or time out. They rely on binary join processing. SPRINTER improves the performance of such queries by up to 5-20 times without failure through n-ary join processing.

 

Article | Spring 2021

 

 

DBMSs are widely used in various fields including business analytics, artificial intelligence, and bioinformatics. As the applications in those fields become increasingly complex, the queries used in the applications also increase in their complexity. The queries are described in SQL, the standard language for querying a database. The SQL queries usually contain one or more joins between two tables. There are two types of joins: the PK-FK join and the FK-FK join. PK means a primary key column of a table that has unique values (e.g., IDs), while FK is a foreign key column that has duplicated values (e.g., names). PK-FK join produces a relatively small number of joined tuples, while FK-FK join a huge number of joined tuples. A complex query contains one or more FK-FK joins, and 26% queries of TPC-DS, the standard benchmark for DBMS, are complex queries.

The existing DBMSs generate a query plan of left-deep tree (Figure 1(a)) and process the plan in an operator-at-a-time way or in a pipeline way. Either way, a series of joins including FK-FK significantly amplifies the number of intermediate tuples to be processed, and so, query processing tends to fail due to memory shortage or time out. Over the past several decades, DBMSs only use the plan of the left-deep tree since the number of possible plans of general trees grows exponentially as the number of tables increases. It is hard to find an optimal plan among them, and at the same time, there is no efficient join operator that takes three or more tables as input. SPRINTER solves this problem by proposing a new query planning method and a new n-ary join operator. It finds a query plan that has an n-ary join operator as a root and several small left-deep trees (called subtrees) as the children of the root (Figure 1(b)). In particular, it causes each subtree to have no FK-FK joins and instead allows all FK-FK joins to be processed together at the root by using the n-ary join operator. Each subtree is easily processed, and the n-ary join operator can efficiently process FK-FK joins among an arbitrary number of tables without memory shortage problem. As a result, SPRINTER significantly improves the query performance of the existing DBMSs.

SPRINTER processes an n-ary join operator using the so-called worst-case optimal join algorithm. The tables joined at the root of a plan have one or more FK column(s). For instance, there are five FK-FK joins in the query Q17 of TPC-DS, where CS, SR, and SS are tables, and where item, cust (meaning customer), and ticket are columns (Figure 2(a)). SPRINTER determines a good global order among the columns, e.g., i (item) < c (cust) < t (ticket), and sorts each table according to the order, e.g., the tuples of CS are sorted first by i and next by c (Figure 2(b)). Then, it merge-joins the sorted tables while doing binary search on their subranges.

 

Tables in the root of the query plan and the worst-case optimal join among them.

 

 

In TPC-DS, SPRINTER outperformed the existing commercial DBMSs by up to 5-20 times. This research was published at the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD 2020) under the title, “SPRINTER: A Fast n-ary Join Query Processing Method for Complex OLAP Queries”.