About Past Issues Editorial Board

KAIST
BREAKTHROUGHS

Research Webzine of the KAIST College of Engineering since 2014

Spring 2025 Vol. 24
Computing

Scalable big graph mining enables the discovery of patterns and anomalies

July 27, 2023   hit 85

Scalable big graph mining enables the discovery of patterns and anomalies

 

Scalable big graph mining using distributed systems creates new opportunities for the discovery of interesting patterns and anomalies on very large graphs that could not have been analyzed before.

 

Article  |  Spring 2014

 

 

Graphs are ubiquitous, with such examples as computer networks, social networks, mobile call networks, biological networks, citation networks, protein regulation networks, and the World Wide Web. Spurred by the lower cost of disk storage, the success of social networking sites (e.g. Facebook, Twitter, and Google+) and Web 2.0 applications, and the high availability of data sources, graph data are being generated at an unparalleled rate.

They are now measured in terabytes and are heading toward petabytes, with billions of nodes and edges.

Mining such big graphs helps us find patterns and anomalies that lead to many interesting applications including fraud detection, cyber security, and social network analysis. The research team of Prof. U Kang in the Department of Computer Science is working on scalable big graph mining, which includes two components: scalable algorithms and discoveries on real world graphs.

Scalable Algorithms. Traditional graph algorithms assume that the input graph fits in the memory or disks of a single machine. However, the recent growth of the sizes of graphs debunks this assumption. Since single machine algorithms are not tractable for handling big graphs, Prof. U Kang’s team is working on designing and developing scalable algorithms and distributed platforms for mining and managing big graphs. Prof. U Kang is the main author of the award-winning Pegasus graph mining software, which includes various large-scale graph mining algorithms including PageRank, Random Walk with Restart (RWR), diameter estimation, connected components, eigensolver, and tensor analysis.

Discoveries on Real World Graphs. The developed scalable algorithms lead to the analysis of large real world graphs and interesting discoveries of patterns and anomalies that could not have been found before. One of the most interesting discoveries is the seven-degrees of separation in one of the largest publicly-available web graphs with ~7 billion edges. The discovery suggests that the so-called “small world” phenomenon exists on the web, and the distance between any two webpages is much smaller than people expect. Another interesting discovery is the existence of suspicious adult advertisers in the “Twitter who-followed-whom social network in 2009″ graph with 3 billion edges. In the scatter plot of degree vs. triangles of Twitter accounts, some famous U.S. politicians have mildly-connected followers, while adult advertisers have tightly-connected followers, creating many triangles. The reason is that adult accounts are often from the same provider, and the accounts follow each other to boost their rankings, thereby creating many cliques containing triangles.

img01-R04-a0003

 

By U Kang, Department of Computer Science