Research Webzine of the KAIST College of Engineering since 2014
Spring 2025 Vol. 24
Scalable big graph mining using distributed systems creates new opportunities for the discovery of interesting patterns and anomalies on very large graphs that could not have been analyzed before.
Article | Spring 2014
Graphs are ubiquitous, with such examples as computer networks, social networks, mobile call networks, biological networks, citation networks, protein regulation networks, and the World Wide Web. Spurred by the lower cost of disk storage, the success of social networking sites (e.g. Facebook, Twitter, and Google+) and Web 2.0 applications, and the high availability of data sources, graph data are being generated at an unparalleled rate.
They are now measured in terabytes and are heading toward petabytes, with billions of nodes and edges.
Mining such big graphs helps us find patterns and anomalies that lead to many interesting applications including fraud detection, cyber security, and social network analysis. The research team of Prof. U Kang in the Department of Computer Science is working on scalable big graph mining, which includes two components: scalable algorithms and discoveries on real world graphs.
Scalable Algorithms. Traditional graph algorithms assume that the input graph fits in the memory or disks of a single machine. However, the recent growth of the sizes of graphs debunks this assumption. Since single machine algorithms are not tractable for handling big graphs, Prof. U Kang’s team is working on designing and developing scalable algorithms and distributed platforms for mining and managing big graphs. Prof. U Kang is the main author of the award-winning Pegasus graph mining software, which includes various large-scale graph mining algorithms including PageRank, Random Walk with Restart (RWR), diameter estimation, connected components, eigensolver, and tensor analysis.
Discoveries on Real World Graphs. The developed scalable algorithms lead to the analysis of large real world graphs and interesting discoveries of patterns and anomalies that could not have been found before. One of the most interesting discoveries is the seven-degrees of separation in one of the largest publicly-available web graphs with ~7 billion edges. The discovery suggests that the so-called “small world” phenomenon exists on the web, and the distance between any two webpages is much smaller than people expect. Another interesting discovery is the existence of suspicious adult advertisers in the “Twitter who-followed-whom social network in 2009″ graph with 3 billion edges. In the scatter plot of degree vs. triangles of Twitter accounts, some famous U.S. politicians have mildly-connected followers, while adult advertisers have tightly-connected followers, creating many triangles. The reason is that adult accounts are often from the same provider, and the accounts follow each other to boost their rankings, thereby creating many cliques containing triangles.
By U Kang, Department of Computer Science
When and why do graph neural networks become powerful?
Read moreSmart Warnings: LLM-enabled personalized driver assistance
Read moreExtending the lifespan of next-generation lithium metal batteries with water
Read moreProfessor Ki-Uk Kyung’s research team develops soft shape-morphing actuator capable of rapid 3D transformations
Read moreOxynizer: Non-electric oxygen generator for developing countries
Read more