
KAIST
BREAKTHROUGHS

Research Webzine of the KAIST College of Engineering since 2014

Fall 2025 Vol. 25
Computing

Method that reduces real-world AI training costs and its application to control infectious diseases

August 26, 2025

To address the data quality issues that pervade real-world AI training, this research tackles three major challenges. First, it automatically detects and corrects label errors during training, reducing the need for manual data preprocessing; the related paper has had a significant academic impact, with more than 1,200 citations in the past two years. Second, it automatically infers missing labels in time-series data, substantially lowering the cost of manual label acquisition. Third, it removes redundant data and selects a core set of informative samples, achieving comparable model performance while reducing training time by up to 90%. These technologies have been applied to real-world societal problems such as infectious disease prediction and economic impact forecasting, and have been granted patents in both South Korea and the United States.

Figure: Overall organization of the research
A new AI training technology has emerged that allows artificial intelligence to fix data quality issues autonomously and selectively learn from only the most relevant information. Label errors, missing labels, and redundant data—long-standing challenges in real-world AI training—have often hindered performance and increased costs. This study introduces a method that enables AI to handle these issues automatically, without the need for prior data preprocessing.
The most notable feature is the automatic correction of label errors. During training, the model evaluates the reliability of each label in real time. If a label is deemed incorrect—such as an image of a cat mistakenly labeled as a dog—the model corrects it on its own. In experiments, the system achieved up to 95% correction accuracy even when 40% of the labels were incorrect, and improved classification accuracy by up to 9 percentage points on real-world datasets such as WebVision.
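The idea behind the correction step can be illustrated with a short sketch. The snippet below keeps a small history of the model's recent predictions for each training sample and relabels a sample when those predictions are highly consistent yet disagree with the given label. The function name refurbish_labels, the 0.9 agreement threshold, and the majority-vote consistency measure are illustrative assumptions for this sketch, not the paper's exact criterion.

import numpy as np

def refurbish_labels(pred_history, labels, agreement_threshold=0.9):
    """Relabel samples whose recent predictions consistently disagree
    with the given (possibly noisy) label.

    pred_history: (num_samples, window) array of predicted class ids
                  collected over the last few training epochs.
    labels:       (num_samples,) array of possibly erroneous labels.
    Returns the corrected labels and a mask of the samples that changed.
    Illustrative sketch only; the threshold and consistency measure are
    assumptions, not the study's exact rule.
    """
    corrected = labels.copy()
    changed = np.zeros(len(labels), dtype=bool)
    for i, history in enumerate(pred_history):
        # Most frequent predicted class and how consistently it appears.
        values, counts = np.unique(history, return_counts=True)
        top_class = values[counts.argmax()]
        consistency = counts.max() / len(history)
        # A highly consistent prediction that contradicts the label is
        # treated as evidence that the label itself is wrong.
        if consistency >= agreement_threshold and top_class != labels[i]:
            corrected[i] = top_class
            changed[i] = True
    return corrected, changed

# Toy usage: the third sample's label (7) contradicts uniform predictions of 3.
history = np.array([[0, 0, 0, 0], [1, 1, 2, 1], [3, 3, 3, 3]])
noisy_labels = np.array([0, 1, 7])
fixed, mask = refurbish_labels(history, noisy_labels)
print(fixed, mask)  # [0 1 3] [False False  True]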


The technology also addresses the label shortage issue. By analyzing changes in time-series data, the model can automatically detect transition points, such as the moment a person switches from walking to running. This approach outperformed traditional distance-based methods, improving accuracy by up to 12.7%, and proved especially effective for wearable healthcare sensor data.
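To make the task concrete, the sketch below implements a simple window-comparison rule: a time step is flagged as a candidate transition when the statistics of the windows immediately before and after it differ sharply. The window size, the threshold of 2.0, and the standardized mean-difference score are illustrative assumptions; this is the kind of distance-style baseline the study's learned approach is reported to outperform, not the study's own method.

import numpy as np

def detect_transitions(signal, window=50, threshold=2.0):
    """Flag candidate transition points in a 1-D sensor signal, e.g. the
    moment accelerometer readings shift from walking to running.

    A point is flagged when the means of the adjacent windows differ by
    more than `threshold` pooled standard deviations. Simple baseline
    for illustration only.
    """
    points = []
    for t in range(window, len(signal) - window):
        left = signal[t - window:t]
        right = signal[t:t + window]
        # Standardized difference of window means; large values suggest
        # the underlying activity changed around time t.
        pooled_std = np.sqrt((left.std() ** 2 + right.std() ** 2) / 2) + 1e-8
        score = abs(right.mean() - left.mean()) / pooled_std
        if score > threshold:
            points.append(t)
    return points

# Toy usage: low-amplitude "walking" followed by high-amplitude "running".
rng = np.random.default_rng(0)
walk = rng.normal(0.0, 0.2, 300)
run = rng.normal(1.5, 0.4, 300)
print(detect_transitions(np.concatenate([walk, run]))[:3])  # indices near t = 300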


The issue of data redundancy was tackled by enabling the AI to automatically select only the most informative samples for training. As a result, the model achieves performance comparable to training on the full dataset while reducing training time by up to 90%. These functionalities are integrated into a unified framework that combines error correction and core-set selection for greater overall efficiency.
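A minimal sketch of the core-set idea is to rank samples by how informative they appear and keep only a small budget of them for training. Below, per-sample loss from a preliminary pass stands in for informativeness, and a 10% budget is used; both choices are illustrative assumptions, and the study's unified framework further couples this selection with the label-correction step described above.

import numpy as np

def select_coreset(losses, budget_ratio=0.1):
    """Return indices of a small, informative subset of training samples.

    Informativeness is approximated here by per-sample loss from an
    earlier training pass: hard examples are kept, easy or redundant
    ones are dropped. Ratio and criterion are illustrative assumptions.
    """
    budget = max(1, int(len(losses) * budget_ratio))
    # Indices of the `budget` highest-loss samples.
    return np.argsort(losses)[-budget:]

# Toy usage: keep 10% of 1,000 samples for the next training round.
losses = np.random.default_rng(1).random(1000)
coreset_idx = select_coreset(losses, budget_ratio=0.1)
print(len(coreset_idx))  # 100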


Beyond algorithmic improvements, the technology has been applied to pressing societal problems. The researchers developed an AI model that forecasts inbound COVID-19 cases, which earned a U.S. patent, and another that predicts the economic impact of infectious disease outbreaks on local businesses, which is patented in South Korea.


This foundational technology directly supports the “Efficient Learning and AI Infrastructure Advancement” pillar of Korea’s National Strategic Technologies in AI. It is also expected to play a key role in the rapidly expanding AIOps (Artificial Intelligence for IT Operations) market, which is projected to grow from $27.24 billion in 2024 to approximately $79.91 billion by 2029, at a CAGR of 24.01%. This advancement marks a critical step toward making AI not only faster and smarter but also more practical and deployable across real-world domains.