Professor Hwanjun Song’s research team has developed an innovative framework that enables AI to learn autonomously and improve the quality of its responses using feedback generated by large language models, without human involvement. This approach moves beyond the costly and inefficient traditional reliance on human feedback, aiming to deliver more accurate and trustworthy AI responses.

Large language models (LLMs) are designed to generate answers that reflect human preferences. Preference optimization methods such as PPO and DPO help by training models to select responses that people prefer, reducing mistakes such as hallucinations and irrelevant content. These methods work by showing humans pairs of model-generated responses and asking them to choose the better one. The model then learns to favor responses that consistently receive higher human ratings.
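For readers who want the mechanics, the sketch below shows the standard DPO preference loss in generic form: the policy is nudged to assign a higher reference-relative log-probability to the preferred response of each pair. This is an illustrative PyTorch snippet under common assumptions, not the team’s training code, and all tensor names are made up.

```python
# A minimal sketch of the DPO preference loss, assuming log-probabilities
# have already been computed for the chosen (preferred) and rejected
# responses under both the trained policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Push the policy to assign a higher (reference-relative)
    log-probability to the preferred response of each pair."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Sigmoid cross-entropy on the margin between the two log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up log-probabilities for a batch of two pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-11.0, -10.2]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-11.2, -10.0]))
print(loss)
```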
However, existing human feedback methods face three major challenges. First, direct human evaluation of complex tasks requires substantial time and effort, making it hard to scale. Second, while A/B comparisons reduce that effort, it remains difficult to ensure fair and consistent feedback across different raters. Finally, crowdsourcing platforms such as Mechanical Turk cannot provide high-quality feedback for expert-level tasks.

To overcome these obstacles, the research team developed a novel framework that allows AI to learn from detailed, fine-grained feedback generated entirely by advanced LLMs, eliminating the need for human involvement in the feedback loop. This approach enables AI to refine its responses continuously based on large-scale, multi-dimensional feedback that would be impractical for humans to produce. As a result, AI systems generate responses that are more accurate, focused, and aligned with human preferences.

The framework lets advanced LLMs automatically check and rate AI-generated responses based on what people usually prefer. Instead of asking humans to compare answers, the LLM itself gives detailed feedback on, for example, whether the response is correct or whether it covers the important points without adding extra, unnecessary information. This detailed feedback helps to train the AI so it learns to provide better responses that match what people want, all without the need for human help during training.
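As a rough illustration of how such LLM-generated feedback could be collected, the sketch below asks a judge model to score a summary along several quality dimensions. The `ask_judge` callable, the `RUBRIC` dictionary, and the three criteria are illustrative assumptions for this sketch, not the paper’s exact rubric or prompts.

```python
# A minimal sketch of LLM-generated, multi-dimensional feedback.
# `ask_judge` stands in for any strong LLM API; the three dimensions
# below are illustrative and may differ from the paper's rubric.
import json
from typing import Callable

RUBRIC = {
    "faithfulness": "Does the summary only state facts supported by the source?",
    "completeness": "Does the summary cover all key points of the source?",
    "conciseness": "Does the summary avoid extra, unnecessary information?",
}

def score_summary(source: str, summary: str,
                  ask_judge: Callable[[str], str]) -> dict:
    """Ask a judge LLM for a 1-5 score on each dimension, parsed from JSON."""
    prompt = (
        "Rate the summary against the source on each criterion from 1 to 5.\n"
        + "\n".join(f"- {k}: {v}" for k, v in RUBRIC.items())
        + f"\n\nSource:\n{source}\n\nSummary:\n{summary}\n"
        + 'Answer as JSON, e.g. {"faithfulness": 5, "completeness": 4, "conciseness": 5}.'
    )
    return json.loads(ask_judge(prompt))

# Example with a stub judge that always returns a fixed rating.
scores = score_summary(
    "The cat sat on the mat all afternoon.", "A cat sat on a mat.",
    lambda p: '{"faithfulness": 5, "completeness": 4, "conciseness": 5}')
print(scores)
```

Pairs of responses ranked by such scores can then serve as the preference data that replaces human A/B comparisons during training.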
By applying this framework to text summarization, the team developed SummLlama3, a compact model that outperformed Meta’s much larger Llama3-70B-instruct. Despite being about ten times smaller, SummLlama3 generated summaries that human judges preferred for their accuracy, completeness, and clarity. This demonstrates how AI-generated feedback can replace human feedback at scale, enabling smaller, faster models that still meet or exceed human expectations on complex language tasks.
This research was published under the title “Learning to Summarize from LLM-generated Feedback” at the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025).