
KAIST
BREAKTHROUGHS

Research Webzine of the KAIST College of Engineering since 2014

Spring 2025 Vol. 24

Computing

Extremely small AI system that answers factual questions about the world

July 27, 2023

The retrieve-and-read approach is widely adopted for open-domain question answering because of its accuracy, interpretability, and flexibility, but it suffers from a large memory footprint. MS student Sohee Yang and Professor Minjoon Seo explored ways to make such a system extremely small.

Article | Fall 2021

MS student Sohee Yang and Professor Minjoon Seo from the Graduate School of AI explored how to design an extremely small retrieve-and-read open-domain question answering (ODQA) system. The research was published under the title “Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering” at the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), a top conference in the field of Natural Language Processing (NLP).

Open-domain question answering (ODQA) is the NLP task of finding answers to generic factoid questions. A widely adopted approach to the task is the retrieve-and-read method, which solves ODQA by first retrieving documents relevant to the question from a large knowledge source and then reading the retrieved documents to find the answer. Retrieve-and-read systems have the advantage that their predictions are interpretable, because the passages on which a prediction is grounded can be explicitly analyzed. The knowledge source is also easy to update. However, retrieve-and-read ODQA systems often have large storage footprints, mainly due to the large size of the knowledge source.
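The two-step flow described above can be sketched in a few lines of Python. This is only a toy illustration, not the authors' system: the three-passage knowledge source, the word-overlap retriever, and the capitalized-word "reader" are all hypothetical stand-ins for the large corpus and neural models that a real retrieve-and-read system uses.

```python
import re
from collections import Counter

# Hypothetical three-passage knowledge source; real systems index millions.
PASSAGES = [
    "Daejeon is a city in South Korea and the home of KAIST.",
    "The Eiffel Tower is a landmark located in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]

STOPWORDS = {"the", "is", "a", "of", "in", "and", "what", "where", "which"}


def content_words(text):
    """Lowercase, split into word tokens, and drop common function words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def retrieve(question, passages, k=1):
    """Step 1: rank passages by word overlap with the question; keep the top k."""
    q = Counter(content_words(question))

    def overlap(passage):
        return sum((q & Counter(content_words(passage))).values())

    return sorted(passages, key=overlap, reverse=True)[:k]


def read(question, passage):
    """Step 2: crude stand-in for a neural reader.

    Returns the first capitalized passage word that does not occur in the question.
    """
    question_words = set(re.findall(r"[a-z0-9]+", question.lower()))
    for word in re.findall(r"[A-Za-z0-9]+", passage):
        if word[0].isupper() and word.lower() not in question_words:
            return word
    return None


def answer_question(question, passages=PASSAGES):
    """Return the answer together with the passage that supports it."""
    evidence = retrieve(question, passages, k=1)[0]
    return read(question, evidence), evidence


answer, evidence = answer_question("Where is the Eiffel Tower located?")
print(answer)    # the predicted answer
print(evidence)  # the passage the prediction is grounded on
```

Returning the supporting passage alongside the answer is what makes the prediction inspectable, and editing the `PASSAGES` list is all it takes to update what the toy system knows.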

Figure 1. Retrieve-and-read open-domain question answering system

In real-world scenarios, both building an interpretable and flexible ODQA system and keeping the system small are important. The system needs to be interpretable for easier maintenance, to update its knowledge flexibly so that it can adapt quickly to changes in the world, and to be deployable in constrained serving environments such as edge devices. To retain the benefits of the retrieve-and-read method while keeping the system small, Yang and Seo explored strategies for building a minimal retrieve-and-read ODQA system and studied the trade-off between storage budget and system accuracy. The strategies are as follows: training a filtering model that excludes uninformative documents to reduce the corpus size, sharing the encoder parameters of the retriever and the reader using knowledge distillation and iterative finetuning, and applying engineering techniques including quantization and half-precision inference.
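To give a feel for why quantization and half precision shrink a system, the sketch below stores the same five numbers as 32-bit floats, 16-bit floats, and 8-bit integer codes. It is a generic illustration under stated assumptions: the symmetric int8 scheme and the sample weights are hypothetical, and the article does not say which quantization scheme the authors actually used.

```python
import struct


def quantize_int8(values):
    """Symmetric linear quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def to_float16_bytes(values):
    """Pack floats as IEEE 754 half precision (2 bytes per value)."""
    return struct.pack(f"<{len(values)}e", *values)


def from_float16_bytes(data):
    """Unpack half-precision bytes back into Python floats."""
    return list(struct.unpack(f"<{len(data) // 2}e", data))


# Hypothetical sample weights, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.125, -0.75]

fp32_blob = struct.pack(f"<{len(weights)}f", *weights)  # 4 bytes per value
fp16_blob = to_float16_bytes(weights)                   # 2 bytes per value
codes, scale = quantize_int8(weights)
int8_blob = struct.pack(f"<{len(codes)}b", *codes)      # 1 byte per value

restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(fp32_blob), len(fp16_blob), len(int8_blob))
print(max_error)  # rounding error introduced by the int8 codes
```

The int8 codes need one quarter of the float32 storage (plus a single scale factor), at the cost of a rounding error of at most about half the scale per value. That cost is the storage-versus-accuracy trade-off the paragraph above refers to.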

Figure 2. Schematic diagram of the extremely small (484MB) retrieve-and-read open-domain question answering system developed by Yang and Seo

Yang and Seo applied these strategies to a recent retrieve-and-read system, DPR (Dense Passage Retriever; Karpukhin et al., 2020), and reduced the Docker container size of the system by 160x, from 77.5GB to 484MB. The resulting system also preserved 93.3% of the original accuracy on the development dataset of the NeurIPS 2020 EfficientQA Competition hosted by Google Research. In the competition, the system ranked second in automatic evaluation on the “Systems Under 500Mb” track with 32.06% accuracy, 1.38 percentage points behind the top-performing system from UCL and Facebook AI. In human evaluation, it ranked first with 42.23% accuracy, 2.83 percentage points higher than the other system. The human evaluation score is assigned manually by raters who take phrasing and multiple valid answers into account, whereas automatic evaluation only checks whether the system prediction is among the five reference answers.
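The figures above can be checked with a little arithmetic, and the gap between automatic and human evaluation can be illustrated with a simple matching function. The sketch below makes several assumptions that are not stated in the article: it reads the sizes as decimal gigabytes and megabytes, it uses a SQuAD-style answer normalization (lowercasing, stripping punctuation and articles) that may differ from the competition's official script, and the example answers are hypothetical.

```python
import re
import string

GB = 10**9  # assumption: decimal units
MB = 10**6


def reduction_factor(before_bytes, after_bytes):
    """How many times smaller the system became."""
    return before_bytes / after_bytes


factor = reduction_factor(77.5 * GB, 484 * MB)

# Scores implied by the reported gaps (in percentage points).
top_automatic_score = 32.06 + 1.38
other_human_score = 42.23 - 2.83


def normalize(text):
    """Assumed normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def is_correct(prediction, references):
    """Automatic evaluation: the prediction must match one reference answer."""
    return normalize(prediction) in {normalize(r) for r in references}


# Hypothetical reference answers for one question.
references = ["Eiffel Tower", "La Tour Eiffel"]

print(round(factor))
print(is_correct("The Eiffel Tower.", references))
print(is_correct("the tower in Paris", references))
```

The last call shows the limitation of string matching. A human rater might accept a differently phrased but valid answer, whereas the automatic check rejects anything outside the reference list, which is one reason the two rankings can differ.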

Figure 3. Web demo of the extremely small (484MB) retrieve-and-read open-domain question answering system developed by Yang and Seo

Yang and Seo believe that the research will be helpful for designing future retrieve-and-read systems for storage-constrained serving environments and that the developed system will serve as a sound baseline. More information can be found at the following links.

Paper Link: https://www.aclweb.org/anthology/2021.naacl-main.468/

Inference Code and System Demo: https://github.com/clovaai/minimal-rnr-qa

NeurIPS 2020 EfficientQA Competition: https://efficientqa.github.io/
