Article | Gold open access · Citations: 2 · 2023
An Efficient Document Retrieval for Korean Open-Domain Question Answering Based on ColBERT
Byungha Kang, Yeong‐Hwa Kim, Youhyun Shin
Applied Sciences (IF 2.5)
Abstract

Open-domain question answering requires retrieving documents highly relevant to a query from a large-scale corpus. Deep learning-based dense retrieval methods have become the primary approach for finding related documents. Although deep learning-based methods have improved search accuracy compared to traditional techniques, they also substantially increase the computational burden. Consequently, efficient models and methods are needed that balance search accuracy against search time and thereby reduce computational demands. In this paper, we propose a Korean document retrieval method utilizing ColBERT’s late interaction paradigm to efficiently calculate the relevance between questions and documents. For document retrieval in open-domain Korean question answering, we construct a Korean dataset from various AI-Hub corpora. We compare search accuracy and inference time across the traditional information retrieval (IR) model BM25, a dense retrieval approach using Korean BERT-based models, and our proposed method. The experimental results demonstrate that our approach achieves higher accuracy than BM25 and requires less search time than the KoBERT-based dense retrieval method. Moreover, the best performance is obtained with KoSBERT, a pre-trained Korean language model trained to place semantically similar sentences close together in vector space.
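The late interaction paradigm mentioned in the abstract can be illustrated with a minimal pure-Python sketch of ColBERT-style MaxSim scoring. It assumes each query and document has already been encoded into per-token embedding vectors (in the paper, by a Korean BERT-based encoder such as KoBERT or KoSBERT). The toy 2-D vectors and the helper names `normalize`, `maxsim_score`, and `rank` are hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# A token embedding is a plain list of floats in this toy sketch.
Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so that dot products are cosine similarities."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction relevance: for each query token, keep the best-matching
    document token (maximum cosine similarity), then sum over query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


def rank(query_tokens: Sequence[Vector],
         docs: Dict[str, Sequence[Vector]]) -> List[Tuple[str, float]]:
    """Brute-force scoring of every document; document token embeddings can be
    computed once offline, so only the query needs encoding at search time."""
    scored = [(doc_id, maxsim_score(query_tokens, toks)) for doc_id, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-D "token embeddings" standing in for encoder outputs.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "d1": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens exactly
    "d2": [[1.0, 1.0]],              # one token, partially similar to both
    "d3": [[-1.0, 0.0]],             # points away from the query
}

for doc_id, score in rank(query, docs):
    print(doc_id, round(score, 4))
```

Unlike a single-vector dense retriever, which compresses the query and each document into one embedding and compares them with a single dot product, this scoring keeps one vector per token and defers the interaction to a cheap max-then-sum step. A practical system would typically precompute and index the document token embeddings and fetch candidates with approximate nearest-neighbor search instead of scoring every document as this sketch does.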

Keywords
Computer science, Question answering, Vector space model, Relevance (law), Information retrieval, Document retrieval, Open domain, Inference, Task (project management), Artificial intelligence
Type: article
IF / Citations: 2.5 / 2
Publication year: 2023