An Efficient Document Retrieval for Korean Open-Domain Question Answering Based on ColBERT | Prof. Youhyun Shin's Lab | Korea University, Department of Linguistics

article | gold · Citations: 2 · 2023

An Efficient Document Retrieval for Korean Open-Domain Question Answering Based on ColBERT

Byungha Kang, Yeong‐Hwa Kim, Youhyun Shin

Applied Sciences (IF 2.5)

Abstract

Open-domain question answering requires the task of retrieving documents with high relevance to the query from a large-scale corpus. Deep learning-based dense retrieval methods have become the primary approach for finding related documents. Although deep learning-based methods have improved search accuracy compared to traditional techniques, they simultaneously impose a considerable increase in computational burden. Consequently, research on efficient models and methods that optimize the trade-off between search accuracy and time to alleviate computational demands is required. In this paper, we propose a Korean document retrieval method utilizing ColBERT’s late interaction paradigm to efficiently calculate the relevance between questions and documents. For open-domain Korean question answering document retrieval, we construct a Korean dataset using various corpora from AI-Hub. We conduct experiments comparing the search accuracy and inference time among the traditional IR (information retrieval) model BM25, the dense retrieval approach utilizing BERT-based models for Korean, and our proposed method. The experimental results demonstrate that our approach achieves a higher accuracy than BM25 and requires less search time than the dense retrieval method employing KoBERT. Moreover, the most outstanding performance is observed when using KoSBERT, a pre-trained Korean language model that learned to position semantically similar sentences closely in vector space.
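The late interaction scoring mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each question and document has already been encoded into one embedding vector per token (in the paper this would come from a BERT-based Korean encoder; here the arrays are toy values), and it computes the ColBERT-style relevance score: for every query token, take the maximum cosine similarity over all document tokens, then sum those maxima. The function names `l2_normalize`, `late_interaction_score`, and `rank_documents` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def late_interaction_score(query_emb, doc_emb):
    """ColBERT-style MaxSim score.

    query_emb: array of shape (num_query_tokens, dim)
    doc_emb:   array of shape (num_doc_tokens, dim)
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    d = l2_normalize(np.asarray(doc_emb, dtype=float))
    sim = q @ d.T  # (num_query_tokens, num_doc_tokens) cosine similarities
    # Best-matching document token for each query token, summed over the query.
    return float(sim.max(axis=1).sum())


def rank_documents(query_emb, doc_embs):
    """Return document indices ordered from most to least relevant."""
    scores = [late_interaction_score(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)


# Toy example: 2-dimensional "token embeddings" (assumed values, not real model output).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # matches both query tokens
doc_b = np.array([[1.0, 0.0]])                          # matches only the first

ranking = rank_documents(query, [doc_a, doc_b])
```

Because the document token embeddings do not depend on the question, they can be computed once offline, which leaves only the similarity matrix and the max-and-sum step to be evaluated at query time.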

Keywords

Computer science, Question answering, Vector space model, Relevance (law), Information retrieval, Document retrieval, Open domain, Inference, Task (project management), Artificial intelligence

Type

article

IF / Citations

2.5 / 2

Original text

https://doi.org/10.3390/app132413177

Publication year

2023