A Readability-Driven Curriculum Learning Method for Data-Efficient Small Language Model Pretraining | 김주애 교수 연구실 | 한국외국어대학교 본교(제1캠퍼스) Language & AI 융합학부

김주애 교수 연구실

서비스 플랜

연구실 검색

프로젝트 공고

정부 과제 추천

AI 기반 기업 서칭

홈

기본 정보

연구 분야

프로젝트

논문

구성원

article|

gold

·인용수 0

·2025

A Readability-Driven Curriculum Learning Method for Data-Efficient Small Language Model Pretraining

Suyun Kim, J.S. Park, Juae Kim

IF 2.2Mathematics

초록

Large language models demand substantial computational and data resources, motivating approaches that improve the training efficiency of small language models. While curriculum learning methods based on linguistic difficulty measures have been explored as a potential solution, prior approaches that rely on complex linguistic indices are often computationally expensive, difficult to interpret, or fail to yield consistent improvements. Moreover, existing methods rarely incorporate the cognitive and linguistic efficiency observed in human language acquisition. To address these gaps, we propose a readability-driven curriculum learning method based on the Flesch Reading Ease (FRE) score, which provides a simple, interpretable, and cognitively motivated measure of text difficulty. Across two dataset configurations and multiple curriculum granularities, our method yields consistent improvements over baseline models without curriculum learning, achieving substantial gains on BLiMP and MNLI. Reading behavior evaluations also reveal human-like sensitivity to textual difficulty. These findings demonstrate that a lightweight, interpretable curriculum design can enhance small language models under strict data constraints, offering a practical path toward more efficient training.

키워드

CurriculumReading (process)Language modelMeasure (data warehouse)Language acquisitionPath (computing)Baseline (sea)

타입

article

IF / 인용수

2.2 / 0

원문

https://doi.org/10.3390/math13203300

게재 연도

2025

프로젝트 공고 서비스 문의 자주 묻는 질문 이용약관 개인정보처리방침

주식회사 디써클

대표 장재우,이윤구서울특별시 강남구 역삼로 169, 명우빌딩 2층 (TIPS타운 S2)대표 전화 0507-1312-6417이메일 info@rndcircle.io사업자등록번호 458-87-03380호스팅제공자 구글 클라우드 플랫폼(GCP)