Publications
Conference
WINTER SEMINAR
2001.09
HellaSwag: Can a Machine Really Finish Your Sentence?
Measuring Massive Multitask Language Understanding
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Clues Before Answers: Generation-Enhanced Multiple-Choice QA