Publications
Conference
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
ACM Multimedia
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding