Publications
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, 2025
GENIUS: A Generative Framework for Universal Multimodal Search
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Video Summarization with Large Language Models