Neural Information Processing Systems Workshop (NeurIPSW) on AI-Driven Speech, Music, and Sound Generation
2024.12
Efficient generative multimodal integration (EGMI): enabling audio generation from text-image pairs through alignment with large language models