Datasets
- MNIST: Handwritten Digits (unimodal - image)
- Fashion-MNIST: Zalando’s Article Images (unimodal - image)
- CIFAR-10/100: Tiny Natural Images Dataset (unimodal - image)
- CANDOR: Conversational Audio-Visual Dataset for Human Interaction (multimodal - audio & video)
- Coswara: Breathing, Cough, and Speech Sounds for COVID-19 Diagnosis (unimodal - audio)
- ImageNet: Large-Scale Visual Recognition Dataset (unimodal - image)
- WIT: Wikipedia-based Image-Text Dataset (multimodal - image & text)
- AudioSet: Audio Events in YouTube Videos (unimodal - audio)
- VoxCeleb: Large-Scale Speaker Recognition Dataset (multimodal - audio & video)