MFSDSAI, IITG
DA323 Multimodal Data Processing and Learning 2.0
Jan-May 2025

Datasets

  • MNIST: Handwritten Digits (unimodal - image)
  • Fashion-MNIST: Zalando’s Article Images (unimodal - image)
  • CIFAR-10/100: Tiny Natural Images Dataset (unimodal - image)
  • CANDOR: Conversational Audio-Visual Dataset for Human Interaction (multimodal - audio & video)
  • Coswara: Breathing, Cough, and Speech Sounds for COVID-19 Diagnosis (unimodal - audio)
  • ImageNet: Large-Scale Visual Recognition Dataset (unimodal - image)
  • WIT: Wikipedia-based Image-Text Dataset (multimodal - image & text)
  • AudioSet: Audio Events in YouTube Videos (unimodal - audio)
  • VoxCeleb: Large-Scale Speaker Recognition Dataset (multimodal - audio & video)

Mehta Family School of Data Science and Artificial Intelligence
IIT Guwahati, India

  • iitg.ac.in/dsai/