Just as humans rely on multiple senses to perceive the world around them, computers rely on a variety of sensors to perceive the human world. In healthcare, computed tomography (CT) scans provide a 3D representation used to detect potentially dangerous abnormalities. In robotics, lidar sensors help robots perceive depth and navigate the complex environments around them. In this course, learners will develop neural-network-based multimodal models that can understand many different data types by exploring different fusion techniques.
Wed, 04/29/2026
9:00 AM – 5:00 PM
Duration: 8 hours
Subject: Deep Learning
Language: English
Course Prerequisites:
Tools, libraries, frameworks used: PyTorch, CLIP