NVIDIA DLI Building AI Agents with Multimodal Models
About this Course
Learn how to build neural network agents that reason across multiple data types using advanced fusion techniques, OCR, and NVIDIA AI Blueprints for real-world applications like robotics and video search and summarization.
Learning Objectives
In this course, you will learn about:
- Different data types and how to make them neural network ready
- Model fusion, and the differences between early, late, and intermediate fusion
- PDF extraction using OCR
- The difference between modality orchestration and agent orchestration
- Customization of NVIDIA AI Blueprints with Video Search and Summarization (VSS)
Topics Covered
- Begin with a robotics use case to show how different data types shape an effective neural network architecture.
- Apply mathematical concepts from robotics to Large Language Models (LLMs) so that they can accept non-language input.
- End with orchestration of multiple models to answer user queries.
Course Outline
Early and Late Fusion (1 hr)
- Use camera and LiDAR data to predict object positions.
- Convert various data types to make them neural network ready.
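As a rough illustration of the distinction this module covers, the sketch below contrasts early fusion (concatenate the inputs, then run one model) with late fusion (run one model per modality, then combine the outputs). It is not course material: the feature vectors, the layer sizes, and the helper names `init_layer` and `linear` are all hypothetical, and untrained linear layers stand in for real networks.

```python
import random


def init_layer(n_out, n_in, rng):
    """Create a small random linear layer (hypothetical stand-in for a trained network)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(weights, bias, features):
    """Apply a linear layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


rng = random.Random(0)

# Hypothetical, already-numeric features for one object.
camera = [0.2, 0.5, 0.1, 0.9]  # e.g. flattened image features
lidar = [1.5, 0.3, 2.0]        # e.g. summarized point-cloud features

# Early fusion: concatenate the modalities, then use a single model.
w_early, b_early = init_layer(3, len(camera) + len(lidar), rng)
early_xyz = linear(w_early, b_early, camera + lidar)

# Late fusion: one model per modality, then combine the predictions
# (here by simple averaging, which is only one of several options).
w_cam, b_cam = init_layer(3, len(camera), rng)
w_lid, b_lid = init_layer(3, len(lidar), rng)
cam_xyz = linear(w_cam, b_cam, camera)
lid_xyz = linear(w_lid, b_lid, lidar)
late_xyz = [(c + l) / 2 for c, l in zip(cam_xyz, lid_xyz)]

print("early fusion (x, y, z):", early_xyz)
print("late fusion  (x, y, z):", late_xyz)
```

In the early variant the single model can learn interactions between camera and LiDAR features, while the late variant keeps the per-modality models independent until the final combination step.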
Intermediate Fusion (1 hr)
- Explore the theory behind effective multimodal model architecture.
- Train a contrastive pretraining model.
- Create a vector database.
Cross-modal Projection (2 hr)
- Convert a language model into a Vision Language Model (VLM).
- Process PDFs with Optical Character Recognition (OCR) tools.
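One common way to think about cross-modal projection is as a learned map from a vision encoder's embedding space into the language model's token-embedding space, so that image features can sit in the same sequence as text tokens. The sketch below illustrates only that shape bookkeeping, under stated assumptions: the dimensions `VISION_DIM` and `TEXT_DIM`, the random embeddings, and the helpers `make_projection` and `project` are hypothetical, and the projection is untrained.

```python
import random


def make_projection(d_in, d_out, rng):
    """Create a random linear projection (hypothetical; a real one would be trained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out
    return weights, bias


def project(vector, weights, bias):
    """Map one vector from the vision space into the text-embedding space."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


rng = random.Random(0)

VISION_DIM = 4  # assumed size of each image-patch embedding
TEXT_DIM = 6    # assumed size of each language-model token embedding

# Stand-ins for the outputs of a vision encoder and a text embedding layer.
image_patches = [[rng.random() for _ in range(VISION_DIM)] for _ in range(2)]
text_tokens = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(3)]

# Project every image patch into the text-embedding space.
weights, bias = make_projection(VISION_DIM, TEXT_DIM, rng)
image_tokens = [project(patch, weights, bias) for patch in image_patches]

# The language model would then consume one combined sequence.
sequence = image_tokens + text_tokens
print(len(sequence), "tokens, each of width", len(sequence[0]))
```

Placing the projected image tokens before the text tokens is just one possible ordering; the point is that every element of the combined sequence has the width the language model expects.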
Model Orchestration (2 hr)
- Analyze video using Cosmos Nemotron.
- Use VSS to answer user queries about video content.
- Orchestrate with NVIDIA AI Blueprints.
Assessment (1 hr)
- Adapt a pre-trained model to accept a different data type using projection.
Course includes
- Hands-on lab exercises
- Industry-relevant projects
- Certificate of competence (upon passing the graded assessments)
- Access to NVIDIA DLI pre-configured computing environments with GPUs
Get Started
Ready to advance your AI skills? Contact us at info@kineto.ai to learn more about course availability, scheduling, and enrollment options.