Video accessible from your Account page after purchase.
6+ Hours of Video Instruction
Equips you with the knowledge and skills needed to implement multimodal AI systems
Multimodal AI Essentials: Merging Text, Image, and Audio for Next-Generation AI Applications shows you how combining modalities such as text, audio, video, and images enables AI systems to handle tasks that no single modality can solve on its own. You will gain hands-on experience building visual question answering (VQA) models, generating personalized images with diffusion, designing end-to-end multimodal applications, and fine-tuning multimodal models for specific tasks. This video gives you the tools, knowledge, and confidence to design and deploy your own state-of-the-art multimodal AI systems.
Lesson Descriptions
Lesson 1: Introduction to Multimodal AI
Lesson 1 lays the groundwork for the course by introducing the core concepts of multimodal AI and its applications. It explains why combining modalities such as text, images, and audio opens up capabilities that single-modality systems lack. By the end of this lesson, you will understand the potential of multimodal AI systems and their impact across industries.
Lesson 2: Building Visual Question Answering (VQA) Models
In Lesson 2, you work with Sinan to construct visual question answering (VQA) systems, which are models capable of answering questions about images. Through examples and architectural walkthroughs, you learn how to embed image and text inputs and fuse them effectively, and you see how VQA is applied in practice.
Lesson 3: Exploring Diffusion Models
Lesson 3 introduces diffusion, a groundbreaking approach to image generation. Rather than producing an image in a single pass, diffusion models iteratively refine noisy images into coherent outputs. The lesson explores the theory behind both the forward corruption (noising) process and the reverse (denoising) diffusion process. You also fine-tune your own diffusion model using DreamBooth, a technique for personalizing a text-to-image model from a small set of images of a subject.
Lesson 4: Developing Multimodal AI Systems
Lesson 4 focuses on the practical aspects of designing and implementing multimodal AI applications. From fine-tuning text-to-speech models to building your own visual agent, the lesson demonstrates how to create cohesive systems that handle diverse input and output modalities.
Lesson 5: Evaluating and Testing Multimodal AI Systems
Lesson 5 covers evaluation metrics, benchmarks, and the ethical considerations involved in testing multimodal AI systems. It also discusses bias mitigation and responsible AI practices, including the use of LLMs as multimodal judges and the proliferation of deepfakes.
Lesson 6: Expanding and Applying Multimodal AI
Lesson 6 explores advanced techniques and future trends in multimodal AI. You will see how to extend existing AI systems with cutting-edge methods and integrate novel data types. The lesson also anticipates the direction of this rapidly evolving field and its future applications, including computer use for general-purpose agentic AI.
About Pearson Video Training
Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Sams, and Que. Topics include IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.