Video accessible from your Account page after purchase.
8 Hours of Video Instruction
Equips you with the knowledge and skills to assess LLM performance effectively
Evaluating Large Language Models (LLMs) introduces you to the process of evaluating LLMs, multimodal AI, and AI-powered applications like agents and RAG. To get the most out of these powerful and often unwieldy AI tools and make sure they meet your real-world needs, you need to assess and evaluate them. This video prepares you to evaluate and optimize LLMs so you can produce cutting-edge AI applications.
Learn How To
Evaluate LLMs with reference-based and reference-free metrics
Assess generative tasks with metrics like perplexity, BERTScore, and cosine similarity
Evaluate understanding tasks with accuracy, precision, recall, and calibration
Use benchmarks such as MMLU, MTEB, and TruthfulQA effectively
Probe a model's internal representations for knowledge, reasoning, and bias
Measure the success of LLM fine-tuning
Apply evaluation frameworks to production systems such as chatbots, agents, and RAG
Who Should Take This Course
AI practitioners, machine learning engineers, and data scientists who want to systematically evaluate LLMs, optimize their performance, and ensure they meet real-world application needs.
Course Requirements
Lesson Descriptions
Lesson 1: Foundations of LLM Evaluation
Lesson 1 explores why evaluation is a critical part of building and deploying LLMs. You learn about the differences between reference-free and reference-based evaluation, core metrics like accuracy and perplexity, and how these metrics tie into real-world performance. By the end of the lesson, you'll have a solid grounding in what makes an evaluation framework and experiment effective.
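As a taste of what this looks like in practice, a perplexity check can be only a few lines of code. The sketch below assumes the Hugging Face transformers library and a small GPT-2 model chosen purely for illustration; it is not a listing from the course itself.

    # Minimal perplexity sketch (assumes Hugging Face transformers; GPT-2 is
    # a hypothetical model choice for illustration).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    text = "The quick brown fox jumps over the lazy dog."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        # With labels equal to the input ids, the model returns the average
        # cross-entropy loss over the sequence.
        loss = model(**inputs, labels=inputs["input_ids"]).loss

    perplexity = torch.exp(loss).item()  # lower is better
    print(f"Perplexity: {perplexity:.2f}")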
Lesson 2: Evaluating Generative Tasks
Lesson 2 focuses on how to assess tasks like text generation, multiple-choice selection, and conversational use cases. You learn about key metrics like BERTScore, cosine similarity, and perplexity, and how to use them to interpret an LLM's output in the context of your specific use case. The lesson also discusses challenges like hallucinations and explores tools for factual consistency checks.
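To give a concrete flavor of one reference-based signal, the sketch below scores a generated answer against a reference with embedding cosine similarity. The sentence-transformers package and the all-MiniLM-L6-v2 model are assumptions made for this example, not necessarily the tooling used in the lesson.

    # Minimal cosine-similarity sketch (assumes the sentence-transformers
    # package and a small embedding model chosen for illustration).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    generated = "The Eiffel Tower is located in Paris, France."
    reference = "The Eiffel Tower stands in Paris."

    emb_gen, emb_ref = model.encode([generated, reference], convert_to_tensor=True)

    # A score near 1.0 suggests the generation is semantically close to the
    # reference; low scores can flag off-topic or hallucinated output.
    score = util.cos_sim(emb_gen, emb_ref).item()
    print(f"Cosine similarity: {score:.3f}")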
Lesson 3: Evaluating Understanding Tasks
Understanding tasks, such as classification and information retrieval, require specialized evaluation strategies. Lesson 3 covers concepts like calibration, accuracy, precision, and recall, all designed to evaluate these types of understanding tasks. It also discusses how embeddings and embedding similarities play a role in tasks like clustering and information retrieval. By the end of this lesson, you'll understand how to evaluate models and tasks that are meant to understand complex and nuanced inputs.
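For a concrete flavor, the sketch below computes accuracy, precision, recall, and a simple calibration curve for a binary classification task. It assumes scikit-learn, and the labels and confidences are toy values invented for illustration.

    # Minimal classification-evaluation sketch (assumes scikit-learn).
    from sklearn.metrics import accuracy_score, precision_score, recall_score
    from sklearn.calibration import calibration_curve

    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]                        # gold labels
    y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.3, 0.1, 0.55, 0.35]  # model confidences
    y_pred = [1 if p >= 0.5 else 0 for p in y_prob]                # thresholded predictions

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))

    # Calibration: do predicted probabilities match observed frequencies?
    frac_positive, mean_predicted = calibration_curve(y_true, y_prob, n_bins=5)
    print("calibration bins:", list(zip(mean_predicted, frac_positive)))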
Lesson 4: Using Benchmarks Effectively
Benchmarks are essential for comparing both models and model training methods, but they must be interrogated and used wisely. In Lesson 4, you'll explore popular benchmarks like MMLU, MTEB, and TruthfulQA, learn what those acronyms stand for, and examine what exactly they test for and how they align with real-world tasks. You'll learn how to interpret these benchmark scores and avoid common pitfalls like overfitting to benchmarks or relying on outdated datasets.
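As a rough illustration of what benchmark evaluation involves, the sketch below scores a model on one MMLU subject. It assumes the Hugging Face datasets library and the public cais/mmlu dataset; ask_model() is a hypothetical placeholder for your own model call.

    # Minimal MMLU-style accuracy sketch (assumes the datasets library and
    # the cais/mmlu dataset; ask_model() is a hypothetical stand-in).
    from datasets import load_dataset

    subset = load_dataset("cais/mmlu", "abstract_algebra", split="test")

    def ask_model(question: str, choices: list[str]) -> int:
        # Hypothetical placeholder: replace with a real LLM call that picks
        # one of the choices and returns its index.
        return 0

    correct = 0
    for row in subset:
        prediction = ask_model(row["question"], row["choices"])
        correct += int(prediction == row["answer"])  # "answer" is the gold index

    print(f"Accuracy: {correct / len(subset):.2%}")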
Lesson 5: Probing LLMs for a World Model
LLMs can encode vast amounts of knowledge, but how can we evaluate what they truly know without relying on prompting? Lesson 5 explores probing techniques that test what a model's internal representations encode, such as factual knowledge, reasoning abilities, and biases. You'll gain hands-on experience with probing techniques and learn how to use them to uncover hidden strengths and weaknesses in your models.
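The sketch below shows the general shape of a linear probe: freeze a model, extract hidden states, and train a small classifier on top of them. It assumes Hugging Face transformers and scikit-learn, with a tiny, made-up set of labeled statements purely for illustration.

    # Minimal linear-probe sketch (assumes transformers and scikit-learn;
    # the labeled examples are hypothetical).
    import torch
    from transformers import AutoModel, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    texts = ["Paris is in France.", "Paris is in Germany.",
             "Water boils at 100 C.", "Water boils at 10 C."]
    labels = [1, 0, 1, 0]  # hypothetical "factually true?" labels

    def embed(text: str) -> list[float]:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
        return hidden.mean(dim=1).squeeze(0).tolist()   # mean-pooled representation

    features = [embed(t) for t in texts]

    # If the probe separates the classes, the frozen representation encodes
    # the property; if it cannot, the model may not represent it (linearly).
    probe = LogisticRegression(max_iter=1000).fit(features, labels)
    print("Probe training accuracy:", probe.score(features, labels))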
Lesson 6: Evaluating LLM Fine-Tuning
Fine-tuning enables a model to specialize, but it's essential to evaluate how well that process aligns the model to the specific task at hand. Lesson 6 covers metrics for fine-tuning success, including loss functions, memory usage, and overall speed. It also discusses tradeoffs like overfitting and interpretability so you can balance performance with reliability.
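For a sense of how these measurements fit together, the sketch below evaluates a fine-tuned causal language model while tracking loss, throughput, and peak GPU memory. It assumes PyTorch on a CUDA device, and the model and eval_loader objects are hypothetical stand-ins for your own setup.

    # Minimal fine-tuning evaluation sketch (assumes PyTorch and CUDA;
    # model and eval_loader are hypothetical stand-ins).
    import time
    import torch

    def evaluate(model, eval_loader, device="cuda"):
        model.eval()
        torch.cuda.reset_peak_memory_stats(device)
        total_loss, total_tokens = 0.0, 0
        start = time.perf_counter()

        with torch.no_grad():
            for batch in eval_loader:
                input_ids = batch["input_ids"].to(device)
                loss = model(input_ids=input_ids, labels=input_ids).loss
                total_loss += loss.item() * input_ids.numel()
                total_tokens += input_ids.numel()

        elapsed = time.perf_counter() - start
        return {
            "eval_loss": total_loss / total_tokens,
            "tokens_per_second": total_tokens / elapsed,
            "peak_memory_gb": torch.cuda.max_memory_allocated(device) / 1e9,
        }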
Lesson 7: Case Studies
Lesson 7 applies everything you've learned so far to five real-world scenarios. Through these detailed case studies, you'll see how evaluation frameworks are used in production settings, from improving a chatbot's conversational ability to optimizing AI agents and retrieval-augmented generation (RAG) systems. It also explores time series regression problems and drift-related AI issues, leaving you with practical insights that can be applied throughout your projects.
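As one small example of the kind of evaluation signal used in the RAG case study, the sketch below computes hit rate at k, that is, how often the retriever returns the document known to contain the answer. The retrieve() function and the labeled queries are hypothetical stand-ins.

    # Minimal retrieval hit-rate sketch (retrieve() and the labeled queries
    # are hypothetical stand-ins for your own retriever and test set).
    def hit_rate_at_k(queries, retrieve, k=5):
        hits = 0
        for query, relevant_doc_id in queries:
            retrieved_ids = retrieve(query, top_k=k)
            hits += int(relevant_doc_id in retrieved_ids)
        return hits / len(queries)

    # Example usage with a toy retriever that always returns the same ids:
    queries = [("Who designed the Eiffel Tower?", "doc_17"),
               ("When was it built?", "doc_17")]
    print(hit_rate_at_k(queries, lambda q, top_k: ["doc_03", "doc_17"], k=2))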
Lesson 8: Summary of Evaluation and Looking Ahead
In the final lesson, Sinan recaps the key points and metrics covered throughout the lessons. From foundational metrics to advanced evaluation techniques, this lesson looks back on the dozens of metrics covered, collected into an easy-to-reference table. He also discusses emerging trends in LLM evaluation, such as multimodal benchmarks and real-time monitoring, and reflects on the ethical and fairness considerations of deploying powerful AI systems.
About Pearson Video Training
Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Sams, and Que. Topics include IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.
Video Lessons are available for download for offline viewing within the streaming format. Look for the green arrow in each lesson.
Lesson 1: Foundations of LLM Evaluation
Lesson 2: Evaluating Generative Tasks
Lesson 3: Evaluating Understanding Tasks
Lesson 4: Using Benchmarks Effectively
Lesson 5: Probing LLMs for a World Model
Lesson 6: Evaluating LLM Fine-Tuning
Lesson 7: Case Studies
Lesson 8: Summary of Evaluation and Looking Ahead