Lesson Overview
In this lesson, students do not use LLMs. Instead, they engage in playful, hands-on, unplugged activities that focus on the crucial stage of fine-tuning LLMs, highlighting the human influence on their behavior and output. The lesson consists of three interconnected activities:
- Activity #1: Supervised Fine-Tuning (SFT). This activity introduces the concept of fine-tuning through examples. Students create their own "tuning data" by writing desired interactions with an LLM, understanding how specific examples shape the model's behavior. The activity also addresses the introduction of personal bias and the trade-offs involved in creating SFT datasets.
- Activity #2: Human Feedback (HF). This activity simulates how LLMs learn from human feedback. Students work in groups to create a "Reward Policy" by defining rules for desired LLM behavior. Through a game-based approach, students experience how feedback shapes the model's responses and how preference bias can influence the policy.
- Activity #3: Reinforcement Learning from Human Feedback (RLHF). This activity delves deeper into the RLHF process, where a "reward model" trains a language model through feedback. Students engage in a partner-based guessing game, role-playing the reward model and the language model. This activity emphasizes how human ratings create the reward model and how bias can emerge in the training process. The lesson also involves comparing outputs from different LLMs to demonstrate how fine-tuning impacts their responses.
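For teachers who want a concrete picture of the "tuning data" students write in Activity #1, the short sketch below shows a handful of desired prompt/response pairs saved in the JSON Lines format that many supervised fine-tuning tools accept. The specific examples and file name are illustrative only, not part of the lesson materials; which examples a writer chooses is exactly where personal bias can enter.

```python
# Illustrative sketch only: a tiny "tuning dataset" of desired interactions,
# written the way students do in Activity #1. The examples and file name are
# hypothetical; real SFT datasets contain thousands of such pairs.
import json

tuning_data = [
    {"prompt": "Explain photosynthesis to a 10-year-old.",
     "response": "Plants use sunlight, water, and air to make their own food."},
    {"prompt": "Write a polite reminder to return library books.",
     "response": "Hi! Just a friendly reminder that your library books are due Friday."},
]

# Save in the JSON Lines format (one JSON object per line).
with open("tuning_data.jsonl", "w") as f:
    for example in tuning_data:
        f.write(json.dumps(example) + "\n")
```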
Total Lesson Time: Activity #1 is 45 minutes. Activities #2 and #3 are 60 minutes each.
Learning Objectives:
- LLMs can be fine-tuned on specific datasets to improve performance for particular tasks, but this can introduce personal bias (Activity #1)
- Tuning is done by human raters, who rate model output based on criteria, which can introduce bias (Activity #2)
- Human ratings are used to create a reward model, which trains the LLM to produce human-like language (Activity #3)
Vocabulary Introduced: fine-tuned, personal bias, human raters, reward model
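As a conceptual illustration of the third objective, the toy sketch below turns a few human ratings into a very simple "reward" score and uses it to rank candidate responses. It is a deliberately simplified stand-in for a real reward model (which would be a trained neural network); all ratings, features, and example responses are made up for illustration.

```python
# Toy illustration of the reward-model idea from Activity #3 (not a real RLHF
# implementation). Human raters score example responses; we compare average
# ratings with and without each observable feature, then use the result to
# rank new candidate responses.
from statistics import mean

# Hypothetical human ratings (1-5) of responses, each tagged with simple features.
rated_examples = [
    {"features": {"polite": True,  "answers_question": True},  "rating": 5},
    {"features": {"polite": True,  "answers_question": False}, "rating": 2},
    {"features": {"polite": False, "answers_question": True},  "rating": 3},
]

def feature_value(feature: str) -> float:
    """Average rating of examples with the feature minus those without it."""
    with_f = [e["rating"] for e in rated_examples if e["features"].get(feature)]
    without_f = [e["rating"] for e in rated_examples if not e["features"].get(feature)]
    return (mean(with_f) if with_f else 0) - (mean(without_f) if without_f else 0)

def reward(features: dict) -> float:
    """Score a candidate response by the features it exhibits."""
    return sum(feature_value(f) for f, present in features.items() if present)

candidates = {
    "Sure, here is the answer you asked for.": {"polite": True, "answers_question": True},
    "Figure it out yourself.":                 {"polite": False, "answers_question": False},
}

# The higher-reward response is the one this toy "reward model" would push an LLM toward.
for text, feats in sorted(candidates.items(), key=lambda kv: reward(kv[1]), reverse=True):
    print(f"{reward(feats):5.2f}  {text}")
```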
Pacing for Activity #1
- Opening (5 min)
- Mini-lesson: tuning (5 min)
- Writing activity (10 min)
- Gallery Walk (10 min)
- Connection & Discussion (10 min)
- Closing (5 min)
Pacing for Activity #2
- Opening (10 min)
- Mini-lesson: feedback policies (10 min)
- Reward policy activity (10 min)
- Gallery Walk (10 min)
- Connection & Discussion (10 min)
- Closing (10 min)
Pacing for Activity #3
- Opening (10 min)
- Mini-lesson: Reinforcement learning with human feedback (RLHF) (10 min)
- RLHF activity (20 min)
- Reflection & Discussion (10 min)
- Closing (5 min)
Planning Guide
Preparation Needed: 15-20 minutes for each activity
Prep Needed for Teaching Activity #1 In-Person
- Practice interactions with distilgpt2/small for the class demonstration (a minimal script sketch follows this list).
- Prepare writing materials for the class; these can support either typed or handwritten work, depending on your preference for your class.
- Prepare the room for a gallery walk so that students can view each other's writing.
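If you would like a starting point for the class demonstration, the sketch below shows one way to generate a few continuations from the small distilgpt2 model using the Hugging Face transformers library. The prompt and generation settings are just examples; any short prompt works, and the demo can also be run in a hosted notebook if installing packages locally is not practical.

```python
# One possible setup for the Activity #1 class demonstration.
# Assumes the Hugging Face packages are installed: pip install transformers torch
from transformers import pipeline

# "distilgpt2" is the small distilled GPT-2 model referenced above.
generator = pipeline("text-generation", model="distilgpt2")

prompt = "The best thing about learning science is"  # any short prompt works
outputs = generator(prompt, max_new_tokens=30, num_return_sequences=3, do_sample=True)

for i, out in enumerate(outputs, start=1):
    print(f"--- Continuation {i} ---")
    print(out["generated_text"])
```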
Prep Needed for Teaching Activity #2 In-Person
- Make the Reward Policy worksheet and scorecard (or grid paper for students to make their own) available for students to complete in pairs.
Prep Needed for Teaching Activity #3 In-Person
- Students will need access to their writing from Activity #1 and their Reward Policy from Activity #2.
- Students should be able to work on Activity #3 with a partner (the same partner as in Activity #2 or a different one, depending on your preference for your class).
Activity Usage
Copyright held by MIT STEP Lab
License: CC BY-NC 4.0 International (Creative Commons)
These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0; for more information, visit https://creativecommons.org/licenses/by-nc/4.0/). This license allows you to remix, tweak, and build upon these materials non-commercially, as long as you acknowledge the creators. Derivative works should include acknowledgement but do not have to be licensed as CC BY-NC. Anyone interested in using this work for for-profit commercial purposes should contact Irene Lee at [email protected] for information on how to proceed.
Attribution
This unit was created by Katherine (Kate) Moore of MIT for the Everyday AI PD project, which created the Developing AI Literacy (DAILy) 2.0 curriculum.