4.1 Pre-training

Lesson Overview

In this lesson, students do not use LLMs. Instead, they engage in playful, hands-on, unplugged activities that aim to demystify how LLMs learn language by exploring concepts like tokenization, vectors, and attention mechanisms. The lesson consists of three activities, each building upon the previous one.

  • Activity #1: Preference Vectors introduces the concept of representing data as vectors. Students create "vector bracelets" to visualize how their personal preferences (e.g., "I like trying new things.") and words (e.g., "screen" and "cast") can be transformed into numerical representations, which helps them understand how LLMs calculate similarity. This activity can be done unplugged, but it also has an optional Google Sheets component.
  • Activity #2: MadLLMs simulates the next-word prediction process of LLMs. Students work in pairs to predict the likelihood of word pairings, experiencing how LLMs use context windows and training data to make predictions. This activity is teacher-driven and involves a slide deck with target scores.
  • Activity #3: MadLLMs with Attention builds on the previous activities by introducing the attention mechanism. Students use their knowledge of word vectors and similarity scores to understand how LLMs weigh the importance of different words in a sequence. This activity combines elements of the first two, allowing students to experience how attention enhances LLM predictions.
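
For teachers who want to see the arithmetic behind the "vector bracelet" idea, the sketch below compares three bracelets numerically. It is only an illustration: the 1-to-5 rating scale, the student names, and the choice of cosine similarity (one common way to compare vectors) are assumptions, not part of the lesson materials.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    Values near 1 mean the vectors point in a similar direction.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical bracelets: five preferences, each rated from 1 (disagree)
# to 5 (agree). The scale is an assumption made for this example.
student_a = [5, 1, 3, 4, 2]
student_b = [4, 2, 3, 5, 1]
student_c = [1, 5, 3, 1, 5]

sim_ab = cosine_similarity(student_a, student_b)
sim_ac = cosine_similarity(student_a, student_c)

print(f"A vs. B: {sim_ab:.2f}")
print(f"A vs. C: {sim_ac:.2f}")
```

Students A and B rated most preferences alike, so their score comes out higher than the score for A and C. The same comparison applies when the numbers describe words instead of people.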

 

Total Lesson Time: Activities #1 and #2 are 45 minutes each. Activity #3 is 60 minutes.

Learning Objectives: 

  • LLMs can predict likely meanings based on word and phrase context (word vectors) (Activity #1)
  • LLMs process text by breaking it down into smaller units called tokens (Activity #2)
  • Attention mechanisms allow LLMs to weigh the importance of different parts of the input sequence (Activity #3)
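
As background for the third objective, here is a minimal sketch of how similarity scores can be turned into attention weights. The three-number word vectors are made up for illustration, and the calculation is deliberately simplified: real LLMs use learned query, key, and value projections rather than raw word vectors.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors, used here as a similarity score."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical word vectors for the input sequence "the movie screen".
vectors = {
    "the": [0.1, 0.0, 0.1],
    "movie": [0.9, 0.2, 0.1],
    "screen": [0.8, 0.3, 0.0],
}

# Ask: when processing "screen", how much weight does each word receive?
focus = vectors["screen"]
words = list(vectors)
scores = [dot(focus, vectors[w]) for w in words]
weights = dict(zip(words, softmax(scores)))

for word, weight in weights.items():
    print(f"{word}: {weight:.2f}")
```

In this toy example "movie" receives more weight than "the" because its vector is more similar to the vector for "screen".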

Vocabulary Introduced: word vectors, tokens, attention, input sequence

Pacing for Activity #1

  • Opening (5 min)
  • Mini-lesson: word vectors (5 min)
  • Preference vectors activity (10 min)
  • Gallery Walk (10 min)
  • Connection & Discussion (10 min)
  • Closing (5 min)

Pacing for Activity #2

  • Opening (5 min)
  • Mini-lesson: context windows (5 min)
  • Probability pairings activity (20 min)
  • Reflection & Discussion (10 min)
  • Closing (5 min)

Pacing for Activity #3

  • Opening (5 min)
  • Mini-lesson: attention (5 min)
  • Clustering by similarity scores activity (30 min)
  • Reflection & Discussion (10 min)
  • Closing (5 min)

 

Planning Guide

Preparation Needed: 15-20 minutes for each activity

Prep Needed for Teaching Activity #1 In-Person

  • If you decide to use the Preference Vector Google Sheet, make the link available to students.
  • Create strips of paper for each student with 5 boxes on each, which can be worn as a bracelet.
  • Gather coloring materials so that students can color in each of the boxes on their bracelet.
  • Optional: Gather beading materials to make word vectors into bracelets (1 bead color represents 1 of the 5 preferences).

Prep Needed for Teaching Activity #2 In-Person

  • Print word cards to make a deck of words that pair together. Deck 1 = context window words (first word in pairing); Deck 2 = label (second word in pairing).
  • Post a probability spectrum in the classroom large enough for student pairs to stand along: 0% - 100%
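
To see where a position on the 0% - 100% spectrum could come from, the sketch below counts word pairings in a tiny made-up training text and converts the counts into next-word probabilities. The text is hypothetical and not taken from the lesson's word cards. Splitting on spaces stands in for tokenization, and the context window is a single word, both simplifications of what real LLMs do.

```python
from collections import Counter, defaultdict

# Hypothetical training text, invented for this example.
training_text = "screen time screen cast screen time broad cast screen door"

def next_word_probabilities(text):
    """Estimate P(second word | first word) from adjacent word pairs."""
    tokens = text.split()  # crude tokenization: one token per word
    counts = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        counts[first][second] += 1
    return {
        first: {
            second: n / sum(followers.values())
            for second, n in followers.items()
        }
        for first, followers in counts.items()
    }

probs = next_word_probabilities(training_text)

# Show how likely each Deck 2 word is after the Deck 1 word "screen".
for word, p in sorted(probs["screen"].items(), key=lambda kv: -kv[1]):
    print(f"screen -> {word}: {p:.0%}")
```

A pairing that appears often in the training text lands near the high end of the spectrum, and a pairing that never appears gets no probability at all.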

Prep Needed for Teaching Activity #3 In-Person

  • Prep is the same as Activity #2
  • Students will also need space to spread out across the classroom to stand in distinct clusters according to their word similarity scores. 

 


 

Activity Usage

Copyright held by MIT STEP Lab 

License: CC-BY-NC under Creative Commons

These materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). For more information, visit https://creativecommons.org/licenses/by-nc/4.0/. This license allows you to remix, tweak, and build upon these materials non-commercially as long as you include acknowledgement to the creators. Derivative works should include acknowledgement but do not have to be licensed as CC-BY-NC. People interested in using this work for for-profit commercial purposes should reach out to Irene Lee at [email protected] for information as to how to proceed.

 

Attribution

This unit was created by Katherine (Kate) Moore of MIT for the Everyday AI PD project, which created the Developing AI Literacy (DAILy) 2.0 curriculum.
