Teach a Robot to Learn

Guided by Pieter Abbeel

Most robots have to be told exactly what to do. You're going to teach one to figure it out by practicing — the way you learn to ride a bike.

25 min · +130 XP

Watch

See it happen in the real world.

Reinforcement learning is how a robot learns by trying things and getting a score. Do something good, get points. Do something bad, lose points. Do this a million times and the robot starts choosing the moves that get the most points. Pieter Abbeel's lab once taught a robot arm to fold a towel this way, which sounds simple until you realize a towel is one of the hardest objects in robotics — it changes shape every time you touch it.
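
To make that concrete, here is a small sketch in Python. The two moves and their success chances are made up for illustration; they do not come from Pieter Abbeel's lab. The robot tries moves at random, scores +1 for a success and -1 for a miss, and then checks which move earned more points on average.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical moves and how often each one works (made up for this sketch).
SUCCESS_CHANCE = {"grab_corner": 0.8, "grab_middle": 0.3}

totals = {move: 0 for move in SUCCESS_CHANCE}
tries = {move: 0 for move in SUCCESS_CHANCE}

for _ in range(1000):
    move = random.choice(list(SUCCESS_CHANCE))                     # try something
    points = 1 if random.random() < SUCCESS_CHANCE[move] else -1   # get a score
    totals[move] += points
    tries[move] += 1

# Average points per try tells the robot which move to prefer.
average = {move: totals[move] / tries[move] for move in SUCCESS_CHANCE}
best_move = max(average, key=average.get)
print(average, "->", best_move)
```

After enough tries, the move that works more often should come out with the higher average, and that is the move the robot would pick.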


Think

A question worth sitting with.

If your robot got the same score every time, no matter what it did, would it ever get better?

Build

Make something with your hands.

Use Scratch or a paper game. Make a robot that moves in four directions. Give it +10 points for reaching a goal square and -1 for every step it takes. Run it 10 times and track the score.

Step-by-step

1. Draw a 5×5 grid on paper. Pick one corner square as the GOAL. Place a coin in the opposite corner: that's the robot.
2. Set up the rule: every step costs −1 point. Reaching the goal earns +10 points.
3. Play 10 rounds. Each round, move the coin to the goal however you want, one square at a time. Write down your total points for each round.
4. Watch what your strategy does. In the first few rounds you may wander. By round 5 you're probably taking a shorter path. That improvement is what reinforcement learning looks like.
5. Now change the rule: −2 points per step. Play 10 more rounds. Did your strategy change?
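
If you want to check your arithmetic, the scoring rule can be sketched in Python. This assumes every step costs points, including the step that lands on the goal, and that the coin moves up, down, left, or right (no diagonals) between opposite corners. The function name round_score is made up for this sketch.

```python
def round_score(steps, step_cost=1, goal_bonus=10):
    """Total points for one round that ends on the goal square."""
    return goal_bonus - step_cost * steps

# Shortest route between opposite corners of a 5x5 grid: 4 across + 4 down.
shortest = 4 + 4

for steps in (20, 12, shortest):
    first_rule = round_score(steps)                 # -1 point per step
    second_rule = round_score(steps, step_cost=2)   # -2 points per step
    print(steps, "steps:", first_rule, "points, or", second_rule, "with the -2 rule")
```

Under these assumptions the shortest route takes 8 steps, so the best possible score is 2 points with the first rule and -6 with the second. Wandering gets expensive quickly.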

A tiny reinforcement learner on a 4-square track

Python

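import random

# States: 0 = start, 3 = goal
# Actions: 'right' moves +1, 'left' moves -1
ACTIONS = ['right', 'left']
ALPHA = 0.1  # learning rate: how far each update moves the estimate
GAMMA = 0.9  # discount: how much points earned later still count
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(3)}

def step(state, action):
    nxt = state + (1 if action == 'right' else -1)
    nxt = max(0, min(3, nxt))  # stay on the track
    reward = 10 if nxt == 3 else -1
    return nxt, reward

for episode in range(50):
    state = 0
    while state != 3:
        action = random.choice(ACTIONS)
        nxt, reward = step(state, action)
        # The goal ends the round, so no more points can follow it.
        future = 0.0 if nxt == 3 else max(Q[nxt].values())
        # Score a move by its reward plus the best score reachable afterwards.
        Q[state][action] += ALPHA * (reward + GAMMA * future - Q[state][action])
        state = nxt

print(Q)  # 'right' should end up scoring higher than 'left' at every state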
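
The learner above picks every move at random, so it never uses what it has learned. Here is a rough sketch of one common alternative, often called epsilon-greedy: most of the time the robot takes its best-known move, and some of the time it explores. It uses a standard Q-learning update, and the settings ALPHA, GAMMA, and EPSILON and the 200 rounds are assumed values chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

ACTIONS = ['right', 'left']
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # assumed settings for this sketch
GOAL = 3
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL)}

def step(state, action):
    nxt = state + (1 if action == 'right' else -1)
    nxt = max(0, min(GOAL, nxt))  # stay on the track
    reward = 10 if nxt == GOAL else -1
    return nxt, reward

def choose(state):
    # Explore with probability EPSILON, otherwise take the best-known move.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

steps_per_episode = []
for episode in range(200):
    state, steps = 0, 0
    while state != GOAL:
        action = choose(state)
        nxt, reward = step(state, action)
        # The goal ends the round, so no more points can follow it.
        future = 0.0 if nxt == GOAL else max(Q[nxt].values())
        Q[state][action] += ALPHA * (reward + GAMMA * future - Q[state][action])
        state, steps = nxt, steps + 1
    steps_per_episode.append(steps)

# The learned strategy: the highest-scoring move at each square.
policy = {s: max(ACTIONS, key=lambda a: Q[s][a]) for s in Q}
print(policy)
print("first 5 rounds:", steps_per_episode[:5])
print("last 5 rounds:", steps_per_episode[-5:])
```

Comparing the step counts from the first and last rounds is the same check you did on paper: is the robot reaching the goal in fewer moves than when it started?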

Toolkit

Scratch
Paper

Play

Test it. See what it does.

Be the robot. Sit in a chair while someone hides a small object in the room. Then get up and search: after each move, your helper may only say 'warmer' or 'colder'. Count how many moves it takes you to find it.
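
The same game can be sketched in Python on a straight hallway. Everything about the setup is an assumption for illustration: the hallway has 21 spots, the searcher starts in the middle, and the strategy is simply "keep going if warmer, turn around if colder".

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

HALLWAY_END = 20                          # spots are numbered 0..20 (made-up setup)
START = 10
hidden = random.randint(0, HALLWAY_END)   # where the object is hidden
position = START
direction = 1                             # +1 = walk forward, -1 = walk back
moves = 0

while position != hidden:
    before = abs(hidden - position)
    position = max(0, min(HALLWAY_END, position + direction))
    moves += 1
    # The helper's only feedback: did that move bring you closer?
    feedback = "warmer" if abs(hidden - position) < before else "colder"
    if feedback == "colder":
        direction = -direction            # turn around

print("found it in", moves, "moves")
```

The 'warmer' and 'colder' answers play the role of the score: they never say where the object is, only whether the last move helped.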

Challenge

Push it a little further.

Change the scoring so the robot is rewarded for being fast, not just for finishing. Watch how its strategy changes.

Reflect

Notice what your robot taught you.

What did your robot learn that you didn't expect?

Also ask yourself

What surprised you?

Reward

Mission outro

+130 XP

“Next week your robot will learn to see.”

Skills advanced: robot learning, machine learning

Badge

AI Apprentice
