AI
Teach a Robot to Learn
Guided by Pieter Abbeel
Most robots have to be told exactly what to do. You're going to teach one to figure it out by practicing — the way you learn to ride a bike.
Watch
See it happen in the real world.
Reinforcement learning is how a robot learns by trying things and getting a score. Do something good, get points. Do something bad, lose points. Do this a million times and the robot starts choosing the moves that get the most points. Pieter Abbeel's lab once taught a robot arm to fold a towel this way, which sounds simple until you realize a towel is one of the hardest objects in robotics — it changes shape every time you touch it.
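Here is a minimal Python sketch of that try-score-repeat loop. It was written for this page and is not code from Abbeel's lab. It uses the simplest version of the idea, often called a multi-armed bandit, where each move is scored on its own. The three moves and their hidden average scores (`true_points`) are made up. The robot never sees those numbers. It only tries moves and remembers the average score each one earned. Most of the time it picks the move with the best average so far. 10% of the time it picks a random move so it keeps exploring (this strategy is called epsilon-greedy).

```python
import random

# Hypothetical moves and their hidden average scores (made up for this example).
true_points = {"wave": 1.0, "spin": 3.0, "jump": 5.0}

def try_move(move):
    """Do a move and get a noisy score back (the robot only sees this number)."""
    return true_points[move] + random.uniform(-2, 2)

def learn(rounds=1000, explore=0.1, seed=0):
    random.seed(seed)
    totals = {m: 0.0 for m in true_points}  # points collected by each move
    counts = {m: 0 for m in true_points}    # how many times each move was tried
    for _ in range(rounds):
        untried = [m for m in true_points if counts[m] == 0]
        if untried:
            move = untried[0]                          # try every move once first
        elif random.random() < explore:
            move = random.choice(list(true_points))    # explore: pick any move
        else:
            move = max(totals, key=lambda m: totals[m] / counts[m])  # best average so far
        totals[move] += try_move(move)
        counts[move] += 1
    return counts

counts = learn()
print(counts)  # 'jump' should end up chosen far more often than the others
```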
Think
A question worth sitting with.
If your robot got the same score every time, no matter what it did, would it ever get better?
Build
Make something with your hands.
Use Scratch or a paper game. Make a robot that moves in four directions. Give it +10 points for reaching a goal square and -1 for every step it takes. Run it 10 times and track the score.
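If you'd rather type the game than draw it, here is a minimal Python sketch of the scoring rule from this activity. Every step costs points, and reaching the goal adds +10. The final step into the goal still costs its point. The grid corners, the name `score_round`, and the choice to keep the robot in place when it bumps into an edge are our own assumptions for illustration.

```python
# Moves the robot can make: (change in row, change in column).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

SIZE = 5          # a 5x5 grid
START = (4, 0)    # bottom-left corner (our choice)
GOAL = (0, 4)     # the opposite corner

def score_round(moves, step_cost=-1, goal_reward=10):
    """Play one round from START and return the total points.

    Every move adds step_cost (a negative number). Reaching GOAL adds
    goal_reward and ends the round. A move into the edge of the grid
    leaves the robot where it is but still costs a step.
    """
    row, col = START
    points = 0
    for name in moves:
        d_row, d_col = MOVES[name]
        row = min(max(row + d_row, 0), SIZE - 1)
        col = min(max(col + d_col, 0), SIZE - 1)
        points += step_cost
        if (row, col) == GOAL:
            points += goal_reward
            break
    return points

shortest = ["up"] * 4 + ["right"] * 4                   # 8 steps, straight to the goal
wandering = ["right", "up", "left", "down"] + shortest  # a 4-step loop first

print(score_round(shortest))    # 2  (10 - 8)
print(score_round(wandering))   # -2 (10 - 12)
```

The shortest route between opposite corners of a 5×5 grid takes 8 steps, so the best possible round scores 10 − 8 = 2 points. Every extra step of wandering costs one more point.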
Step-by-step
Draw a 5×5 grid on paper. Pick a corner square as the GOAL. Place a coin in the opposite corner; that's the robot.
Set up the rule: every step costs −1 point. Reaching the goal earns +10 points.
Play 10 rounds. Each round, move the coin one square at a time (up, down, left, or right) until it reaches the goal, taking any route you like. Track the total points for each round.
Watch what your strategy does. The first few rounds you wander. By round 5 you're probably taking a shorter path. That improvement is exactly what reinforcement learning looks like.
Now change the rule: −2 points per step. Play 10 more rounds. Did your strategy change?
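A computer could run both experiments with a Python sketch like this one. It is not code from the original activity. It uses tabular Q-learning with epsilon-greedy exploration on the same 5×5 grid with four moves. The learning rate, discount, exploration rate, number of episodes, and function names are values and names we picked for illustration. The sketch learns once with −1 per step and once with −2 per step, then counts how many steps the learned route takes.

```python
import random

SIZE = 5
START, GOAL = (4, 0), (0, 4)                    # opposite corners (our choice)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(state, action, step_cost):
    """Move one square (the edges act as walls) and return (next_state, reward)."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    reward = step_cost + (10 if nxt == GOAL else 0)  # every step costs; the goal adds +10
    return nxt, reward

def train(step_cost, episodes=1000, alpha=0.5, gamma=0.9, explore=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    # Q[square][i] = current guess of how good action i is from that square.
    Q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = START
        while state != GOAL:
            if rng.random() < explore:
                a = rng.randrange(4)                          # explore: random action
            else:
                a = max(range(4), key=lambda i: Q[state][i])  # exploit: best guess so far
            nxt, reward = step(state, ACTIONS[a], step_cost)
            future = 0.0 if nxt == GOAL else max(Q[nxt])     # nothing comes after the goal
            Q[state][a] += alpha * (reward + gamma * future - Q[state][a])
            state = nxt
    return Q

def greedy_route_length(Q, step_cost, limit=50):
    """Follow the best-looking action from START and count the steps to GOAL."""
    state, steps = START, 0
    while state != GOAL and steps < limit:
        a = max(range(4), key=lambda i: Q[state][i])
        state, _ = step(state, ACTIONS[a], step_cost)
        steps += 1
    return steps

for cost in (-1, -2):
    Q = train(cost)
    print(f"step cost {cost}: learned route takes {greedy_route_length(Q, cost)} steps")
```

Both runs should settle on an 8-step route. A bigger step penalty makes every score lower (the best round drops from +2 to −6 under the paper rules), but the shortest path is still the best choice. Wandering just costs twice as much.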
A tiny reinforcement learner on a 4-square track
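```python
import random

# A 4-square track: states 0 (start), 1, 2, and 3 (goal).
# Actions: 'right' moves +1, 'left' moves -1 (the ends of the track act as walls).
ACTIONS = ['right', 'left']
ALPHA = 0.1  # learning rate: how far each new experience nudges a score
GAMMA = 0.9  # discount: how much the robot cares about points it will get later

# Q[state][action] = the robot's current guess of how good that move is there.
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(3)}

def step(state, action):
    nxt = state + (1 if action == 'right' else -1)
    nxt = max(0, min(3, nxt))  # can't move past either end
    reward = 10 if nxt == 3 else -1
    return nxt, reward

for episode in range(200):
    state = 0
    while state != 3:
        action = random.choice(ACTIONS)  # explore by moving at random
        nxt, reward = step(state, action)
        # Score = reward now + discounted best score from the next square (0 at the goal).
        future = 0 if nxt == 3 else max(Q[nxt].values())
        Q[state][action] += ALPHA * (reward + GAMMA * future - Q[state][action])
        state = nxt

print(Q)  # Notice 'right' ends up with the higher score at every state
```
Toolkit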
- Scratch
- Paper
Play
Test it. See what it does.
Be the robot. Sit in a chair without peeking while someone hides a small object in the room. Then search for it. After each move, the only clue you get is whether you are "warmer" (closer) or "colder" (farther away). Count how many moves it takes you to find it.
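Here is a rough Python simulation of the game. Everything in it is a hypothetical choice for illustration: the 10×10 room, the hidden square, the starting square, the fixed random seed, and the `search` function. The searcher tries a random step. "Warmer" means the step brought it closer to the object, so it stays on the new square. "Colder" means it steps back, and that step back counts as another move.

```python
import random

SIZE = 10  # the room is a 10x10 grid of squares (our assumption)
STEPS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def distance(a, b):
    """Number of squares between a and b, moving only up/down/left/right."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def search(hidden, start, seed=0):
    """Search with only warmer/colder clues and return the number of moves made."""
    rng = random.Random(seed)
    pos, moves = start, 0
    while pos != hidden:
        d_row, d_col = rng.choice(STEPS)
        new = (pos[0] + d_row, pos[1] + d_col)
        if not (0 <= new[0] < SIZE and 0 <= new[1] < SIZE):
            continue  # that would walk into a wall, so pick again (no move made)
        moves += 1
        if distance(new, hidden) < distance(pos, hidden):
            pos = new    # "warmer": keep going from the new square
        else:
            moves += 1   # "colder": step back to where you were
    return moves

print(search(hidden=(7, 2), start=(0, 9)))
```

On this grid every step changes the distance by exactly one square. The 14 warmer moves are therefore always needed, and each colder guess adds two moves (there and back).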
Challenge
Push it a little further.
Change the scoring so the robot is rewarded for being fast, not just for finishing. Watch how its strategy changes.
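One way to see why the scoring matters is to score the same two routes under two rules. In this Python sketch, the rule names, the 8-step and 12-step routes, and the 20-point time bonus are made up for illustration. The `finish_only` rule gives +10 for reaching the goal no matter how long it took. The `fast` rule also adds a time bonus that shrinks by 1 point for every step.

```python
def finish_only(steps):
    """+10 for reaching the goal; the number of steps doesn't matter."""
    return 10

def fast(steps, bonus=20):
    """+10 for reaching the goal, plus a time bonus that loses 1 point per step."""
    return 10 + max(bonus - steps, 0)

routes = {"short route": 8, "wandering route": 12}
for rule in (finish_only, fast):
    scores = {name: rule(steps) for name, steps in routes.items()}
    print(rule.__name__, scores)
```

Under `finish_only` the two routes tie, so the robot has no reason to hurry. Under `fast` the shorter route earns more (22 vs 18), so a learner chasing points should drift toward it.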
Reflect
Notice what your robot taught you.
What did your robot learn that you didn't expect?
Also ask yourself
What surprised you?
Reward
Mission outro
“Next week your robot will learn to see.”
Skills advanced: robot learning, machine learning
Badge
AI Apprentice