Math Principles of Reinforcement Learning Notes—Windylab

Wed Apr 01 2026

RL Lecture RL Lecture EAI

Lecture 2: Bellman Equation

Return: Evaluate policies

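The return is the sum of the rewards obtained along a trajectory, with future rewards discounted by \gamma. Comparing the returns that different policies produce is a way to evaluate those policies.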

\text{return}_1 = \frac{\gamma}{1-\gamma} \\ \text{return}_2 \\ \text{return}_3 = -0.5 + \frac{\gamma}{1-\gamma}
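As a sanity check on these closed forms, the sketch below sums discounted rewards numerically. The reward sequences are assumptions chosen to reproduce the expressions above (a first reward of 0 or -0.5, followed by a reward of 1 at every later step). The notes do not state the underlying trajectories, and `discounted_return`, `gamma = 0.9`, and `horizon` are illustrative names and values.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


gamma = 0.9      # assumed discount rate
horizon = 1000   # truncation length; gamma**horizon is negligible here

# Assumed trajectory: reward 0 first, then reward 1 at every later step.
rewards_1 = [0.0] + [1.0] * horizon
approx_1 = discounted_return(rewards_1, gamma)
closed_form_1 = gamma / (1 - gamma)

# Assumed trajectory: reward -0.5 first, then reward 1 at every later step.
rewards_3 = [-0.5] + [1.0] * horizon
approx_3 = discounted_return(rewards_3, gamma)
closed_form_3 = -0.5 + gamma / (1 - gamma)

print(approx_1, closed_form_1)
print(approx_3, closed_form_3)
```

The backward accumulation inside `discounted_return` already has the bootstrapping shape: each partial return is the immediate reward plus \gamma times the return that follows it.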

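How to calculate a return?

By definition: sum the discounted rewards along the trajectory directly.
By bootstrapping: express the return from each state in terms of the returns from the states that follow it.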

v=r+\gamma P v
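Rearranging this equation gives (I - \gamma P) v = r, so v = (I - \gamma P)^{-1} r whenever 0 \le \gamma < 1. The same v can also be reached by applying v \leftarrow r + \gamma P v repeatedly. Below is a minimal sketch of that iteration in plain Python. The three-state chain, its reward vector `r`, its transition matrix `P`, and the function name `bellman_iterate` are hypothetical, chosen only so the result can be checked by hand.

```python
def bellman_iterate(r, P, gamma, tol=1e-10, max_iters=10_000):
    """Iterate v <- r + gamma * P v until successive iterates differ by < tol."""
    n = len(r)
    v = [0.0] * n  # arbitrary starting guess
    for _ in range(max_iters):
        v_next = [
            r[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    return v


# Hypothetical chain: s1 -> s2 -> s3, and s3 loops back to itself.
r = [0.0, 1.0, 1.0]      # expected immediate reward in each state
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]                        # P[i][j] = probability of moving from state i to state j
gamma = 0.9

v = bellman_iterate(r, P, gamma)
print(v)
```

Solving by hand for this chain: v_3 = 1 + 0.9 v_3 gives v_3 = 10, then v_2 = 1 + 0.9 \cdot 10 = 10 and v_1 = 0 + 0.9 \cdot 10 = 9, which equals \gamma / (1 - \gamma) for \gamma = 0.9.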

© Napucheng | CC BY-NC-SA 4.0