Math Principles of Reinforcement Learning Notes—Windylab

Wed Apr 01 2026

Lesson 2: Bellman Equation

Return: Evaluate policies

The return is the sum of the rewards obtained along a trajectory, with each later reward discounted by a further factor of γ.

$$return_1=\frac{\gamma}{1-\gamma} \\ return_2 \\ return_3=-0.5+\frac{\gamma}{1-\gamma}$$
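As a rough numerical check of the first expression, the sketch below sums discounted rewards along one trajectory. The reward sequence 0, 1, 1, 1, ... and the value γ = 0.9 are assumptions chosen for illustration (this is one sequence whose discounted sum equals γ/(1-γ)), and `discounted_return` is a hypothetical helper, not something defined in these notes.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


gamma = 0.9  # assumed discount rate

# Assumed trajectory: reward 0 at the first step, then reward 1 at every
# later step, truncated at 1000 steps to approximate the infinite sum.
rewards = [0.0] + [1.0] * 999

approx = discounted_return(rewards, gamma)
closed_form = gamma / (1 - gamma)
print(approx, closed_form)
```

With γ < 1 the truncated tail shrinks geometrically, so the finite sum and the closed form should agree to within floating-point error.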

How to calculate?

  • Definition
  • Bootstrapping
$$v=r+\gamma P v$$
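One way to read the bootstrapping equation is as an update rule: start from any guess for v and apply v ← r + γPv repeatedly until it stops changing. The sketch below does this in plain Python for a hypothetical two-state chain. The chain itself (state 0 moves to state 1 with reward 0, state 1 loops on itself with reward 1), the value γ = 0.9, and the function name `bellman_iterate` are all assumptions made for illustration.

```python
def bellman_iterate(r, P, gamma, tol=1e-12, max_iters=10_000):
    """Repeat v <- r + gamma * P v until the largest change is below tol."""
    n = len(r)
    v = [0.0] * n  # arbitrary initial guess
    for _ in range(max_iters):
        v_new = [
            r[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    return v


gamma = 0.9            # assumed discount rate
r = [0.0, 1.0]         # assumed immediate reward in each state
P = [[0.0, 1.0],       # state 0 always moves to state 1
     [0.0, 1.0]]       # state 1 always stays in state 1

v = bellman_iterate(r, P, gamma)
print(v)
```

For this assumed chain the fixed point can also be found by hand: v(1) = 1/(1-γ) and v(0) = γ/(1-γ), the same form as return_1 above. Because γ < 1, each update shrinks the error by a factor of γ, which is why the iteration settles regardless of the starting guess.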
