This is the first homework of CS285. It covers imitation learning: training a policy with both an MSE regression objective and Flow Matching to solve the push-T task of pushing a T-shaped object into a target region.
Preface
Reinforcement learning is the foundation of embodied intelligence and intelligent decision-making systems. CS285 (Deep Reinforcement Learning) is UC Berkeley's graduate course on deep RL, taught by Sergey Levine, and is a natural follow-up to an introductory RL course such as Stanford's CS234. It covers the major RL algorithms, including Q-learning, Policy Gradient, Actor-Critic, DQN, DDPG, and PPO, combining theoretical derivations, algorithm implementations, and experimental analysis.
Homework 1 focuses on imitation learning: solving the push-T task with two approaches, MSE regression and Flow Matching.
The repository is at: https://github.com/berkeleydeeprlcourse/homework_spring2026/tree/main/hw1
Wandb Setup
Install wandb in your conda environment:
```shell
pip install wandb
pip install wandb[video]
```

RL Basics
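Once installed, logging from a training script takes only a few calls. A minimal sketch — the project name and metric names are placeholders, and `mode="offline"` logs locally without requiring a wandb login:

```python
import wandb

# Hypothetical project/config; "offline" mode writes logs locally, no account needed.
run = wandb.init(project="cs285-hw1", mode="offline", config={"lr": 1e-3, "chunk_size": 8})

for step in range(3):
    wandb.log({"train/loss": 1.0 / (step + 1)})  # placeholder metric

run.finish()
```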
Some basic concepts. We first define the Markov Decision Process and the related notions:

- Actions
- Markov Decision Process
- Goal
- Chain Rule of Probability
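The chain rule of probability is what lets us factorize the trajectory distribution induced by a policy. In standard notation (common CS285-style symbols, not copied verbatim from the assignment):

```latex
p_\theta(\tau) = p(s_1) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)
```

and the goal of RL is to maximize the expected return under this distribution:

```latex
\theta^\star = \arg\max_\theta \; \mathbb{E}_{\tau \sim p_\theta(\tau)} \left[ \sum_{t=1}^{T} r(s_t, a_t) \right]
```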
MSE Basic Implementation
First fill in model.py. Pay attention to the action/state dimensions and to the chunk_size setting: chunk_size is the length of an action chunk, i.e., how many actions the policy predicts per forward pass.
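Before filling in the constructor, it helps to fix the interface: the policy maps a batch of states to a `(batch, chunk_size, action_dim)` tensor of actions and is trained with plain MSE against expert chunks. A self-contained sketch — the class name, base class, and method layout are assumptions, not the repo's exact code:

```python
import torch
import torch.nn as nn

class MSEPolicySketch(nn.Module):
    """Hypothetical chunked MSE policy; the homework's base class differs."""

    def __init__(self, state_dim: int, action_dim: int, chunk_size: int,
                 hidden_dims: tuple[int, ...] = (128, 128)) -> None:
        super().__init__()
        self.action_dim = action_dim
        self.chunk_size = chunk_size
        layers, prev_dim = [], state_dim
        for hidden_dim in hidden_dims:
            layers += [nn.Linear(prev_dim, hidden_dim), nn.ReLU()]
            prev_dim = hidden_dim
        layers.append(nn.Linear(prev_dim, chunk_size * action_dim))
        self.mlp = nn.Sequential(*layers)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # (B, state_dim) -> (B, chunk_size * action_dim) -> (B, chunk_size, action_dim)
        return self.mlp(state).view(-1, self.chunk_size, self.action_dim)

policy = MSEPolicySketch(state_dim=5, action_dim=2, chunk_size=8)
pred = policy(torch.randn(4, 5))
loss = nn.functional.mse_loss(pred, torch.randn(4, 8, 2))  # training objective
```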
```python
def __init__(
    self,
    state_dim: int,
    action_dim: int,
    chunk_size: int,
    hidden_dims: tuple[int, ...] = (128, 128),
) -> None:
    super().__init__(state_dim, action_dim, chunk_size)
    output_dim = chunk_size * action_dim
    layers = []
    prev_dim = state_dim
    for hidden_dim in hidden_dims:
        layers.append(nn.Linear(prev_dim, hidden_dim))
        layers.append(nn.ReLU())
        prev_dim = hidden_dim
    layers.append(nn.Linear(prev_dim, output_dim))
    self.mlp = nn.Sequential(*layers)
```

Flow Implementation
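The Flow Matching policy itself is left to the homework; as a hedged sketch of the core training objective — conditional flow matching along a straight noise-to-data path, with function and argument names that are hypothetical rather than taken from the repo:

```python
import torch

def flow_matching_loss(model, state, actions):
    """Conditional flow-matching loss sketch (linear noise-to-data path).

    `model(state, x_t, t)` is assumed to predict the velocity field;
    `actions` has shape (B, chunk_size, action_dim).
    """
    x0 = torch.randn_like(actions)             # noise sample at t = 0
    t = torch.rand(actions.shape[0], 1, 1)     # random time in [0, 1], broadcast over chunk
    x_t = (1.0 - t) * x0 + t * actions         # point on the straight path
    v_target = actions - x0                    # constant velocity along that path
    v_pred = model(state, x_t, t)
    return ((v_pred - v_target) ** 2).mean()
```

At inference time the learned velocity field is integrated from a noise sample toward an action chunk, e.g. with a few Euler steps.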
Thanks for reading!
