Reinforcement Learning
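- Online RL: agent takes actions in an environment and receives rewards and observations
- Offline RL: agent learns from a fixed dataset of previously collected experience (e.g. from other agents), without interacting with the environment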
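A minimal sketch of the difference, assuming a made-up 5-state chain environment (`ChainEnv`, `collect_online`, and `random_policy` are hypothetical names for illustration, not from any library). The loop is the online setting; the logged transitions it returns are all an offline learner would get to see.

```python
import random


class ChainEnv:
    """Hypothetical 1-D chain: states 0..4, reward 1.0 for reaching state 4."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clipped to the chain).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_policy(state, rng):
    """Behavior policy used for data collection: uniform over both actions."""
    return rng.choice([0, 1])


def collect_online(env, policy, episodes, max_steps=20, seed=0):
    """Online setting: the agent acts, then observes reward and next state."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = policy(state, rng)
            next_state, reward, done = env.step(action)
            dataset.append((state, action, reward, next_state, done))
            state = next_state
            if done:
                break
    return dataset


# Offline setting: a learner only ever sees this fixed log of transitions.
offline_dataset = collect_online(ChainEnv(), random_policy, episodes=10)
```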
Offline Reinforcement Learning
- Behavior Cloning (BC) -> learn to mimic the other agents
- Q-Learning -> learn a value function Q(s, a) that estimates the expected return of taking action a in state s, then act greedily with respect to it
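The two approaches can be contrasted on a tiny logged dataset. This is a tabular sketch under assumptions: the transitions, the function names, and the hyperparameters (`gamma`, `alpha`, `sweeps`) are invented for illustration. Behavior cloning copies the most frequent logged action per state, while Q-learning replays the same transitions through the usual temporal-difference update.

```python
from collections import Counter, defaultdict

# Hypothetical logged transitions: (state, action, reward, next_state, done).
# Chain with states 0..2; action 1 moves right, action 0 moves left or stays.
dataset = [
    (0, 0, 0.0, 0, False),
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (1, 1, 1.0, 2, True),
    (1, 0, 0.0, 0, False),
]


def behavior_cloning(data):
    """Tabular BC: per state, pick the action the logged agent chose most often."""
    counts = defaultdict(Counter)
    for state, action, _, _, _ in data:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def offline_q_learning(data, actions=(0, 1), gamma=0.9, alpha=0.5, sweeps=200):
    """Replay the fixed dataset through the Q-learning update."""
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(sweeps):
        for s, a, r, s2, done in data:
            best_next = 0.0 if done else max(q[(s2, b)] for b in actions)
            target = r + gamma * best_next
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


bc_policy = behavior_cloning(dataset)
q = offline_q_learning(dataset)
greedy_policy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in (0, 1)}
```

In this toy log the behavior agent mostly chooses action 0 in state 0, so `bc_policy` copies that choice, whereas `greedy_policy` prefers action 1 there because it leads toward the reward.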
Reinforcement Learning Upside Down: Don’t Predict Rewards - Just Map Them to Actions
- Decision Transformer (article, video)
  - casts RL as sequence modeling: a GPT-style transformer models trajectories of returns-to-go, states, and actions
  - conditioned on the desired return (the return-to-go)
  - outputs the next action
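A sketch of how a trajectory might be prepared for a return-conditioned sequence model, assuming undiscounted returns-to-go and one (return-to-go, state, action) triple per timestep. The function names, the string token tags, and the example trajectory are hypothetical; a real model would embed each element rather than tag it.

```python
def returns_to_go(rewards):
    """Return-to-go at step t is the sum of rewards from t to the episode end."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]


def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples, one per timestep."""
    tokens = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("rtg", g), ("state", s), ("action", a)])
    return tokens


# Hypothetical 3-step trajectory.
states = ["s0", "s1", "s2"]
actions = ["right", "right", "left"]
rewards = [0.0, 1.0, 2.0]

sequence = build_sequence(states, actions, rewards)
```

At inference time the first return-to-go would be set to the desired return, and the model asked to predict the action that follows each state.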
RL as one big sequence modeling problem (article)
Q-Transformer (article)
Control-Oriented Learning for Dynamical Systems (video)
Imitation Learning
- learning an observation-action mapping from human demonstrations
- 2 approaches
  - state-aware imitation learning
    - adds a secondary objective to the learning task to bias the policy towards states where more training data is available
  - meta-learning
    - pre-train policies so they can adapt to a new task from a single demonstration -> one-shot learning
- generative adversarial imitation learning
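For reference, generative adversarial imitation learning is commonly written as the saddle-point problem below (the form given by Ho and Ermon, 2016). The symbols are not from these notes and are assumptions of this sketch: \pi is the learner's policy, \pi_E the expert policy behind the demonstrations, D a discriminator over state-action pairs, H(\pi) the causal entropy of the policy, and \lambda \ge 0 its weight.

```latex
\min_{\pi} \max_{D} \;
  \mathbb{E}_{\pi}\big[\log D(s, a)\big]
  + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s, a)\big)\big]
  - \lambda H(\pi)
```

In this convention the discriminator tries to assign high values to the learner's state-action pairs and low values to the expert's, while the policy is trained to make the two indistinguishable.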