Reinforcement Learning
Learning control policies from reward.
Why it matters in robotics
RL is the go-to framework when a robot must learn control policies that are hard to hand-engineer (locomotion, dexterous manipulation, and whole-body control), so interviewers probe whether you understand the MDP formulation, the exploration/exploitation tradeoff, and why sample efficiency and reward design matter. Expect questions contrasting model-free policy-gradient and actor-critic methods (PPO, SAC) with value-based ones, and on the sim-to-real gap that dominates real-robot deployment. Being able to reason about reward shaping, on- vs off-policy data, and domain randomization signals you can actually ship learned controllers, not just train them in simulation.
At a glance
The RL interaction loop: the agent takes an action, the environment returns the next state and a reward, and the agent updates its policy to maximize cumulative reward.
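The loop described above can be sketched in a few lines of Python. Everything here is a hypothetical illustration rather than any particular library's API: the `Corridor` environment, the `RandomAgent` class, and the `reset`/`step`/`act`/`update` method names are assumptions chosen for the example. The agent's `update` is a placeholder that only counts transitions; a learning agent would adjust its policy at that point.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start each episode in the middle cell.
        self.state = self.length // 2
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped at the walls).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Placeholder agent: acts uniformly at random and does not learn."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.transitions_seen = 0

    def act(self, state):
        return self.rng.choice([0, 1])

    def update(self, state, action, reward, next_state, done):
        # A learning agent would improve its policy here.
        self.transitions_seen += 1


def run_episode(env, agent, gamma=0.99, max_steps=100):
    """Run one episode and return (discounted return, number of steps)."""
    state = env.reset()
    discounted_return, discount, steps = 0.0, 1.0, 0
    while steps < max_steps:
        action = agent.act(state)                       # agent takes an action
        next_state, reward, done = env.step(action)     # environment responds
        agent.update(state, action, reward, next_state, done)
        discounted_return += discount * reward          # accumulate gamma^t * r_t
        discount *= gamma
        state = next_state
        steps += 1
        if done:
            break
    return discounted_return, steps
```

Calling `run_episode(Corridor(), RandomAgent())` runs a single episode. The quantity it accumulates is the discounted return that the agent is trying to maximize.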
What to study
- MDP formulation: states, actions, rewards, returns, discounting, and the Bellman equations / value functions
- Core algorithm families: value-based (Q-learning, DQN) vs policy-gradient / actor-critic (REINFORCE, PPO, SAC) and their tradeoffs
- Exploration vs exploitation, on-policy vs off-policy learning, and sample efficiency
- Robotics specifics: reward shaping, continuous control, and the sim-to-real gap (domain randomization, system identification)
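Several of these ideas fit into one small example. The sketch below is tabular Q-learning on a hypothetical five-cell corridor. The environment, the hyperparameter values, and the function names are assumptions made for illustration. It shows a Bellman backup as the update target, epsilon-greedy action selection as a simple answer to exploration vs exploitation, and off-policy learning: the target uses the greedy `max` over next actions even though the behavior policy sometimes acts randomly.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: cells 0..4, reward on reaching cell 4
ACTIONS = (0, 1)        # 0 = left, 1 = right


def step(state, action):
    """Deterministic transition; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        state = rng.randrange(GOAL)             # random non-goal start cell
        for _ in range(50):                     # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)    # explore
            else:
                best = max(q[state])            # exploit, breaking ties randomly
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bellman target: r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

After training, `greedy_policy` holds the greedy action for each non-goal cell. Tabular methods like this do not scale to the continuous state and action spaces typical of robots, which is one reason function-approximation methods such as DQN, PPO, and SAC appear in the list above.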
Study by time budget
Pick the path that fits the time you have before your interview.