Imitation Learning

Learning from demonstrations: BC, DAgger, diffusion policies.

hard · Learning

Why it matters in robotics

Imitation learning is the dominant paradigm behind today's real-robot manipulation and vision-language-action (VLA) policies, so it shows up constantly in robot-learning and applied-ML interviews. Interviewers probe whether you understand why naive behavior cloning fails (covariate shift and compounding error) and can explain the standard fixes: DAgger and interactive imitation, action chunking with temporal ensembling, and diffusion/flow policies for multimodal action distributions. Expect questions connecting these ideas to systems you should recognize by name (ACT/ALOHA, Diffusion Policy, OpenVLA, pi-0) and to practical concerns like demonstration collection via teleoperation. A common trap is the mode-averaging failure of MSE regression on multimodal data, which tests whether you can reason about the policy's representational capacity rather than just its loss. Strong candidates can sketch the theory (horizon-dependent error growth) and the engineering tradeoffs in the same answer.
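The mode-averaging trap can be seen in a few lines. The sketch below is a toy illustration, not taken from any of the systems named above: it assumes a single state at which demonstrators go either left (-1) or right (+1), and the names `actions`, `mse_action`, and `gap` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: at one and the same state, roughly half of the
# demonstrations steer left (-1.0) and half steer right (+1.0).
actions = rng.choice([-1.0, 1.0], size=1000)

def mse(pred):
    """Mean squared error of a constant prediction against the demos."""
    return float(np.mean((actions - pred) ** 2))

# For a fixed state, the prediction that minimizes MSE is the sample mean.
mse_action = float(actions.mean())

# Distance from the regressed action to the nearest demonstrated mode.
gap = min(abs(mse_action - 1.0), abs(mse_action + 1.0))

print(f"MSE-optimal action: {mse_action:+.3f}")
print(f"MSE at the mean: {mse(mse_action):.3f}, MSE at a mode: {mse(1.0):.3f}")
print(f"Gap to nearest demonstrated action: {gap:.3f}")
```

The regressed action sits near zero, which no demonstrator ever chose, yet it scores a lower MSE than either real mode. This is why the fix is usually framed as changing what the policy can represent (for example a diffusion or flow policy that models the action distribution) rather than tuning the regression loss.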

At a glance

[Diagram: expert demonstrations → (train) behavior cloning (supervised policy) → (deploy) execution drifts to unseen states (covariate shift, compounding error) → (collect labels) DAgger: query expert on learner's visited states, aggregate data, retrain → (retrain on aggregated data) back to behavior cloning]

How behavior cloning fails via covariate shift and how DAgger fixes it by relabeling the learner's own states.
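The loop in the diagram can be sketched as follows. This is a minimal toy, assuming a 2D noisy integrator, a linear expert, and a linear learner fit by least squares; `EXPERT_GAIN`, `rollout`, `fit_linear_policy`, and `dagger` are hypothetical names. Because the learner can represent the expert exactly here, the sketch illustrates the data flow (roll out the current policy, relabel its states with the expert, aggregate, retrain) rather than a performance gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a linear controller a = K x that pulls the state to 0.
EXPERT_GAIN = np.array([[-0.5, 0.0], [0.0, -0.5]])

def expert_action(state):
    return EXPERT_GAIN @ state

def rollout(policy, horizon=20):
    """Run `policy` in a toy noisy integrator and return the visited states."""
    state = rng.normal(size=2)
    states = []
    for _ in range(horizon):
        states.append(state)
        state = state + policy(state) + 0.05 * rng.normal(size=2)
    return np.array(states)

def fit_linear_policy(states, actions):
    """Behavior cloning step: least-squares fit of actions = states @ W."""
    W, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return W

def dagger(n_iters=5, rollouts_per_iter=4):
    data_states, data_actions = [], []
    policy = expert_action  # iteration 0 collects plain expert demonstrations
    for _ in range(n_iters):
        for _ in range(rollouts_per_iter):
            visited = rollout(policy)  # states the current policy reaches
            labels = np.array([expert_action(s) for s in visited])  # query expert
            data_states.append(visited)
            data_actions.append(labels)
        # Aggregate all data gathered so far and retrain from scratch.
        S = np.concatenate(data_states)
        A = np.concatenate(data_actions)
        W = fit_linear_policy(S, A)
        policy = lambda s, W=W: s @ W  # later iterations roll out the learner
    return W, len(S)

W, n_samples = dagger()
print("learned weights:\n", W)
print("aggregated samples:", n_samples)
```

The key design choice is that labels always come from the expert while states come from whichever policy is currently being executed, so the training distribution tracks the learner's own mistakes. In the classical analysis this is what changes the worst-case cost gap from growing roughly quadratically in the horizon for plain behavior cloning to roughly linearly for DAgger, under a recoverability assumption.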

What to study

  • ✓ Behavior cloning as supervised learning, and why covariate shift causes compounding errors that grow with the task horizon.
  • ✓ DAgger and interactive imitation: aggregating expert labels on the learner's own state distribution to break the i.i.d. assumption.
  • ✓ Action chunking and temporal ensembling (ACT) to reduce compounding error, and diffusion/flow policies for multimodal action distributions.
  • ✓ How imitation learning underpins modern VLA policies (OpenVLA, pi-0) and the practicalities of demonstration collection via teleoperation.
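Temporal ensembling from the third item can be sketched as below. It assumes the policy emits a chunk of k future actions at every timestep and blends all predictions that target the same timestep with exponential weights exp(-m * i), where i = 0 is the oldest prediction, which follows the weighting described for ACT as far as I understand it. The function name `temporal_ensemble`, the `chunks` array layout, and the toy numbers are hypothetical.

```python
import numpy as np

def temporal_ensemble(chunks, t, m=0.1):
    """Blend every prediction that was made for timestep t.

    chunks has shape (T, k, action_dim); chunks[s] is the chunk predicted at
    timestep s, so chunks[s, t - s] is its prediction for timestep t and is
    valid when 0 <= t - s < k.
    """
    k = chunks.shape[1]
    starts = np.arange(max(0, t - k + 1), t + 1)   # oldest chunk first
    preds = np.array([chunks[s, t - s] for s in starts])
    weights = np.exp(-m * np.arange(len(starts)))  # index 0 = oldest prediction
    weights /= weights.sum()
    return weights @ preds

# Toy check with chunk size k = 2 and a 1D action.
chunks = np.zeros((3, 2, 1))
chunks[0, 1] = 1.0  # prediction for t = 1 made at t = 0 (older)
chunks[1, 0] = 3.0  # prediction for t = 1 made at t = 1 (newer)

blended = temporal_ensemble(chunks, t=1, m=np.log(2))
print(blended)  # weights 2/3 and 1/3 on the values 1.0 and 3.0
```

Executing a blended action instead of the latest raw prediction smooths the transitions between overlapping chunks, and the parameter `m` controls how strongly older predictions are favored over newer ones.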

Study by time budget

Pick the path that fits the time you have before your interview.

  1. pi-0: Our First Generalist Policy (Physical Intelligence blog). Article, Physical Intelligence · ~25 min
  2. CS285 Deep RL (Fall 2023): Imitation Learning / Supervised Learning of Behaviors lectures. Video, Sergey Levine (UC Berkeley) · ~1 hr
