World Models & Sim-to-Real

Why it matters in robotics

World models and sim-to-real sit at the heart of modern robot learning, so interviews probe whether you understand how to learn behaviors data-efficiently and then deploy them on hardware. Expect to explain why model-based RL (learning a latent dynamics model and planning or training a policy "in imagination," as in Dreamer/DreamerV3) buys sample efficiency over model-free RL, and what its failure modes are (compounding model error, policy exploitation of model inaccuracies). The sim-to-real "reality gap" is a perennial topic: you should be able to enumerate its sources (unmodeled dynamics, contact/friction, sensor noise, latency, visual appearance) and the toolkit to close it, especially domain randomization (Tobin et al.), domain adaptation, system identification, and differentiable simulation. Strong candidates can compare these strategies, reason about the robustness versus optimality trade-off of randomization, and connect them to concrete robot tasks like manipulation or locomotion. Practical questions also include how you would validate a policy before deployment and how you would diagnose a sim-trained policy that fails on the real robot.
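
The compounding-error failure mode mentioned above is easy to see in a toy setting. The sketch below is only an illustration under assumed numbers: a scalar linear system stands in for the real robot, and `A_TRUE`, `A_MODEL`, and `rollout` are hypothetical names, not part of any library. It rolls the "learned" model forward open-loop and compares it to the true system at each step.

```python
def rollout(a, s0, horizon):
    """Roll out the scalar system s_{t+1} = a * s_t for `horizon` steps."""
    states = [s0]
    for _ in range(horizon):
        states.append(a * states[-1])
    return states

# Hypothetical numbers: the "real" system and a learned model that is off by 2%.
A_TRUE = 1.00
A_MODEL = 1.02

real = rollout(A_TRUE, s0=1.0, horizon=30)
imagined = rollout(A_MODEL, s0=1.0, horizon=30)

# Absolute prediction error at each step of the open-loop (imagined) rollout.
errors = [abs(r - m) for r, m in zip(real, imagined)]

for t in (1, 5, 15, 30):
    print(f"step {t:2d}: error = {errors[t]:.3f}")
```

A small one-step error grows with the rollout horizon because each prediction is fed back in as the next input. This is one reason imagined rollouts are usually kept short and restarted from real states.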

At a glance

Dreamer-style model-based RL loop: a latent world model is learned from real interaction, the policy is trained on imagined latent rollouts, and the improved policy collects new real data.
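
The loop in the caption can be sketched in a few lines, with heavy simplifications that should be read as assumptions. The state is a directly observed scalar rather than a learned latent, the "world model" is a least-squares fit of two parameters rather than a neural network, and the policy is a single gain chosen by grid search rather than an actor-critic. All names (`real_step`, `collect`, `fit_model`, `imagined_cost`, `improve_policy`) and constants are hypothetical. Only the three-stage structure (collect real data, fit the model, improve the policy on imagined rollouts) mirrors the Dreamer-style loop.

```python
import random

A_TRUE, B_TRUE = 0.9, 0.5  # hypothetical "real robot" dynamics, unknown to the agent

def real_step(s, u, rng):
    """One step of the real environment: s' = a*s + b*u + small noise."""
    return A_TRUE * s + B_TRUE * u + rng.gauss(0.0, 0.01)

def collect(k, rng, episodes=5, steps=20):
    """Run the policy u = -k*s (plus exploration noise) on the real system."""
    data = []
    for _ in range(episodes):
        s = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            u = -k * s + rng.gauss(0.0, 0.3)
            s_next = real_step(s, u, rng)
            data.append((s, u, s_next))
            s = s_next
    return data

def fit_model(data):
    """Least-squares fit of s' ~ a*s + b*u via the 2x2 normal equations."""
    ss = sum(s * s for s, u, n in data)
    su = sum(s * u for s, u, n in data)
    uu = sum(u * u for s, u, n in data)
    sn = sum(s * n for s, u, n in data)
    un = sum(u * n for s, u, n in data)
    det = ss * uu - su * su
    return (sn * uu - un * su) / det, (un * ss - sn * su) / det

def imagined_cost(k, model, horizon=15):
    """Cost of u = -k*s on rollouts generated by the learned model only."""
    a, b = model
    total = 0.0
    for s0 in (-1.0, -0.5, 0.5, 1.0):
        s = s0
        for _ in range(horizon):
            u = -k * s
            total += s * s + 0.1 * u * u
            s = a * s + b * u
    return total

def improve_policy(model):
    """Pick the gain with the lowest imagined cost (stand-in for actor-critic)."""
    grid = [i * 0.05 for i in range(41)]  # candidate gains 0.0 .. 2.0
    return min(grid, key=lambda k: imagined_cost(k, model))

rng = random.Random(0)
k, data = 0.0, []
for iteration in range(3):
    data += collect(k, rng)      # 1. real interaction with the current policy
    model = fit_model(data)      # 2. update the world model on all real data
    k = improve_policy(model)    # 3. train the policy in imagination
    print(f"iter {iteration}: a_hat={model[0]:.3f} b_hat={model[1]:.3f} k={k:.2f}")
```

The policy never touches the real system during step 3, which is where the sample efficiency comes from. The exploration noise in `collect` is there so that the state and action are not perfectly correlated, which would make the model fit ill-posed.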

What to study

✓ Model-based RL and world models: latent dynamics models, learning and planning "in imagination," and actor-critic on imagined rollouts (Dreamer/DreamerV3). Know why this is more sample-efficient than model-free RL and what its failure modes are (compounding error, model exploitation).
✓ The sim-to-real reality gap: its sources (unmodeled contact/friction, mass/inertia errors, sensor noise, actuation latency, visual appearance) and how each degrades a transferred policy.
✓ Domain randomization for visuals (Tobin et al.) and for dynamics, including the robustness-vs-optimality trade-off, and how guided or automatic randomization (e.g. SimOpt, Automatic Domain Randomization) improves on naive uniform randomization.
✓ Complementary transfer techniques: domain adaptation, system identification (calibrating the simulator to the real robot), and differentiable simulation for gradient-based sim-to-real and parameter estimation.
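
The robustness-vs-optimality trade-off of domain randomization can be made concrete with a toy example. The sketch below makes several assumptions: a scalar linear system stands in for the robot, the only randomized parameter is a hypothetical actuator gain `b` drawn uniformly from `B_RANGE`, and the "policy" is a single feedback gain chosen by grid search. None of these names or numbers come from a real simulator. One gain is tuned on the nominal simulator only, the other on the randomized set, and both are then evaluated on two "real" actuator gains.

```python
import random

A = 0.9               # hypothetical open-loop dynamics, shared by sim and real
B_NOMINAL = 0.5       # actuator gain assumed by the un-randomized simulator
B_RANGE = (0.3, 1.2)  # hypothetical randomization range for the actuator gain

def episode_cost(k, b, horizon=30, s0=1.0):
    """Cost of running u = -k*s on s' = A*s + b*u for `horizon` steps."""
    s, total = s0, 0.0
    for _ in range(horizon):
        u = -k * s
        total += s * s + 0.01 * u * u
        s = A * s + b * u
    return total

def best_gain(b_samples):
    """Grid-search the gain that minimizes total cost over the sampled sims."""
    grid = [i * 0.05 for i in range(51)]  # candidate gains 0.0 .. 2.5
    return min(grid, key=lambda k: sum(episode_cost(k, b) for b in b_samples))

rng = random.Random(0)
k_nominal = best_gain([B_NOMINAL])
k_random = best_gain([rng.uniform(*B_RANGE) for _ in range(200)])

print(f"gain tuned on nominal sim:    {k_nominal:.2f}")
print(f"gain tuned on randomized sim: {k_random:.2f}")
for b_real in (0.5, 1.2):
    print(f"real b={b_real}: nominal-tuned cost={episode_cost(k_nominal, b_real):.2f}, "
          f"randomized cost={episode_cost(k_random, b_real):.2f}")
```

Under these numbers the nominal-tuned gain is more aggressive: it does at least as well when the real actuator matches the simulator, but it becomes unstable at the upper end of the range. The randomized gain gives up some nominal performance in exchange for staying stable across the whole range.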

Where to practice coding

⌨ DreamerV3 reference implementation (danijar/dreamerv3)

Prerequisites

Reinforcement Learning