Why latent imagination is sample-efficient

medium · subjective · system design

General

A robotics team is training a vision-based manipulation policy and is comparing a model-free RL agent (e.g. an actor-critic trained directly on real transitions) against a Dreamer-style model-based agent that learns a latent world model and trains its actor-critic on imagined rollouts. Explain why the world-model approach is typically far more sample-efficient on the *real robot*, state what the world model actually predicts, and describe two failure modes of training a policy purely "in imagination." How would you mitigate each of them?
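For orientation, here is a minimal sketch of what "training on imagined rollouts" means mechanically. It is not a Dreamer implementation: `ToyWorldModel`, `imagine_rollout`, `discounted_return`, the `decay` parameter, and the proportional controller are all hypothetical stand-ins, and a real agent would learn the dynamics and reward functions from robot data with neural networks. The point is only that, once a starting latent state is available, every subsequent step is a model prediction and no robot interaction takes place.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Latent = List[float]
Step = Tuple[Latent, float, float]  # (next latent, action, predicted reward)


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a learned latent world model."""
    decay: float = 0.9  # assumed toy dynamics parameter, not from the source

    def step(self, z: Latent, action: float) -> Latent:
        # Predict the next latent state. No image is decoded, no robot moves.
        return [self.decay * zi + action for zi in z]

    def reward(self, z: Latent) -> float:
        # Predict reward from the latent alone: higher when the latent is near zero.
        return -sum(zi * zi for zi in z)


def imagine_rollout(
    model: ToyWorldModel,
    policy: Callable[[Latent], float],
    z0: Latent,
    horizon: int,
) -> List[Step]:
    """Roll the policy forward inside the model for `horizon` steps."""
    z = list(z0)
    trajectory: List[Step] = []
    for _ in range(horizon):
        action = policy(z)
        z = model.step(z, action)
        trajectory.append((z, action, model.reward(z)))
    return trajectory


def discounted_return(trajectory: List[Step], gamma: float = 0.99) -> float:
    """Sum of predicted rewards, discounted by gamma per imagined step."""
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(trajectory))


if __name__ == "__main__":
    model = ToyWorldModel()
    # Hypothetical proportional controller acting on the mean latent value.
    policy = lambda z: -0.5 * sum(z) / len(z)
    # One encoded real observation seeds a whole imagined trajectory.
    traj = imagine_rollout(model, policy, z0=[1.0, -2.0], horizon=15)
    print(len(traj), discounted_return(traj))
```

In this toy setup, an actor-critic would be updated from trajectories like `traj`, so the number of policy-training steps is decoupled from the number of real-robot steps used to fit the model.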
