Why behavior cloning compounds errors and how DAgger helps

medium · subjective

General

A team trains a manipulation policy with plain behavior cloning on expert teleoperation data. Evaluated in simulation on held-out demonstration states, the policy has low validation error. When deployed on the robot, however, it frequently drifts into awkward configurations from which it cannot recover, and failures become more likely the longer an episode runs.

(a) Explain the mechanism behind this failure in terms of the i.i.d. assumption and covariate shift, and explain why the error tends to grow with the task horizon.

(b) Describe how DAgger addresses it, what new cost it introduces, and one practical limitation of DAgger for real-robot tasks.
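To make the mechanism concrete, here is a minimal toy sketch in Python. It is not a model of a manipulation policy, and every name and number in it is a hypothetical choice for illustration. The state is a 1-D "lane" position x. A hypothetical `expert` always steps back toward x = 0, and `step` applies random ±1 pushes with probability `p_push`. The learner is a lookup table built by `fit` that idles on states it has never seen, with a per-step imitation error `eps`. Behavior cloning trains only on states the expert visits, so one slip can carry the learner into states with no training data (covariate shift), where it has no reason to recover. The `dagger` function is a simplified variant: after the initial BC round it rolls out only the learner, without mixing in the expert, and asks the expert to label every visited state. The growing `len(data)` is the extra labeling cost.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (-1, 0, 1)  # step left, stay, step right (hypothetical action set)


def expert(x):
    """Hypothetical expert: always steps back toward the lane centre x = 0."""
    return -1 if x > 0 else (1 if x < 0 else 0)


def step(x, a, rng, p_push=0.2):
    """Hypothetical dynamics: apply the action, then a random +-1 push with probability p_push."""
    push = rng.choice((-1, 1)) if rng.random() < p_push else 0
    return x + a + push


def fit(dataset):
    """Tabular stand-in for a policy network: majority expert label per visited state."""
    votes = defaultdict(Counter)
    for x, a in dataset:
        votes[x][a] += 1
    return {x: c.most_common(1)[0][0] for x, c in votes.items()}


def make_learner(table, rng, eps=0.05):
    """Learner = lookup table plus a small per-step imitation error eps."""
    def policy(x):
        if rng.random() < eps:
            return rng.choice(ACTIONS)  # stands in for function-approximation error
        return table.get(x, 0)  # state never seen in training: no label, so it idles
    return policy


def rollout(policy, T, rng):
    """Run one episode of length T from x = 0 and return the visited states."""
    x, states = 0, []
    for _ in range(T):
        states.append(x)
        x = step(x, policy(x), rng)
    return states


def off_lane_cost(policy, T, rng, episodes=200):
    """Average number of steps per episode spent outside |x| <= 1."""
    total = 0
    for _ in range(episodes):
        total += sum(abs(x) > 1 for x in rollout(policy, T, rng))
    return total / episodes


def behavior_cloning(T, rng, demos=20):
    # Training states are drawn only from the expert's own state distribution.
    data = [(x, expert(x)) for _ in range(demos) for x in rollout(expert, T, rng)]
    return fit(data), data


def dagger(T, rng, demos=20, iters=5, episodes_per_iter=20):
    # Simplified DAgger: after the BC round, roll out only the learner (no expert mixing).
    table, data = behavior_cloning(T, rng, demos)
    for _ in range(iters):
        learner = make_learner(table, rng)
        for _ in range(episodes_per_iter):
            # Visit states with the learner, but record the expert's action there.
            data += [(x, expert(x)) for x in rollout(learner, T, rng)]
        table = fit(data)  # retrain on the aggregated dataset
    return table, data


if __name__ == "__main__":
    rng = random.Random(0)
    for T in (50, 100, 200):
        bc_table, bc_data = behavior_cloning(T, rng)
        dg_table, dg_data = dagger(T, rng)
        bc_cost = off_lane_cost(make_learner(bc_table, rng), T, rng)
        dg_cost = off_lane_cost(make_learner(dg_table, rng), T, rng)
        print(f"T={T:3d}  off-lane steps: BC={bc_cost:6.2f}  DAgger={dg_cost:6.2f}  "
              f"expert labels: BC={len(bc_data)}  DAgger={len(dg_data)}")
```

The expert itself never leaves |x| <= 1, so every off-lane step is an error that has compounded. BC's off-lane time should grow faster than linearly in T. A longer episode gives more chances for a first slip and also more time to stay stranded afterward. DAgger's off-lane time should stay low because the states the learner actually reaches now carry expert labels. With these settings, however, DAgger consumes six times as many expert labels as BC. On a real robot, each of those labels corresponds to a human annotating or correcting states from a live learner rollout.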
