Deep Learning (PyTorch)
CNNs, transformers, training: modern perception & policies.
Why it matters in robotics
Modern robotics perception (object detection, segmentation, depth) and learned policies (imitation/RL, visuomotor and VLA models) are built on CNNs and transformers trained in PyTorch, so interviewers expect you to reason about architectures and training, not just call APIs. Expect questions on the training loop (forward, loss, backprop, optimizer step), why training diverges or overfits, and how convolution and self-attention extract spatial and sequential structure. Being able to whiteboard a network and debug a loss curve signals you can ship real perception and control models.
At a glance
The core deep-learning training loop: data flows forward to a prediction, the loss measures error, backprop computes gradients, and the optimizer updates the weights. This cycle repeats for every batch.
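The loop described above can be sketched in PyTorch as follows. This is a minimal illustration, not a recommended training setup: the synthetic data (a noisy line, y = 3x + 1), the single linear layer, the batch size, and the learning rate are all assumptions chosen only to keep the example small and fast.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (assumption for illustration): y = 3x + 1 plus noise.
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)                      # stand-in for a real network
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(50):
    for xb, yb in loader:
        pred = model(xb)                     # forward pass: data -> prediction
        loss = loss_fn(pred, yb)             # loss measures the error
        optimizer.zero_grad()                # clear gradients from the previous step
        loss.backward()                      # backprop: autograd computes gradients
        optimizer.step()                     # optimizer updates the weights
    with torch.no_grad():                    # track full-dataset loss once per epoch
        losses.append(loss_fn(model(x), y).item())

print(f"first epoch loss {losses[0]:.4f}, last epoch loss {losses[-1]:.4f}")
```

Calling `optimizer.zero_grad()` before `loss.backward()` matters because PyTorch accumulates gradients across backward calls by default. Forgetting it is a common cause of unstable training.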
What to study
- CNN building blocks: convolution, pooling, receptive fields, and classic backbones (ResNet) for perception
- Transformers and self-attention: Q/K/V, multi-head attention, positional encodings; intuition for ViT and policy transformers
- The training loop and optimization: autograd, loss functions, SGD/Adam, learning-rate schedules, batch norm
- Generalization and debugging: overfitting vs underfitting, regularization, data augmentation, and transfer learning / fine-tuning
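To make the Q/K/V item above concrete, here is a minimal sketch of single-head scaled dot-product self-attention. The function name `self_attention`, the random projection matrices, and the tensor sizes are hypothetical choices for illustration. A real model would learn the projections (for example with `nn.Linear`) and add multiple heads, masking, and positional encodings.

```python
import math
import torch


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (batch, seq_len, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns the attended values and the attention weights.
    """
    q = x @ w_q                                   # queries
    k = x @ w_k                                   # keys
    v = x @ w_v                                   # values
    # Similarity of every query with every key, scaled by sqrt(d_head).
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)       # each row sums to 1 over the keys
    return weights @ v, weights                   # weighted sum of values


torch.manual_seed(0)
tokens = torch.randn(2, 5, 8)                     # batch of 2, 5 tokens, d_model = 8
w_q, w_k, w_v = (torch.randn(8, 4) for _ in range(3))  # random, untrained projections
out, attn = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Every token attends to every other token, so `attn` has shape (batch, seq_len, seq_len). That quadratic cost in sequence length is a common interview follow-up.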
Study by time budget
Pick the path that fits the time you have before your interview.