Foundation Models
Large pretrained VLMs/LLMs as the robot's semantic brain.
Why it matters in robotics
Foundation models are reshaping robotics: vision-language-action (VLA) models like RT-2, OpenVLA, and pi-0 now turn internet-pretrained transformers into generalist robot policies, so interviewers increasingly probe whether candidates understand this shift. Expect questions on how transformers, pretraining, and scaling laws produce emergent abilities, and how contrastive vision-language pretraining (CLIP) gives robots open-vocabulary grounding. A very common theme is adaptation: when to fine-tune fully vs. prompt vs. use LoRA/adapters, and why robotics teams co-train on web plus robot data rather than robot data alone. Candidates are also expected to reason crisply about limitations -- hallucination, weak physical grounding, inference latency vs. control rate, and the chronic scarcity of robot data. Strong answers connect the ML mechanism to a concrete robotics deployment consequence.
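To make the full fine-tuning vs. LoRA trade-off concrete, here is a minimal sketch of the trainable-parameter arithmetic for a single dense layer. It assumes the standard low-rank formulation, where a frozen weight W of shape d_out x d_in receives an update B @ A with A of shape r x d_in and B of shape d_out x r. The layer size and rank are hypothetical values chosen for illustration and are not taken from any specific VLA.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when a dense d_out x d_in matrix is updated directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a rank-r update B @ A (A: r x d_in, B: d_out x r)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, for illustration only.
d_in, d_out, rank = 4096, 4096, 16

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"LoRA / full: {lora / full:.4%}")
```

The deployment consequence is that a small robot dataset only has to fit the low-rank update, which lowers memory use during adaptation and leaves the pretrained weights untouched.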
At a glance
Adapting an internet-pretrained vision-language model into a robot vision-language-action (VLA) policy.
What to study
- Transformer pretraining and scaling: self-attention, next-token objective, scaling laws (loss as a power law in params/data/compute), and emergent abilities that appear only at scale.
- Vision-language grounding: contrastive image-text pretraining (CLIP), VLM architectures, and how open-vocabulary visual features enable semantic generalization in robots.
- Adapting pretrained models to robots: building VLAs by co-training on web + robot data, discrete action tokenization (RT-2, OpenVLA) vs. flow/diffusion action experts (pi-0), and parameter-efficient fine-tuning with LoRA/adapters vs. full fine-tuning vs. prompting.
- Deployment limitations: inference latency vs. control frequency, hallucination and weak physical grounding, robot data scarcity, and safety/uncertainty considerations for closed-loop control.
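The discrete action tokenization item above can be sketched as uniform per-dimension binning. This is a simplified illustration, not the exact scheme of RT-2 or OpenVLA: the bin count of 256, the [-1, 1] action range, and the 7-dimensional action layout are assumptions made for the example, and real systems additionally map the resulting integers onto tokens in the language model's vocabulary.

```python
NUM_BINS = 256  # assumed number of bins per action dimension


def tokenize_action(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action dimension to an integer bin in [0, num_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        clipped = min(max(a, lo), hi)        # clip to the assumed action range
        frac = (clipped - lo) / (hi - lo)    # normalize to [0, 1]
        tokens.append(min(int(frac * num_bins), num_bins - 1))
    return tokens


def detokenize_action(tokens, low, high, num_bins=NUM_BINS):
    """Recover the bin-center value for each token."""
    return [lo + (t + 0.5) * (hi - lo) / num_bins
            for t, lo, hi in zip(tokens, low, high)]


# Hypothetical 7-dimensional action: xyz delta, roll/pitch/yaw delta, gripper.
low = [-1.0] * 7
high = [1.0] * 7
action = [0.10, -0.25, 0.0, 0.5, -1.0, 1.0, 0.3]

tokens = tokenize_action(action, low, high)
recovered = detokenize_action(tokens, low, high)
max_err = max(abs(a - r) for a, r in zip(action, recovered))

print("tokens:   ", tokens)
print("recovered:", [round(r, 4) for r in recovered])
print("max quantization error:", max_err)
```

The round trip shows the cost of discretization: each dimension is recovered only to within half a bin width. That bounded precision loss is one reason to compare token-based policies against continuous flow or diffusion action experts.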
Study by time budget
Pick the path that fits the time you have before your interview.