What Is Robot Teleoperation?
Robot teleoperation is the real-time remote control of a robot by a human operator. The operator provides continuous or near-continuous commands (position targets, velocity targets, or force targets) through an input device, and the robot executes those commands as faithfully as its hardware and control system allow.
Teleoperation has existed since the 1950s — the first telemanipulators were used for handling radioactive materials at Argonne National Laboratory. What changed in the 2020s is the application: teleoperation is now primarily used to generate training data for robot learning algorithms. When a human teleoperates a robot to complete a task, the resulting sensor data and action commands become demonstration data that can train an imitation learning policy to perform the same task autonomously.
This dual-use nature — both a practical remote-control method and a data generation tool — makes teleoperation one of the most important capabilities in modern robotics. The choice of teleoperation method directly determines the quality of collected data, the types of tasks you can demonstrate, and the throughput of your data collection pipeline.
Applications of Teleoperation
- Demonstration data collection: The dominant application in 2026. Operators teleoperate robots to perform manipulation tasks, generating datasets of (observation, action) pairs used to train policies via behavior cloning, ACT, Diffusion Policy, or VLA models. See our data collection guide for the full pipeline.
- Remote inspection and maintenance: Operating robots in hazardous environments (nuclear facilities, deep sea, space, disaster zones) where human presence is dangerous or impossible. Latency tolerance is higher; data quality requirements are lower.
- Telesurgery: The da Vinci Surgical System is the most commercially successful teleoperation platform in history, with over 7 million procedures performed. Surgical teleoperation demands the lowest latency (<5 ms) and highest precision of any application.
- Remote assembly and manufacturing: Operators in one location control robots in a factory thousands of miles away. Emerging use case driven by labor shortages and the COVID-era need for contactless operations.
Teleoperation Methods Compared
VR Headset Teleoperation (Meta Quest 3)
The operator wears a VR headset (Meta Quest 3 is the current standard at $500) and uses hand controllers or hand tracking to specify desired end-effector positions in Cartesian space. The robot's inverse kinematics solver computes the joint angles needed to reach the commanded position.
How It Works in Detail
The Quest 3 streams controller 6-DOF pose data (position + orientation) at 72 Hz over WiFi to a workstation running the IK server. The IK server (typically using PyBullet, MuJoCo, or a custom IK solver) converts the end-effector target to joint positions. These joint positions are sent to the robot's motor controller via the robot's native communication protocol (USB, Ethernet, ROS2). The operator sees a live camera feed from the robot's workspace in the headset, providing visual feedback for closed-loop control.
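The IK step in this pipeline can be sketched with a damped-least-squares solver. The snippet below uses a toy 2-link planar arm as a stand-in for a real 6-DOF solver — the link lengths, numerical Jacobian, and function names are illustrative, not the Quest streaming or PyBullet API — but it shows the essential conversion from a Cartesian target to joint positions:

```python
import numpy as np

LINK1, LINK2 = 0.30, 0.25  # illustrative link lengths (metres) for a 2-link planar arm

def fk(q):
    """Forward kinematics: joint angles -> end-effector (x, y)."""
    x = LINK1 * np.cos(q[0]) + LINK2 * np.cos(q[0] + q[1])
    y = LINK1 * np.sin(q[0]) + LINK2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def ik_step(q, target, damping=1e-2, eps=1e-5):
    """One damped-least-squares step toward a Cartesian target."""
    J = np.zeros((2, 2))
    for i in range(2):
        dq = np.zeros(2)
        dq[i] = eps
        J[:, i] = (fk(q + dq) - fk(q)) / eps  # numerical Jacobian column
    err = target - fk(q)
    # Damping bounds the step near singularities (e.g. full extension)
    step = J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
    return q + step

def solve_ik(target, q0=np.array([0.3, 0.5]), iters=100):
    """Iterate from an initial guess until the target is reached."""
    q = q0.copy()
    for _ in range(iters):
        q = ik_step(q, target)
    return q
```

In a real VR teleop server, this solve runs once per incoming controller pose, so its per-call cost is part of the latency budget discussed later in this article.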
Pros
- Lowest hardware cost ($500) — the VR headset is the only additional hardware needed
- Intuitive for operators with VR experience — the mapping from hand position to robot position is natural
- Works with any robot arm that accepts Cartesian position commands
- Single-operator bimanual control using both controllers simultaneously
- Portable — the headset is not mechanically coupled to the robot, so the operator can be in a different room
Cons
- Higher latency (15-40 ms total) due to WiFi transmission and IK computation
- No proprioceptive feedback — the operator cannot feel arm resistance, joint limits, or contact forces
- VR-induced nausea limits session duration to 60-90 minutes for many operators
- Lower precision for contact-rich tasks — without force feedback, operators overshoot insertion and alignment tasks
- IK singularities cause jerky motion when the arm is near full extension or wrist lock configurations
Leader-Follower Arms
The operator physically holds and moves a lightweight "leader" arm. A heavier "follower" arm tracks the leader's joint positions in real time. This is the approach used by ALOHA, Mobile ALOHA, and the majority of serious data collection setups in 2026.
How It Works in Detail
Both leader and follower arms use the same servo family (e.g., Dynamixel XM/XH series) connected via a shared USB bus to a single computer. The computer reads the leader's joint positions at 50-200 Hz and sends them as position targets to the follower. Because the mapping is joint-to-joint (no IK computation), latency is 3-8 ms, fast enough to feel transparent to the operator.
The leader arm is typically smaller and lighter than the follower (e.g., WidowX-250 leader with ViperX-300 follower). Gravity compensation on the leader makes it effortless to hold, allowing extended collection sessions. The kinematic similarity between leader and follower ensures that the operator's proprioception — the sense of where their hand is in space — maps accurately to the follower's workspace.
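The core read-copy-write loop is simple enough to sketch. The `MockServoBus` class below stands in for a real servo interface (a production setup would use, e.g., the Dynamixel SDK's bulk read/write calls instead); the loop structure is the part that carries over:

```python
import time

class MockServoBus:
    """Software stand-in for a real servo bus; stores joint positions in memory."""
    def __init__(self, n_joints):
        self.positions = [0.0] * n_joints

    def read_positions(self):
        return list(self.positions)

    def write_positions(self, targets):
        self.positions = list(targets)

def mirror_loop(leader, follower, hz=100, steps=200):
    """Copy leader joint positions to the follower at a fixed rate.
    The mapping is joint-to-joint, so there is no IK in the loop."""
    period = 1.0 / hz
    for _ in range(steps):
        t0 = time.perf_counter()
        follower.write_positions(leader.read_positions())
        # Sleep off the remainder of the cycle to hold the loop rate
        remaining = period - (time.perf_counter() - t0)
        if remaining > 0:
            time.sleep(remaining)
```

The absence of any per-cycle computation beyond the bus transaction is what keeps this method in the 3-8 ms latency range.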
Pros
- Lowest latency (3-8 ms) — the operator feels directly connected to the robot
- Proprioceptive feedback — the operator feels the leader arm's position, which maps to the follower
- Highest data quality for joint-space tasks — no IK artifacts, no singularity issues
- Highest throughput (20-35 demonstrations per hour) due to natural, fast operator movements
- Well-proven: the ACT, Diffusion Policy, and LeRobot codebases all natively support leader-follower data
Cons
- Additional hardware cost ($3,000-$8,000 for the leader arm)
- The operator must be physically co-located with the leader arm (no remote operation)
- Kinematic differences between leader and follower (different link lengths, joint limits) cause workspace mismatches
- Leader arm requires gravity compensation tuning for operator comfort — poor tuning causes fatigue
Exoskeleton and Haptic Gloves
The operator wears a hand exoskeleton or haptic glove that tracks finger joint angles and optionally provides force feedback. Finger positions are mapped to a dexterous robot hand (5-finger, 15+ DOF).
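The simplest retargeting scheme is a per-joint linear map from the glove's calibrated 0-1 finger angles onto the robot hand's joint ranges; real systems often layer fingertip-position retargeting on top. A minimal sketch (joint limits are illustrative):

```python
def retarget_fingers(human_angles, robot_limits):
    """Map normalized human finger joint angles (0..1 from glove calibration)
    linearly onto robot hand joint ranges given as (lo, hi) pairs in radians."""
    return [lo + a * (hi - lo) for a, (lo, hi) in zip(human_angles, robot_limits)]
```

Per-operator calibration (mentioned in the cons below) amounts to measuring each operator's actual flexion range so that 0 and 1 correspond to their fully open and fully closed hand.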
Pros
- Essential for dexterous manipulation tasks — no other method provides finger-level control
- Haptic feedback (on high-end gloves) lets operators feel grasped objects
- Natural hand control interface — operators use their existing fine motor skills
Cons
- High cost ($8,000-$20,000 per pair for research-grade gloves)
- Physically demanding — operators fatigue after 45-60 minutes
- Calibration required for each operator's hand geometry
- Only useful when paired with a dexterous robot hand (additional $8,000-$220,000)
- Slower throughput (8-15 demonstrations per hour) because dexterous tasks are inherently slower
Joystick and Gamepad
The operator uses joystick axes (typically a 3DConnexion SpaceMouse or dual-analog gamepad) to control end-effector velocity or joint velocities. Each joystick axis maps to one degree of freedom.
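A minimal axis-to-velocity mapping looks like the sketch below, with a deadzone to suppress drift and a quadratic response curve for fine control near center. The scale factors and deadzone are illustrative defaults, not any vendor's API:

```python
def axes_to_twist(axes, lin_scale=0.10, ang_scale=0.50, deadzone=0.05):
    """Map six normalized joystick axes (-1..1, e.g. a SpaceMouse) to an
    end-effector velocity command [vx, vy, vz, wx, wy, wz]."""
    def shape(a, scale):
        if abs(a) < deadzone:          # suppress sensor drift near zero
            return 0.0
        return scale * a * abs(a)      # quadratic curve: fine control near center
    return [shape(a, lin_scale) for a in axes[:3]] + \
           [shape(a, ang_scale) for a in axes[3:6]]
```

The output is a velocity (twist) command, which is why trajectories from this method look different from the position-command streams produced by VR or leader-follower setups.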
Pros
- Cheapest option ($30-$300)
- No VR sickness, no physical exertion — operators can work long sessions
- Good for mobile robot navigation and camera positioning
Cons
- Lowest control bandwidth — a 6-DOF arm plus gripper needs seven simultaneous axes; a dual-analog gamepad offers four, and even a SpaceMouse covers only the six end-effector DOF
- Unnatural interface — operators must mentally map joystick axes to robot axes, which is slow and error-prone
- Jerky trajectories — discrete stick deflections under velocity control produce stop-and-go motion, unlike the continuous position commands of the other methods
- Lowest throughput (5-12 demonstrations per hour)
- Data quality is measurably worse — policies trained on joystick data consistently underperform leader-follower or VR data
| Method | Latency | Hardware Cost | Demos/Hour | Data Quality | Best For |
|---|---|---|---|---|---|
| VR Headset | 15-40 ms | $500 | 15-25 | Good | Budget teams, rapid prototyping |
| Leader-Follower | 3-8 ms | $3K-$8K | 20-35 | Highest | Serious data collection |
| Exo Gloves | 5-15 ms | $8K-$20K | 8-15 | High (dexterous) | Dexterous manipulation |
| Joystick | 1-5 ms | $30-$300 | 5-12 | Lower | Navigation, quick prototyping |
Latency: The Core Challenge
Latency — the delay between the operator's command and the robot's response — is the single most important performance metric for any teleoperation system. High latency degrades operator performance, reduces data quality, and makes contact-rich tasks impossible.
Latency Budget Breakdown
Total teleoperation latency is the sum of several pipeline stages:
- Input device to computer: 1-15 ms. USB HID (joystick, SpaceMouse): 1-2 ms. WiFi (VR headset): 5-15 ms. Bluetooth (some gamepads): 10-30 ms — avoid Bluetooth for teleoperation.
- Computation (IK, safety checks): 1-10 ms. Joint-space leader-follower: <1 ms (direct copy). Cartesian IK with collision checking: 3-10 ms.
- Computer to robot actuators: 2-15 ms. Direct USB/EtherCAT: 2-5 ms. ROS2 topic via DDS: 5-15 ms depending on QoS settings and network.
- Actuator response: 5-20 ms. Motor controller PID settling time. Faster for small motions, slower for large commands.
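As a quick sanity check, you can total a latency budget and bucket the result against the acceptable-latency thresholds discussed in the next section. The per-stage numbers below are illustrative values drawn from the ranges above, modeling a leader-follower setup on a direct USB link:

```python
# Representative per-stage latencies in milliseconds; swap in your own measurements.
BUDGET_MS = {
    "input_device": 2.0,   # USB HID (joystick, SpaceMouse)
    "computation": 1.0,    # joint-space copy, no IK
    "robot_link": 3.0,     # direct USB to the servo bus
    "actuator": 8.0,       # motor controller settling time
}

def total_latency_ms(budget):
    """Total pipeline latency is the sum of the stages."""
    return sum(budget.values())

def classify(ms):
    """Bucket a total latency against the article's thresholds."""
    if ms < 10:
        return "instantaneous"
    if ms < 30:
        return "responsive"
    if ms < 100:
        return "noticeable"
    return "impaired"
```

For this example, the total comes to 14 ms — responsive, but over the instantaneous threshold, which matches the intuition that actuator settling dominates even a well-optimized pipeline.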
Acceptable Latency Thresholds
- <10 ms: Feels instantaneous. Required for surgical teleoperation. Achievable with leader-follower on shared USB bus.
- 10-30 ms: Feels responsive. Adequate for most manipulation tasks. Achievable with VR on local WiFi.
- 30-100 ms: Noticeable delay. Operators compensate with slower, more deliberate movements. Data quality degrades for fast tasks. Still usable for slow manipulation and navigation.
- >100 ms: Significantly impairs operator performance. Move-and-wait strategy required. Only acceptable for non-time-critical tasks like remote inspection.
How to Minimize Latency
- Use wired connections (USB, Ethernet) instead of WiFi where possible
- Run the control loop on a real-time OS (Ubuntu with the PREEMPT_RT kernel) pinned to dedicated CPU cores
- Minimize ROS2 middleware overhead by using intra-process communication for co-located nodes
- Use direct motor communication (Dynamixel Protocol 2.0 bulk read/write) instead of ROS2 topics for the leader-follower link
- Profile your pipeline end-to-end — measure actual latency, do not estimate. A single poorly configured network buffer can add 20+ ms
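Measuring is straightforward once your send path exposes a blocking call. The helper below is a generic sketch: `send_and_wait` is a placeholder for your own pipeline's command-and-acknowledge round trip, and the point is to report the median and worst case rather than just a mean, since tail latency is what the operator actually feels:

```python
import statistics
import time

def measure_round_trip(send_and_wait, n=200):
    """Time n command->acknowledgement round trips through the pipeline.
    `send_and_wait` must block until the robot acknowledges the command."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        send_and_wait()
        samples.append((time.perf_counter() - t0) * 1000.0)  # milliseconds
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}
```

For a quick self-test, `measure_round_trip(lambda: time.sleep(0.002))` should report a median a little above 2 ms, with the overshoot telling you how much jitter your OS scheduler adds.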
Teleoperation for Data Collection vs. Real Deployment
The requirements for teleoperation differ dramatically depending on whether you are collecting training data or operating the robot for productive work.
Data Collection Teleoperation
- Priority: Data quality and throughput. Every episode needs to be a clean, successful demonstration.
- Latency tolerance: Moderate. 10-30 ms is acceptable because the operator is in a controlled lab environment with the robot in direct view.
- Session duration: 1-3 hours per session, multiple sessions per day. Operator comfort and fatigue management are critical.
- Failure handling: Failed demonstrations are discarded. The cost of a failed episode is only the operator's time (30-90 seconds). Fail fast, reset, repeat.
- Network: Local network only. Data collection teleoperation should never run over the internet — the latency and reliability penalties are not worth it.
Deployment Teleoperation
- Priority: Reliability and safety. The robot is performing real work — failures have consequences (damaged goods, safety incidents, downtime).
- Latency tolerance: Must be engineered in. Remote deployment over WAN commonly sees 50-200 ms, which must be managed with predictive control, shared autonomy, and move-and-wait interfaces.
- Session duration: 8-12 hour shifts for industrial teleoperation. Requires ergonomic workstations, multiple operators in rotation, and fatigue monitoring.
- Failure handling: Must have safe fallback behaviors (stop, retreat to safe pose, alert supervisor). The robot should never enter an unrecoverable state during teleoperation.
- Network: Often remote (WAN/internet). Requires redundant connectivity, graceful degradation on packet loss, and security (encrypted connections, authentication).
SVRC Teleoperation Platform
The SVRC data platform provides an integrated teleoperation system designed for large-scale data collection campaigns.
- Multi-interface support: VR (Meta Quest 3), leader-follower (OpenArm, ViperX), and SpaceMouse — switch between methods per task
- Real-time quality monitoring: Live dashboard showing episode success rate, trajectory smoothness, collection speed, and operator performance metrics
- Automatic data pipeline: Episodes are recorded in HDF5, quality-checked, and exported to LeRobot format. No manual file management
- Multi-station coordination: Manage 1-8 collection stations from a single dashboard with centralized dataset management
- Operator management: Track individual operator throughput and quality metrics over time. Identify operators who need additional training
How to Set Up Your First Teleoperation Session
This step-by-step guide assumes you have a robot arm and want to start collecting demonstration data.
Step 1: Choose your teleoperation method
For most teams starting out, leader-follower is the best choice if you can afford a leader arm ($3,000-$8,000). If budget is tight, start with VR ($500). If your task requires dexterous manipulation, you need gloves. Review the comparison above to match your task requirements.
Step 2: Set up the hardware
Mount the robot arm on a stable surface: a heavy-duty table weighing at least 30 kg, with the arm bolted or clamped down. Position 2-3 cameras: one overhead (150 cm height, 45-degree angle), one wrist-mounted, and optionally one at table height for a side view. Connect everything to a single workstation (Ubuntu 22.04, ROS2 Humble recommended). Verify all sensor streams are publishing and synchronized.
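A small helper for the synchronization check at the end of this step: compare the latest timestamp from each stream and flag any that lags the newest by more than roughly one frame. The 34 ms skew tolerance (one frame at ~30 fps) and stream names are illustrative defaults:

```python
def check_stream_sync(latest_stamps, max_skew_s=0.034):
    """Given the latest timestamp (seconds) seen on each sensor stream,
    return {stream_name: in_sync} relative to the newest stream."""
    newest = max(latest_stamps.values())
    return {name: (newest - t) <= max_skew_s for name, t in latest_stamps.items()}
```

Run this continuously during collection, not just at setup — a camera that silently drops to a lower frame rate mid-session will corrupt episodes without any visible error.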
Step 3: Install teleoperation software
For OpenArm: use the svrc-teleop package which supports VR and leader-follower out of the box. For other arms: the LeRobot framework provides teleoperation recording for WidowX, ViperX, and Koch arms. For UR5e: AnyTeleop or the Universal Robots external control interface with a custom IK bridge.
Step 4: Calibrate and test
Run the teleoperation in test mode (no recording). Verify: the robot follows your commands smoothly, camera views cover the full task workspace, latency feels acceptable, joint limits do not interfere with task motions. Adjust camera positions, tune IK parameters, and verify emergency stop functionality.
Step 5: Define your task protocol
Before collecting any data, write down: the task goal (specific, unambiguous), the starting configuration (object positions, arm pose), the expected approach strategy, success and failure criteria. Having this protocol before collection ensures consistency across operators and sessions.
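One way to keep the protocol unambiguous is to store it as structured data in version control alongside the dataset, so every operator and session reads the same definition. A sketch, using a hypothetical pick-and-place task:

```python
from dataclasses import dataclass

@dataclass
class TaskProtocol:
    """Written task protocol, versioned with the dataset it governs."""
    goal: str
    start_config: dict
    approach: str
    success_criteria: list
    failure_criteria: list

PROTOCOL = TaskProtocol(
    goal="Pick the red cube and place it inside the blue bin",
    start_config={
        "cube": "randomized within a 10 cm square centred at (0.35, 0.00)",
        "arm": "home pose, gripper open",
    },
    approach="Top-down grasp, transfer at >= 15 cm height, release over bin centre",
    success_criteria=["cube fully inside bin", "arm returned to home pose"],
    failure_criteria=["cube dropped outside bin", "any joint-limit stop"],
)
```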
Step 6: Collect a small test batch
Record 20-30 demonstrations. Train a simple policy (ACT with default hyperparameters). Evaluate on the real robot. If success rate is below 50%, the issue is almost certainly data quality, not model capacity. Review the demonstrations, identify inconsistencies, refine the protocol, and collect again. See our OpenArm learning path for a complete walkthrough.
Scaling Teleoperation Operations
Moving from a single research station to a multi-station data collection operation requires investment in standardization, quality control, and operator management.
Station Standardization
Every collection station must produce physically identical data. This means: identical arm models with matching firmware versions, identical camera models at identical positions (use a mounting jig for camera placement), identical lighting conditions (use controlled overhead lighting, not ambient), identical software versions frozen at the start of the campaign.
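A cheap way to enforce this is to fingerprint each station's frozen configuration and require identical fingerprints across all stations before a session starts. A sketch — the config keys are illustrative, not a prescribed schema:

```python
import hashlib
import json

def station_fingerprint(config):
    """Hash the frozen station configuration deterministically.
    Key order does not matter; any value change does."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]
```

Comparing short hex fingerprints across stations is faster and less error-prone than diffing firmware versions and camera positions by hand.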
Operator Training Pipeline
New operators need 4-8 hours of training before collecting production data. Training sequence: 1) Learn the teleoperation interface with free-form practice (1 hour). 2) Practice the specific task with immediate feedback (2 hours). 3) Collect a calibration batch of 30 episodes, evaluated against quality criteria (1 hour). 4) Shadow an experienced operator for one production session (2 hours). Only operators who meet quality thresholds on the calibration batch enter the production rotation.
Quality Control System
Implement three layers of quality control: 1) Real-time: automated checks during collection (episode length bounds, joint limit violations, camera frame drops). 2) Post-episode: human or automated review within 24 hours (success verification, strategy consistency). 3) Pre-training: dataset-level analysis (distribution coverage, outlier detection, inter-operator consistency). Reject episodes that fail any layer. A 5% rejection rate is healthy; above 15% indicates a systemic issue with the protocol or operator training.
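The real-time (layer 1) checks are the easiest to automate. A sketch, with illustrative bounds and an assumed episode dict layout (`joint_positions` as a list of per-step joint vectors, plus a dropped-frame counter):

```python
def check_episode(episode,
                  min_steps=50, max_steps=3000,
                  joint_limits=(-3.14, 3.14),
                  max_dropped_frames=2):
    """Layer-1 checks: episode length bounds, joint-limit violations,
    camera frame drops. Returns a list of failure reasons (empty = pass)."""
    failures = []
    n = len(episode["joint_positions"])
    if not (min_steps <= n <= max_steps):
        failures.append("length_out_of_bounds")
    lo, hi = joint_limits
    if any(q < lo or q > hi
           for step in episode["joint_positions"] for q in step):
        failures.append("joint_limit_violation")
    if episode.get("dropped_frames", 0) > max_dropped_frames:
        failures.append("frame_drops")
    return failures
```

Returning a list of reasons rather than a boolean makes it easy to track *why* episodes fail over time, which is exactly the signal you need to distinguish a healthy 5% rejection rate from a systemic protocol problem.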