Inference Setup for Two Arms
Bimanual inference runs a single policy network that outputs actions for both arms simultaneously. The observation-action loop runs at 50 Hz, the same frequency as your training data, with both follower arms executing their respective action chunks in sync.
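A minimal Python sketch of the fixed-rate loop described above. Everything except the 50 Hz rate is an assumption made for illustration: the 7-values-per-arm action layout, the left-then-right ordering, and the `get_observation`, `policy`, `send_left`, and `send_right` callables are hypothetical stand-ins, not the DK1 API. For simplicity it sends one action per tick instead of managing action chunks.

```python
import time
from typing import Callable, List, Sequence, Tuple

CONTROL_HZ = 50   # loop rate stated in the text
ARM_DOF = 7       # ASSUMPTION: action values per arm (joints + gripper)


def split_action(action: Sequence[float], arm_dof: int = ARM_DOF) -> Tuple[List[float], List[float]]:
    """Split one policy output into (left, right) commands.

    ASSUMPTION: the left arm occupies the first half of the vector.
    """
    if len(action) != 2 * arm_dof:
        raise ValueError(f"expected {2 * arm_dof} values, got {len(action)}")
    return list(action[:arm_dof]), list(action[arm_dof:])


def run_episode(
    get_observation: Callable[[], object],
    policy: Callable[[object], Sequence[float]],
    send_left: Callable[[List[float]], None],
    send_right: Callable[[List[float]], None],
    max_steps: int,
    hz: float = CONTROL_HZ,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run the observe -> predict -> act loop at a fixed rate; return steps executed."""
    period = 1.0 / hz
    next_tick = clock()
    for _ in range(max_steps):
        obs = get_observation()
        left, right = split_action(policy(obs))
        # Both arms receive their command within the same tick.
        send_left(left)
        send_right(right)
        # Sleep until the next scheduled tick so timing errors do not accumulate.
        next_tick += period
        remaining = next_tick - clock()
        if remaining > 0:
            sleep(remaining)
    return max_steps
```

Scheduling against `next_tick` rather than sleeping a fixed `1 / hz` each iteration keeps the average rate at 50 Hz even when a policy call takes a variable amount of time.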
For the first evaluation run, allow the policy to execute without interruption unless a physical collision is imminent. Bimanual policies often produce unexpected motions in the first 1–2 episodes as they adapt to the real environment. Episodes 3–10 are the meaningful evaluation data. Note whether the policy consistently reaches the same phases of the task (approach, grasp, transfer, place, home) even when it ultimately fails — partial success is diagnostic information.
Bimanual Evaluation Protocol
Use a structured protocol. Informal evaluation — "it looks like it's working" — is unreliable for bimanual policies because partial successes are much more common and can mask a fundamentally broken handoff.
| Protocol Item | Bimanual Specification |
|---|---|
| Number of episodes | 10 minimum; 20 for high-confidence results before adding more data |
| Cube starting position | Fixed, tape-marked position — same as your Unit 4 training setup |
| Lighting | Must match training conditions. Even opening a window can shift lighting enough to affect the workspace camera |
| What counts as full success | Cube starts on right side, ends on left side, both arms return to home pose, no human contact during episode |
| What counts as partial success | Correct grasp achieved but transfer fails, or transfer succeeds but placement is off-target. Log these separately. |
| Failure classification | Log: (A) grasp failure, (B) handoff failure — arm-to-arm transfer drops, (C) placement failure, (D) timeout. The handoff failure category (B) is unique to bimanual and most informative for improvement. |
| Report metric | Full success rate (episodes with all 4 phases correct). Also report partial success rate. Example: "4/10 full, 7/10 reached handoff phase". |
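The logging and reporting rows of the protocol can be sketched as a small Python record plus a summary function. The `Episode` class, its field names, and the sample data are hypothetical; the sample episodes are made up only to reproduce the example report format from the table and are not real results.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Failure categories from the protocol table.
FAILURE_CODES = {
    "A": "grasp failure",
    "B": "handoff failure",
    "C": "placement failure",
    "D": "timeout",
}


@dataclass
class Episode:
    full_success: bool
    reached_handoff: bool           # did the episode get as far as the handoff phase?
    failure: Optional[str] = None   # one of A/B/C/D, or None on full success


def summarize(episodes: List[Episode]) -> Dict[str, object]:
    """Compute the report metric and a per-category failure count."""
    if not episodes:
        raise ValueError("no episodes logged")
    for ep in episodes:
        if ep.full_success and ep.failure is not None:
            raise ValueError("a fully successful episode cannot carry a failure code")
        if not ep.full_success and ep.failure not in FAILURE_CODES:
            raise ValueError(f"failed episode needs a code in {sorted(FAILURE_CODES)}")
    n = len(episodes)
    full = sum(ep.full_success for ep in episodes)
    reached = sum(ep.reached_handoff for ep in episodes)
    failures = Counter(ep.failure for ep in episodes if ep.failure)
    return {
        "headline": f"{full}/{n} full, {reached}/{n} reached handoff phase",
        "full_success_rate": full / n,
        "failures": dict(failures),
    }


# Illustrative data only: 4 full successes, 3 handoff drops, 3 grasp failures.
sample = (
    [Episode(True, True)] * 4
    + [Episode(False, True, "B")] * 3
    + [Episode(False, False, "A")] * 3
)
summary = summarize(sample)
print(summary["headline"])   # 4/10 full, 7/10 reached handoff phase
print(summary["failures"])   # {'B': 3, 'A': 3}
```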
Common Bimanual Failure Modes
These failure modes are distinct from single-arm failures and require bimanual-specific fixes:
- Arms arrive at handoff point asynchronously: One arm reaches the handoff position and waits; the other arrives late. The policy has not learned the relative timing between arms. Fix: add 20 demonstrations where both arms explicitly pause at the handoff point for 1–2 seconds before completing the transfer. This makes the synchronization requirement explicit in the data.
- Handoff drop — cube falls between the two arms: The most common bimanual-specific failure. The receiving arm closes its gripper too early or too late relative to the giving arm's release. Fix: collect 15 slow-motion handoff demonstrations specifically at 25% speed. The exaggerated timing gives the policy a clearer signal about the gripper state transition sequence.
- Policy converges on a single-arm strategy: The policy learns to complete the task with one arm only, ignoring the other arm's capabilities. This happens when one arm's demonstrations are more consistent than the other's. Fix: review each arm's action error from the training curves (Unit 5) and collect additional demos specifically targeting the weaker arm's phases.
- Inter-arm collision: Both arms attempt to occupy the same workspace location. This is a safety event: enable collision avoidance in the DK1 hardware server (`collision_avoidance: true` in dk1-config.yaml) during evaluation. Training on demonstrations that consistently respect safe arm separation will prevent most collisions; the hardware-level guard handles edge cases.
- Phase desynchronization at deployment: The policy executes the correct actions but not in the right temporal order, e.g., the right arm places before the left arm has transferred. This is an action chunking artifact where the chunk boundaries don't align with task phase transitions. Fix: reduce `chunk_size` from 100 to 50 and retrain.
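To see why the chunk size matters for phase transitions, it helps to convert it into time. The Python arithmetic below uses the 50 Hz loop rate stated earlier and assumes each chunk is executed in full before the next prediction; implementations that re-plan mid-chunk or blend overlapping chunks will behave differently. The function name is hypothetical.

```python
def chunk_duration_s(chunk_size: int, control_hz: float = 50.0) -> float:
    """Seconds of motion covered by one action chunk at the given loop rate.

    ASSUMPTION: the whole chunk is executed before the policy is queried again.
    """
    if chunk_size <= 0 or control_hz <= 0:
        raise ValueError("chunk_size and control_hz must be positive")
    return chunk_size / control_hz


for size in (100, 50):
    print(f"chunk_size={size}: {chunk_duration_s(size):.1f} s per chunk")
# chunk_size=100: 2.0 s per chunk
# chunk_size=50: 1.0 s per chunk
```

Under that assumption, halving the chunk size means the policy gets a fresh look at the scene every 1 second instead of every 2, so a phase transition such as the gripper release is less likely to be buried in the middle of a committed chunk.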
The Data Flywheel for Bimanual Improvement
The same improvement loop that works for single-arm policies works for bimanual — with one bimanual-specific addition: always target the first failure mode in the task sequence. The handoff (phase B) cannot be improved if grasp (phase A) is still inconsistent. Fix failures in task sequence order.
- Evaluate: Run 10 episodes. Classify each failure by phase (A/B/C/D).
- Target: Identify the first failure phase. Collect 20–30 demos specifically covering that phase.
- Retrain: Add targeted demos to the dataset. Retrain from scratch or fine-tune the best checkpoint.
- Evaluate: Run 10 episodes again. Did the full success rate improve? Move to the next failure phase.
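The "target the first failure in task order" rule can be sketched in Python as follows. The function name and the `min_failures` threshold are hypothetical, and placing timeouts (D) last is an assumption, since a timeout is not tied to one position in the task sequence.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence

# Task-sequence order of the failure categories.
# ASSUMPTION: timeouts (D) are considered only after A, B, and C are clean.
TASK_ORDER = ("A", "B", "C", "D")


def next_target(
    failure_codes: Iterable[str],
    order: Sequence[str] = TASK_ORDER,
    min_failures: int = 1,
) -> Optional[str]:
    """Return the earliest category in task order with at least `min_failures` episodes.

    Returns None when no category reaches the threshold.
    """
    counts = Counter(failure_codes)
    unknown = set(counts) - set(order)
    if unknown:
        raise ValueError(f"unknown failure codes: {sorted(unknown)}")
    for code in order:
        if counts[code] >= min_failures:
            return code
    return None


# Handoff failures (B) are the most frequent here, but grasp (A) comes first
# in the task sequence, so it is the phase to collect demos for.
print(next_target(["B", "B", "A", "C", "B"]))   # A
```

Selecting by position in the sequence rather than by frequency reflects the point above: later phases cannot be evaluated reliably while an earlier phase is still inconsistent.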
What's Next
You now have a working bimanual learning pipeline. The cube handoff is the foundation: the same architecture scales to significantly more complex tasks.
Unit 6 Complete When...
Your DK1 completes the cube handoff task autonomously with a full success rate of at least 6/10 in a structured evaluation run. You have classified all failure episodes by phase (A/B/C/D) and identified which phase is responsible for most failures. You have watched the failure videos and can articulate specifically what went wrong. You understand the bimanual data flywheel well enough to plan your next improvement iteration.