🤖 DreamZero-DROID: World Action Models are Zero-shot Policies


DreamZero-DROID is a 14B-parameter World Action Model (WAM) checkpoint trained on robot data from the DROID dataset only, starting from a pretrained video diffusion backbone. Unlike traditional Vision-Language-Action (VLA) models, DreamZero learns physical dynamics by jointly predicting future world states and actions, using video as a dense representation of how the world evolves.

This checkpoint demonstrates the strength of video-model backbones for generalist robot policies: it achieves strong zero-shot performance on unseen tasks without pretraining on large-scale robot datasets.

🏗️ Model Details

  • Architecture: World Action Model (WAM) built upon a pretrained video diffusion backbone (Wan2.1-I2V-14B-480P).
  • Parameters: 14 Billion
  • Inputs: Visual context (via VAE), language instructions (via text encoder), and proprioceptive state.
  • Outputs: Joint autoregressive prediction of future video frames and robot actions.
  • Training Data: The DROID dataset (Distributed Robot Interaction Dataset) only, using ~75k episodes with language annotations.
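
The input and output lists above can be read as a simple interface. The sketch below is only an illustration of joint autoregressive prediction, not the DreamZero API: the names `Observation`, `Prediction`, `rollout`, and `toy_step` are hypothetical, the field types are placeholders, and feeding the last predicted frame back in as the next visual context is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Observation:
    """The three inputs listed above; all field types are placeholders."""
    visual_latent: Sequence[float]  # stand-in for VAE-encoded visual context
    instruction: str                # language instruction (text-encoded by the model)
    proprio: Sequence[float]        # proprioceptive state, e.g. joint positions


@dataclass
class Prediction:
    """Joint output: future video latents and actions over the same horizon."""
    future_latents: List[Sequence[float]]
    actions: List[Sequence[float]]


# Hypothetical signature for one prediction step of a world action model.
StepFn = Callable[[Observation], Prediction]


def rollout(step: StepFn, first: Observation, num_chunks: int) -> List[Prediction]:
    """Chain predictions autoregressively.

    Assumption for this sketch: the last predicted frame of each chunk
    becomes the visual context for the next chunk.
    """
    obs = first
    chunks: List[Prediction] = []
    for _ in range(num_chunks):
        pred = step(obs)
        chunks.append(pred)
        obs = Observation(
            visual_latent=pred.future_latents[-1],
            instruction=obs.instruction,
            proprio=obs.proprio,  # placeholder: a real system would read the robot state
        )
    return chunks


def toy_step(obs: Observation) -> Prediction:
    """Dummy model: shifts the latent by +1 per frame and emits zero actions."""
    frames = [[v + k for v in obs.visual_latent] for k in (1, 2)]
    actions = [[0.0] * len(obs.proprio) for _ in frames]
    return Prediction(future_latents=frames, actions=actions)
```

Calling `rollout(toy_step, Observation([0.0], "pick up the cup", [0.0, 0.0]), 3)` returns three chunks, each pairing two predicted frames with two actions.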

🚀 Capabilities

  • Zero-Shot Generalization: In real-robot experiments, delivers a more than 2x improvement in generalization to new tasks and novel environments over state-of-the-art VLAs.
  • Real-Time Execution: With model and system optimizations (DreamZero-Flash), this 14B model runs real-time closed-loop control at ~7 Hz.
  • Joint Video & Action: Learns diverse skills effectively from heterogeneous robot data without relying on highly repetitive demonstrations. Predicted actions closely align with the generated future video states.
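
To put the real-time figure in context: a control rate of about 7 Hz leaves roughly 1/7 ≈ 0.143 s per step for observation, inference, and action dispatch. The following is a minimal sketch of a fixed-rate closed loop under that budget. It is not DreamZero-Flash; `run_control_loop`, `get_obs`, `policy`, and `send_action` are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without real hardware.

```python
import time
from typing import Any, Callable

CONTROL_HZ = 7.0             # approximate rate quoted above
PERIOD_S = 1.0 / CONTROL_HZ  # about 0.143 s available per control step


def run_control_loop(
    get_obs: Callable[[], Any],
    policy: Callable[[Any], Any],
    send_action: Callable[[Any], None],
    num_steps: int,
    period_s: float = PERIOD_S,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run num_steps control steps at a fixed period.

    Returns the number of steps whose work took longer than the period.
    """
    overruns = 0
    next_deadline = clock()
    for _ in range(num_steps):
        next_deadline += period_s
        # Closed loop: a fresh observation is read before every action.
        send_action(policy(get_obs()))
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)  # finished early: wait out the rest of the period
        else:
            overruns += 1
            next_deadline = clock()  # drop the backlog instead of bursting to catch up
    return overruns
```

A nonzero return value indicates that inference did not fit the per-step budget on the hardware in use.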

💻 How to Use

To use this checkpoint for distributed inference or simulation evaluation, please refer to the main GitHub repository.

1. Download the checkpoint via Hugging Face CLI:

huggingface-cli download GEAR-Dreams/DreamZero-DROID --repo-type model --local-dir ./checkpoints/DreamZero-DROID
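
The same download can also be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed; the function name `download_checkpoint` and the two constants are illustrative, while the repository ID and target directory are taken from the command above.

```python
from pathlib import Path

REPO_ID = "GEAR-Dreams/DreamZero-DROID"
LOCAL_DIR = Path("./checkpoints/DreamZero-DROID")


def download_checkpoint(repo_id: str = REPO_ID, local_dir: Path = LOCAL_DIR) -> Path:
    """Download every file of the model repository into local_dir."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        repo_type="model",
        local_dir=str(local_dir),
    )
    return Path(path)
```

Calling `download_checkpoint()` with no arguments mirrors the CLI command and returns the local directory that holds the files.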