# DreamZero-DROID: World Action Models are Zero-shot Policies

DreamZero-DROID is a 14B-parameter World Action Model (WAM) checkpoint trained from scratch using only the DROID dataset. Unlike traditional Vision-Language-Action (VLA) models, DreamZero learns physical dynamics by predicting future world states and actions jointly, using video as a dense representation of how the world evolves.

This checkpoint demonstrates the strength of video-model backbones for generalist robot policies, achieving strong zero-shot performance on unseen tasks without requiring pretraining on massive robot datasets.
## Model Details
- Architecture: World Action Model (WAM) built upon a pretrained video diffusion backbone (Wan2.1-I2V-14B-480P).
- Parameters: 14 Billion
- Inputs: Visual context (via VAE), language instructions (via text encoder), and proprioceptive state.
- Outputs: Joint autoregressive prediction of future video frames and robot actions.
- Training Data: Trained exclusively on the DROID Dataset (Distributed Robot Interaction Dataset), utilizing ~75k episodes with language annotations.
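The input/output structure above can be sketched as a single prediction step. This is a minimal illustrative stand-in, not the released API: the function name, shapes, and latent dimensions are all hypothetical, and the body returns placeholder values rather than a real forward pass.

```python
import numpy as np

def wam_step(video_latents, text_embedding, proprio_state,
             horizon=8, action_dim=7, rng=None):
    """Hypothetical sketch of one World Action Model prediction step:
    consume visual context (VAE latents), a language embedding, and a
    proprioceptive state, then jointly emit future video latents and an
    action chunk. Placeholder outputs only; shapes are assumptions."""
    rng = rng if rng is not None else np.random.default_rng(0)
    t, c = video_latents.shape  # context frames x latent channels
    # Joint autoregressive prediction: `horizon` future latent frames
    # plus a matching chunk of robot actions.
    future_latents = rng.standard_normal((horizon, c))
    actions = rng.standard_normal((horizon, action_dim))
    return future_latents, actions

# Dummy inputs: 4 context latent frames of dim 16, a 32-d text
# embedding, and an 8-d proprioceptive state (all shapes assumed).
latents, acts = wam_step(np.zeros((4, 16)), np.zeros(32), np.zeros(8))
```

The key structural point is that video and actions come out of the same prediction step, which is what lets the predicted actions stay aligned with the generated future frames.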
## Capabilities
- Zero-Shot Generalization: Delivers more than a 2x improvement in generalization to new tasks and novel environments compared to state-of-the-art VLAs in real robot experiments.
- Real-Time Execution: Through model and system optimizations (DreamZero-Flash), this 14B model is capable of real-time closed-loop control at ~7Hz.
- Joint Video & Action: Learns diverse skills effectively from heterogeneous robot data without relying on highly repetitive demonstrations. Predicted actions closely align with the generated future video states.
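Closed-loop control at ~7 Hz means the observe-predict-act cycle must fit inside a fixed control period. A minimal sketch of such a fixed-rate loop, with a dummy policy standing in for the model's forward pass (all names here are hypothetical, not part of DreamZero-Flash):

```python
import time

CONTROL_HZ = 7              # approximate rate reported for DreamZero-Flash
PERIOD = 1.0 / CONTROL_HZ   # time budget per control step, ~143 ms

def dummy_policy(obs):
    # Hypothetical stand-in for the WAM forward pass: returns a
    # fixed 7-DoF action instead of a real prediction.
    return [0.0] * 7

def run_control_loop(policy, get_obs, apply_action, steps=3):
    """Observe, predict, act, then sleep out the remainder of the
    control period so the loop holds a fixed rate."""
    for _ in range(steps):
        start = time.monotonic()
        action = policy(get_obs())
        apply_action(action)
        elapsed = time.monotonic() - start
        if elapsed < PERIOD:
            time.sleep(PERIOD - elapsed)

applied = []
run_control_loop(dummy_policy, lambda: None, applied.append, steps=3)
```

In a real deployment the forward pass itself must finish well under the period; if inference overruns the budget, the loop simply runs slower than the target rate rather than dropping actions.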
## How to Use
To use this checkpoint for distributed inference or simulation evaluation, please refer to the main GitHub repository.
1. Download the checkpoint via the Hugging Face CLI:

   ```shell
   huggingface-cli download GEAR-Dreams/DreamZero-DROID --repo-type model --local-dir ./checkpoints/DreamZero-DROID
   ```