Show, Don't Tell: Morphing Latent Reasoning into Image Generation Paper • 2602.02227 • Published Feb 2 • 10
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning Paper • 2512.14442 • Published Dec 16, 2025 • 11
UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving Paper • 2512.09864 • Published Dec 10, 2025 • 12
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation Paper • 2511.23127 • Published Nov 28, 2025 • 44
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Paper • 2511.13704 • Published Nov 17, 2025 • 44
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs Paper • 2510.09507 • Published Oct 10, 2025 • 11
Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9, 2025 • 84
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31, 2025 • 85
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 193
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11, 2025 • 45
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20, 2025 • 134
FLUX.1 [dev] 🔥: Generate images from text prompts with FLUX.1 diffusion model • Running on Zero • Featured • 9.42k