Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks Paper • 2602.23898 • Published 27 days ago • 10
Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning Paper • 2602.09439 • Published Feb 10 • 13
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published Nov 14, 2025 • 112
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 181
Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality Paper • 2505.18227 • Published May 23, 2025 • 15
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published May 1, 2025 • 54