Welcome to Fine-R1 👋, the first multimodal large language model (MLLM) to surpass various strong CLIP-like models in fine-grained visual recognition.
- StevenHH2000/Fine-R1-3B (Image-Text-to-Text, 4B parameters)
- StevenHH2000/Fine-R1-7B (Image-Text-to-Text, 8B parameters)
- StevenHH2000/Fine-R1-7B-Stage1 (Image-Text-to-Text, 8B parameters)
- StevenHH2000/Fine-R1-3B-Stage1 (Image-Text-to-Text, 4B parameters)