ASID-Caption

community
https://asid-caption.github.io/

AI & ML interests

Video Understanding, Audio-Visual, Multimodal LLMs, Video Captioning, Instruction Tuning, Dataset Curation, Qwen-based, Open-source, Fully-Open-MLLMs

Recent Activity

lyhisme submitted a paper 2 days ago
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
lyhisme updated a model 15 days ago
AudioVisual-Caption/ASID-Captioner-7B
lyhisme updated a model 15 days ago
AudioVisual-Caption/ASID-Captioner-3B

Papers

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions


Yunheng Li

AudioVisual-Caption's models (2)

AudioVisual-Caption/ASID-Captioner-7B

Image-Text-to-Text • 9B parameters • Updated 15 days ago • 181 downloads • 5 likes

AudioVisual-Caption/ASID-Captioner-3B

Image-Text-to-Text • 5B parameters • Updated 15 days ago • 3.09k downloads • 37 likes