arxiv:2601.03233

LTX-2: Efficient Joint Audio-Visual Foundation Model

Published on Jan 6 · Submitted by taesiri on Jan 7

#1 Paper of the day

Authors: Yoav HaCohen, Benny Brazowski, Nisan Chiprut, Yaki Bitterman, Andrew Kvochko, Daphna Lifschitz, Dudu Moshe, Eitan Richardson, Guy Shiran, Itay Chachy, Michael Finkelson, Michael Kupchick, Nir Zabari, Nitzan Guetta, Ofir Bibi, Ori Gordon

AI-generated summary

LTX-2 is an open-source audiovisual diffusion model that generates synchronized video and audio content using a dual-stream transformer architecture with cross-modal attention and classifier-free guidance.

Abstract

Recent text-to-video diffusion models can generate compelling video sequences, yet they remain silent -- missing the semantic, emotional, and atmospheric cues that audio provides. We introduce LTX-2, an open-source foundational model capable of generating high-quality, temporally synchronized audiovisual content in a unified manner. LTX-2 consists of an asymmetric dual-stream transformer with a 14B-parameter video stream and a 5B-parameter audio stream, coupled through bidirectional audio-video cross-attention layers with temporal positional embeddings and cross-modality AdaLN for shared timestep conditioning. This architecture enables efficient training and inference of a unified audiovisual model while allocating more capacity for video generation than audio generation. We employ a multilingual text encoder for broader prompt understanding and introduce a modality-aware classifier-free guidance (modality-CFG) mechanism for improved audiovisual alignment and controllability. Beyond generating speech, LTX-2 produces rich, coherent audio tracks that follow the characters, environment, style, and emotion of each scene -- complete with natural background and foley elements. In our evaluations, the model achieves state-of-the-art audiovisual quality and prompt adherence among open-source systems, while delivering results comparable to proprietary models at a fraction of their computational cost and inference time. All model weights and code are publicly released.
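
The bidirectional audio-video cross-attention mentioned in the abstract can be illustrated with a toy sketch. This is not the LTX-2 implementation: it assumes a single attention head, no learned query/key/value projections, no temporal positional embeddings, and no AdaLN conditioning. The function names `cross_attend` and `bidirectional_step` are hypothetical. It only shows the idea that each stream queries the other and both are updated from the same pre-update inputs, and that the two streams may have different token counts.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Single-head scaled dot-product attention (toy version).

    Each query token attends over all context tokens. Keys and values
    are the raw context vectors; a real model would use learned projections.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return out

def bidirectional_step(video, audio):
    """Residual update of both streams, each attending to the other."""
    v_from_a = cross_attend(video, audio)   # video tokens query audio tokens
    a_from_v = cross_attend(audio, video)   # audio tokens query video tokens
    new_video = [[x + y for x, y in zip(tok, upd)] for tok, upd in zip(video, v_from_a)]
    new_audio = [[x + y for x, y in zip(tok, upd)] for tok, upd in zip(audio, a_from_v)]
    return new_video, new_audio

# Toy inputs (assumed values): 3 video tokens and 2 audio tokens, dimension 2.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [1.0, -1.0]]
new_video, new_audio = bidirectional_step(video, audio)
print(len(new_video), len(new_audio))
```

Each stream keeps its own token count after the update, which is consistent with an asymmetric design where video and audio are represented at different resolutions.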

Community

pleezzz0

Jan 13

This comment has been hidden (marked as Off-Topic)

lucastraceup

Jan 21

putting this in the comment section by accident is wild lmao

efecelik

Jan 14

it's literally crazy.

Haykkkoo

4 days ago

Create the video in sequence (12 images in total); start with the first one, in 18:9 format.

## FINISHED SCRIPT: "5 Countries That Could Disappear in Our Lifetime"

Runtime: ~1:45 – 2:00
Tone: Alarming but factual

HOOK (0:00 – 0:10)

Image: A world map on which pieces begin to disappear. Ominous music.

Text:
"You look at a world map and think it's permanent. It's not. Some countries we know today might not exist when you're old. Here are 5 nations fighting for survival right now."

PLACE #5: Maldives

Image: Paradise islands, ocean, waves, people on the beach.

Text:
"Number 5: The Maldives. The most beautiful islands in the Indian Ocean. Average height above sea level? Just 1.5 meters. Scientists say if sea levels keep rising, the Maldives could be underwater by the end of this century. The government is already buying land in other countries to move its people. A paradise that's disappearing."

PLACE #4: Taiwan

Image: A map showing Taiwan next to China, flags.

Text:
"Number 4: Taiwan. This is not about climate — it's about politics. Taiwan has been independent in practice for decades, but China claims it as its territory. Tensions are rising. If China decides to take control by force, Taiwan as an independent country could cease to exist."

PLACE #3: Kiribati

Image: The Pacific Ocean, small islands, a map.

Text:
"Number 3: Kiribati. A nation of 33 islands in the Pacific Ocean. Most of them are barely above water. Their president bought land in Fiji just to have somewhere to move when the ocean swallows them. They might be the first country to disappear completely. And it's happening now."

PLACE #2: Bangladesh

Image: Floods, people knee-deep in water, a map of Bangladesh.

Text:
"Number 2: Bangladesh. One of the most densely populated countries on Earth. 170 million people living on a giant river delta. Every year, floods get worse. By 2050, scientists predict 20% of the country could be underwater. That's 30 million climate refugees. One of the poorest nations could simply become unlivable."

PLACE #1: Tuvalu

Image: A small island in the middle of the ocean, waves, sun.

Text:
"Number 1: Tuvalu. A tiny island nation in the Pacific. The highest point is 4.5 meters above sea level. But when high tides come, the whole country floods. The government is building seawalls, but it might not be enough. Tuvalu could be the first country to lose its land completely. And the scariest part? It could happen in the next 30 years."

OUTRO (1:45 – 2:00)

Image: A world map with a question mark. The music fades.

Text:
"Which of these countries would you save? Let me know in the comments. And if you want more geography and history — subscribe. The next video will be about what happens when a country disappears completely."