https://paperswithcode.com/paper/streamingt2v-consistent-dynamic-and

Papers with Code - StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Implemented in one code library.

Long Video Generation from Text

Text-to-video diffusion models enable the generation of high-quality videos that follow text instructions, making it easy to create diverse and individual content.

However, existing approaches mostly focus on generating high-quality short videos (typically 16 or 24 frames) and end up with hard cuts when naively extended to long video synthesis.

To overcome these limitations, we introduce StreamingT2V, an autoregressive approach for long video generation of 80, 240, 600, 1200 or more frames with smooth transitions.
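The autoregressive, chunk-by-chunk idea can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the authors' code: `generate_long_video`, `toy_chunk`, and the chunk sizes (16 new frames per chunk, conditioning on the last 8 frames) are hypothetical, frames are plain strings instead of image tensors, and picking the first frame of the first chunk as the anchor is an assumption.

```python
from typing import Callable, List, Optional

Frame = str  # stand-in for an image tensor

# Hypothetical model call: (prompt, conditioning frames, anchor frame, chunk index) -> new frames
ChunkFn = Callable[[str, List[Frame], Optional[Frame], int], List[Frame]]


def generate_long_video(prompt: str, num_chunks: int, cond_len: int,
                        generate_chunk: ChunkFn) -> List[Frame]:
    """Extend a video autoregressively, one chunk at a time."""
    video: List[Frame] = []
    anchor: Optional[Frame] = None
    for k in range(num_chunks):
        # Short-term memory: the last `cond_len` frames generated so far (empty for chunk 0)
        cond_frames = video[-cond_len:] if video and cond_len > 0 else []
        new_frames = generate_chunk(prompt, cond_frames, anchor, k)
        if anchor is None:
            # Long-term memory: fix one frame of the first chunk as the anchor (assumption)
            anchor = new_frames[0]
        video.extend(new_frames)
    return video


def toy_chunk(prompt: str, cond_frames: List[Frame], anchor: Optional[Frame],
              k: int, chunk_len: int = 16) -> List[Frame]:
    # Toy stand-in for the diffusion model: it only labels the frames it "generates"
    return [f"c{k}f{j}" for j in range(chunk_len)]


frames = generate_long_video("Santa Claus is dancing.", num_chunks=5, cond_len=8,
                             generate_chunk=toy_chunk)
print(len(frames))  # 80
```

With 16 new frames per chunk, five chunks give the 80-frame setting and 75 chunks would give 1200 frames. In this sketch the conditioning frames are not re-emitted, so every chunk adds all of its frames; whether the real model overlaps chunks is not covered here.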

Method overview: StreamingT2V extends a video diffusion model (VDM) with a conditional attention module (CAM) as short-term memory and an appearance preservation module (APM) as long-term memory. CAM conditions the VDM on the previous chunk using a frame encoder $E_{\mathrm{cond}}$.
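To make the CAM idea concrete, here is a toy single-head cross-attention in which tokens of the chunk being generated attend to encoded frames of the previous chunk. Everything here is an illustrative assumption: `encode_cond` is an identity stand-in for $E_{\mathrm{cond}}$, there are no learned projections, vectors are plain Python lists, and the zero-initialized `gate` on the residual is a common way to add a new conditioning branch to a pretrained model, not a detail stated in this post.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention, one head, no learned projections."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def encode_cond(frames: List[Vector]) -> List[Vector]:
    # Hypothetical stand-in for E_cond; a real frame encoder is a learned network
    return [list(f) for f in frames]


def cam_condition(current: List[Vector], previous_frames: List[Vector],
                  gate: float = 0.0) -> List[Vector]:
    # Current-chunk tokens query the encoded previous chunk (short-term memory)
    ctx = encode_cond(previous_frames)
    attended = cross_attention(current, ctx, ctx)
    # Gated residual: with gate=0 the original features pass through unchanged
    return [[c + gate * a for c, a in zip(cur, att)]
            for cur, att in zip(current, attended)]


prev = [[1.0, 0.0], [0.0, 1.0]]  # toy encodings of two previous-chunk frames
cur = [[5.0, 0.0]]               # one toy token of the chunk being generated
print(cam_condition(cur, prev, gate=0.0))  # [[5.0, 0.0]]
print(cam_condition(cur, prev, gate=1.0))  # mostly picks up the first previous frame
```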

Thanks to its attention mechanism, CAM produces smooth transitions between chunks while still allowing videos with a high amount of motion. APM extracts high-level image features from an anchor frame and injects them into the text cross-attentions of the VDM, which helps preserve object and scene features throughout the autoregressive video generation.
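APM's injection step can be illustrated with a toy cross-attention whose context holds both text tokens and tokens derived from the anchor frame. This is a simplification, not the paper's exact mixing scheme: `image_tokens`, `apm_context`, the repeat-based projection, and the two-dimensional vectors are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, context: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention; keys and values are both `context`."""
    scale = math.sqrt(len(query))
    scores = [sum(q * c for q, c in zip(query, tok)) / scale for tok in context]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * tok[i] for w, tok in zip(weights, context)) / total
            for i in range(len(context[0]))]


def image_tokens(anchor_feature: Vector, num_tokens: int) -> List[Vector]:
    # Hypothetical projection: a learned module would map one image embedding
    # to several tokens; here the embedding is simply repeated
    return [list(anchor_feature) for _ in range(num_tokens)]


def apm_context(text_tokens: List[Vector], anchor_feature: Vector,
                num_image_tokens: int = 2) -> List[Vector]:
    # Simplification: append anchor-frame tokens to the text tokens, so the
    # existing text cross-attention can also read appearance information
    return text_tokens + image_tokens(anchor_feature, num_image_tokens)


text = [[1.0, 0.0]]   # toy text-token embedding
anchor = [0.0, 1.0]   # toy high-level feature of the anchor frame
query = [0.0, 4.0]    # a video token whose content resembles the anchor
print(attend(query, text))                       # [1.0, 0.0]: text only
print(attend(query, apm_context(text, anchor)))  # shifted toward the anchor feature
```

Because the anchor tokens sit in every chunk's cross-attention context, each chunk can look up the same appearance information no matter how far generation has progressed, which is the intuition behind calling APM long-term memory.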

1200 FRAMES @ 2 MINUTES

https://streamingt2v.github.io/static/videos/1200/0005_0000_Wide_shot_of_battlefield,_stormtroopers_.mp4

"Wide shot of battlefield, stormtroopers running..."

https://streamingt2v.github.io/static/videos/1200/0004_0000_Experience_the_dance_of_jellyfish__float.mp4

600 FRAMES @ 1 MINUTE

https://streamingt2v.github.io/static/videos/600/0001_0000_Close_flyover_over_a_large_wheat_field_i.mp4

"Close flyover over a large wheat field..."

240 FRAMES @ 24 SECONDS

https://streamingt2v.github.io/static/videos/240/0009_0000_Santa_Claus_is_dancing.mp4

"Santa Claus is dancing."

80 FRAMES @ 8 SECONDS

https://streamingt2v.github.io/static/videos/80/0022_0000_A_squirrel_in_Antarctica,_on_a_pile_of_h.mp4

"A squirrel on a table full of big nuts."
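The headings above imply a constant playback rate: 1200 frames in 2 minutes, 600 in 1 minute, 240 in 24 seconds, and 80 in 8 seconds all work out to 10 frames per second. A quick check, assuming that 10 fps rate (inferred from the headings, not stated explicitly in the post); `duration_seconds` is a hypothetical helper:

```python
FPS = 10  # assumption inferred from the headings, e.g. 1200 frames over 2 minutes


def duration_seconds(num_frames: int, fps: int = FPS) -> float:
    # Playback length of a clip at a fixed frame rate
    return num_frames / fps


for n in (80, 240, 600, 1200):
    print(n, "frames ->", duration_seconds(n), "s")
```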

Spaces - huggingface

https://huggingface.co/spaces/PAIR/StreamingT2V

StreamingT2V - a Hugging Face Space by PAIR

License: Creative Commons Attribution-NonCommercial-NoDerivatives