Comfy UI

HunyuanVideo

Dec 20, 2024


HunyuanVideo from @TXhunyuan, the groundbreaking 13B-parameter open-source video model, is now natively supported in ComfyUI! It ships with a unified image and video generation architecture. Check out some of the amazing examples!

HunyuanVideo + ComfyUI Capabilities

1. Unified Image & Video Generation

A “Dual-stream to Single-stream” Transformer seamlessly integrates text and visuals, improving motion coherence, image quality, and alignment.


2. Superior Text-Video Alignment

The multimodal LLM (MLLM) text encoder outperforms CLIP and T5 at following instructions, capturing fine detail, and handling complex reasoning.

3. Efficient Video Compression

A custom 3D VAE encodes videos into a compact latent space, maintaining resolution and frame rate while significantly reducing token requirements.
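To get a feel for how much this compression saves, here is a rough sketch of the arithmetic. The 4x temporal and 8x spatial ratios, the causal `(frames - 1) / 4 + 1` rule, and the 2x2 spatial patch size are assumptions taken from the HunyuanVideo technical report rather than from this post, and the function names are made up for illustration.

```python
def latent_shape(frames, height, width, t_ratio=4, s_ratio=8):
    """Return the latent (frames, height, width) for a causal 3D VAE.

    Assumed ratios: 4x in time, 8x in each spatial dimension. The first
    frame is encoded on its own, so valid lengths are 1, 5, 9, ...
    """
    if (frames - 1) % t_ratio != 0:
        raise ValueError("frames must be of the form t_ratio * n + 1")
    if height % s_ratio or width % s_ratio:
        raise ValueError("height and width must be multiples of s_ratio")
    return ((frames - 1) // t_ratio + 1, height // s_ratio, width // s_ratio)


def token_count(frames, height, width, patch=2):
    """Number of transformer tokens, assuming a 1 x patch x patch patchify step."""
    t, h, w = latent_shape(frames, height, width)
    return t * (h // patch) * (w // patch)


if __name__ == "__main__":
    # 129 frames at 1280x720: 33 latent frames of 90x160
    print(latent_shape(129, 720, 1280))
    print(token_count(129, 720, 1280))
```

Under these assumptions a 129-frame 720p clip becomes 33 latent frames of 90x160, or 118,800 tokens, compared with roughly 119 million pixels in the original clip.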

4. Enhanced Prompt Control

The Prompt Rewrite model offers two modes:

Normal Mode: Refines the interpretation of user intent.
Master Mode: Enhances composition, lighting, and overall visual quality.

Text-to-video Example Workflow

You can seamlessly generate videos and still images with HunyuanVideo. Here’s how to get started:

1. Update to the latest version of ComfyUI.
2. Download the following model files:
   - hunyuan_video_t2v_720p_bf16.safetensors → Place in ComfyUI/models/diffusion_models.
   - clip_l.safetensors and llava_llama3_fp8_scaled.safetensors → Place in ComfyUI/models/text_encoders.
   - hunyuan_video_vae_bf16.safetensors → Place in ComfyUI/models/vae.
3. Load the provided workflow JSON file into ComfyUI or drag and drop it into the interface.
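If the workflow complains about missing models, a quick check like the one below can confirm that every file landed in the right folder. This is a minimal sketch: the filenames and folders come from the steps above, while the `find_missing_models` helper and the `ComfyUI` install path are assumptions you should adapt to your own setup.

```python
from pathlib import Path

# Filenames and target folders as listed in the download step.
EXPECTED_FILES = {
    "models/diffusion_models": ["hunyuan_video_t2v_720p_bf16.safetensors"],
    "models/text_encoders": [
        "clip_l.safetensors",
        "llava_llama3_fp8_scaled.safetensors",
    ],
    "models/vae": ["hunyuan_video_vae_bf16.safetensors"],
}


def find_missing_models(comfy_root):
    """Return the expected model paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            path = root / folder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is an assumed install location; point it at your own folder.
    missing = find_missing_models("ComfyUI")
    for path in missing:
        print(f"missing: {path}")
    if not missing:
        print("all model files found")
```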

Generate an Image Using the Same Workflow

To generate a still image with this model, set the video length to 1 (a single frame).
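If you drive ComfyUI from a script rather than the interface, the same change can be made on a workflow exported in API format, where each node is an entry with a `class_type` and an `inputs` dictionary. The sketch below assumes the latent node is called `EmptyHunyuanLatentVideo` and exposes a `length` input; check your own export, since the node name, the node id, and the `set_video_length` helper are illustrative assumptions, not part of the provided workflow.

```python
import copy


def set_video_length(workflow, length=1, node_type="EmptyHunyuanLatentVideo"):
    """Return a copy of an API-format workflow with the latent length changed.

    node_type is an assumed class name; adjust it to match your export.
    """
    updated = copy.deepcopy(workflow)
    changed = 0
    for node in updated.values():
        if isinstance(node, dict) and node.get("class_type") == node_type:
            node.setdefault("inputs", {})["length"] = length
            changed += 1
    if changed == 0:
        raise ValueError(f"no {node_type} node found in workflow")
    return updated


# Tiny stand-in workflow; a real export contains many more nodes.
example = {
    "45": {
        "class_type": "EmptyHunyuanLatentVideo",
        "inputs": {"width": 848, "height": 480, "length": 73, "batch_size": 1},
    },
}

still = set_video_length(example)
print(still["45"]["inputs"]["length"])  # prints 1
```

The helper works on a copy, so the original video workflow stays untouched and you can keep both versions side by side.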

To keep track of updates to the Hunyuan series of models, subscribe to our blog and follow the example workflow page.

Enjoy your creation!

Examples

Prompt:

On a busy Tokyo street, the camera descends to show the vibrant city. Modern buildings and shops line the street, with a neon-lit convenience store. The shot moves to a vending machine, where the blogger selects a drink. As the bottle slides out, the blogger smiles, takes the drink, and walks down the street, passing cherry blossoms and neon lights, capturing the lively Tokyo atmosphere.

Result video:

Prompt:

The camera focuses on a colorful paper crane in mid-flight, its wings adorned with intricate feathers. The light reflects off the delicate craftsmanship, while in the distance, a mysterious paper forest adds depth and intrigue to the scene.

Result video:

ComfyUI Examples