---
base_model: tencent/HunyuanVideo-Foley
tags:
- quantized
- fp8
- audio-generation
- video-to-audio
- comfyui
library_name: transformers
---
# HunyuanVideo-Foley FP8 Quantized
This is an FP8 quantized version of [tencent/HunyuanVideo-Foley](https://huggingface.co/tencent/HunyuanVideo-Foley) optimized for reduced VRAM usage while maintaining audio generation quality.
## Quantization Details
- **Quantization Method**: Weight-only FP8 quantization (E4M3FN and E5M2)
- **Layers Quantized**: Transformer block weights only (attention and FFN layers)
- **Preserved Precision**: Normalization layers, embeddings, and biases remain in original precision
- **Expected VRAM Savings**: ~30-40% reduction compared to the BF16 original (see the rough arithmetic sketched after this list)
- **Memory Usage**: Enables running on <12GB GPUs when combined with other optimizations
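The savings figure above follows from simple byte accounting: BF16 stores 2 bytes per parameter, FP8 stores 1. A minimal sketch, using illustrative (not measured) values for the parameter count and the fraction of weights quantized:

```python
# Back-of-the-envelope weight-memory estimate for FP8 weight-only
# quantization. total_params and quantized_frac are illustrative
# placeholders, NOT measured values for HunyuanVideo-Foley.

def estimate_weight_gb(total_params: float, quantized_frac: float) -> tuple[float, float]:
    """Return (bf16_gb, mixed_gb) for weight storage."""
    bf16 = total_params * 2                        # BF16: 2 bytes/param
    mixed = (total_params * quantized_frac * 1     # FP8:  1 byte/param
             + total_params * (1 - quantized_frac) * 2)
    return bf16 / 1e9, mixed / 1e9

bf16_gb, fp8_gb = estimate_weight_gb(total_params=10e9, quantized_frac=0.8)
print(f"BF16: {bf16_gb:.1f} GB, FP8 mix: {fp8_gb:.1f} GB "
      f"({1 - fp8_gb / bf16_gb:.0%} smaller)")
```

With ~80% of parameters quantized, weight storage drops by 40%, consistent with the range above. Activations and caches are unaffected, so end-to-end VRAM savings are somewhat smaller.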
## Usage
### ComfyUI (Recommended)
This model is specifically optimized for use with the [ComfyUI-HunyuanVideo-Foley](https://github.com/phazei/ComfyUI-HunyuanVideo-Foley) custom node, which provides:
- **VRAM-friendly loading** with ping-pong memory management
- **Built-in FP8 support** that automatically handles the quantized weights
- **Torch compile integration** for ~30% speed improvements after the first run (illustrated after this list)
- **Text-to-Audio and Video-to-Audio** modes
- **Batch generation** with audio selection tools
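As a rough illustration of what the node's torch compile integration does, here is a minimal, self-contained `torch.compile` sketch; the toy module below is a stand-in for the actual Foley transformer:

```python
import torch
import torch.nn as nn

# Toy stand-in for the Foley DiT transformer.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# torch.compile traces and optimizes the module: the first call is slow
# (compilation), later calls reuse the compiled kernels, which is where
# the speedup after the first run comes from.
compiled = torch.compile(model)
out = compiled(torch.randn(1, 512))
```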
**Installation:**
1. Install the ComfyUI node: [ComfyUI-HunyuanVideo-Foley](https://github.com/phazei/ComfyUI-HunyuanVideo-Foley)
2. Download this quantized model to `ComfyUI/models/foley/` (a download snippet follows these steps)
3. Enjoy high-quality audio generation in under 8 GB of VRAM
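For step 2, a minimal download sketch using `huggingface_hub`; the repo id is an assumption based on this card, and `local_dir` should point at your ComfyUI installation:

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="phazei/HunyuanVideo-Foley",  # assumed repo id; adjust if needed
    filename="hunyuanvideo_foley_fp8_e4m3fn.safetensors",
    local_dir="ComfyUI/models/foley",
)
```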
**Typical VRAM Usage (5s audio, 50 steps):**
- Baseline (BF16): ~10-12 GB
- With FP8 quantization: ~8-10 GB
- Perfect for RTX 3080/4070 Ti and similar GPUs
### Other Frameworks
The FP8 weights can be used with any framework that supports automatic upcasting of FP8 to FP16/BF16 during computation. The quantized weights maintain compatibility with the original model architecture.
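A minimal loading-and-upcasting sketch along those lines, assuming PyTorch 2.1+ (for native `float8_e4m3fn` support) and a recent `safetensors`; the path is a placeholder for wherever you saved the file:

```python
import torch
from safetensors.torch import load_file

# Load the FP8 checkpoint, then upcast FP8 tensors to BF16 so any
# framework that computes in BF16 can use them directly.
state_dict = load_file("hunyuanvideo_foley_fp8_e4m3fn.safetensors")
state_dict = {
    k: v.to(torch.bfloat16) if v.dtype == torch.float8_e4m3fn else v
    for k, v in state_dict.items()
}
# state_dict now matches the original BF16 architecture and can be
# passed to model.load_state_dict(...).
```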
## Files
- `hunyuanvideo_foley_fp8_e4m3fn.safetensors` - Main model weights in FP8 format
## Performance Notes
- **Quality**: Maintains comparable audio generation quality to the original model
- **Speed**: Quantization adds minimal overhead; generation speed still depends on the compute precision, since FP8 weights are upcast for the actual math
- **Memory**: Significant VRAM reduction makes the model accessible on consumer GPUs
- **Compatibility**: Drop-in replacement for the original model weights
## Original Model
This quantization is based on [tencent/HunyuanVideo-Foley](https://huggingface.co/tencent/HunyuanVideo-Foley). Please refer to the original repository for:
- Model architecture details
- Training information
- License terms
- Citation information
## Technical Details
The quantization uses a conservative approach that only converts transformer block weights while preserving precision-sensitive components:
- **Converted**: Attention and FFN layer weights in transformer blocks
- **Preserved**: Normalization layers, embeddings, projections, and bias terms
This selective quantization strategy maintains model quality while maximizing memory savings.
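A minimal sketch of that filtering logic; the key-name patterns below are assumptions about the checkpoint layout, not the exact names used in this repo:

```python
import torch

def quantize_transformer_weights(
    state_dict: dict[str, torch.Tensor],
) -> dict[str, torch.Tensor]:
    """Cast eligible transformer-block weights to FP8; leave the rest alone."""
    out = {}
    for name, tensor in state_dict.items():
        is_block_weight = (
            "transformer_blocks" in name   # inside a transformer block (assumed key)
            and name.endswith(".weight")   # weights only, never biases
            and tensor.ndim == 2           # linear (attention/FFN) layers, not norms
            and "norm" not in name         # keep normalization in full precision
        )
        out[name] = tensor.to(torch.float8_e4m3fn) if is_block_weight else tensor
    return out
```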