---
base_model: tencent/HunyuanVideo-Foley
tags:
- quantized
- fp8
- audio-generation
- video-to-audio
- comfyui
library_name: transformers
---
# HunyuanVideo-Foley FP8 Quantized
This is an FP8 quantized version of [tencent/HunyuanVideo-Foley](https://huggingface.co/tencent/HunyuanVideo-Foley) optimized for reduced VRAM usage while maintaining audio generation quality.
## Quantization Details
- **Quantization Method**: Weight-only FP8 quantization (E4M3FN and E5M2)
- **Layers Quantized**: Transformer block weights only (attention and FFN layers)
- **Preserved Precision**: Normalization layers, embeddings, and biases remain in original precision
- **Expected VRAM Savings**: ~30-40% reduction compared to the BF16 original (see the rough arithmetic sketched after this list)
- **Memory Usage**: Enables running on <12GB GPUs when combined with other optimizations
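The savings figure above follows from simple byte accounting: BF16 stores 2 bytes per parameter, FP8 stores 1. A minimal sketch, using illustrative (not measured) values for the parameter count and the fraction of weights quantized:

```python
# Back-of-the-envelope weight-memory estimate for FP8 weight-only
# quantization. total_params and quantized_frac are illustrative
# placeholders, NOT measured values for HunyuanVideo-Foley.

def estimate_weight_gb(total_params: float, quantized_frac: float) -> tuple[float, float]:
    """Return (bf16_gb, mixed_gb) for weight storage."""
    bf16 = total_params * 2                        # BF16: 2 bytes/param
    mixed = (total_params * quantized_frac * 1     # FP8:  1 byte/param
             + total_params * (1 - quantized_frac) * 2)
    return bf16 / 1e9, mixed / 1e9

bf16_gb, fp8_gb = estimate_weight_gb(total_params=10e9, quantized_frac=0.8)
print(f"BF16: {bf16_gb:.1f} GB, FP8 mix: {fp8_gb:.1f} GB "
      f"({1 - fp8_gb / bf16_gb:.0%} smaller)")
```

With ~80% of parameters quantized, weight storage drops by 40%, consistent with the range above. Activations and caches are unaffected, so end-to-end VRAM savings are somewhat smaller.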
## Usage
### ComfyUI (Recommended)
This model is specifically optimized for use with the [ComfyUI-HunyuanVideo-Foley](https://github.com/phazei/ComfyUI-HunyuanVideo-Foley) custom node, which provides:
- **VRAM-friendly loading** with ping-pong memory management
- **Built-in FP8 support** that automatically handles the quantized weights
- **Torch compile integration** for ~30% speed improvements after the first run (illustrated after this list)
- **Text-to-Audio and Video-to-Audio** modes
- **Batch generation** with audio selection tools
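As a rough illustration of what the node's torch compile integration does, here is a minimal, self-contained `torch.compile` sketch; the toy module below is a stand-in for the actual Foley transformer:

```python
import torch
import torch.nn as nn

# Toy stand-in for the Foley DiT transformer.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# torch.compile traces and optimizes the module: the first call is slow
# (compilation), later calls reuse the compiled kernels, which is where
# the speedup after the first run comes from.
compiled = torch.compile(model)
out = compiled(torch.randn(1, 512))
```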
**Installation:**
1. Install the ComfyUI node: [ComfyUI-HunyuanVideo-Foley](https://github.com/phazei/ComfyUI-HunyuanVideo-Foley)
2. Download this quantized model to `ComfyUI/models/foley/` (a download snippet follows these steps)
3. Enjoy high-quality audio generation in under 8 GB of VRAM
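For step 2, a minimal download sketch using `huggingface_hub`; the repo id is an assumption based on this card, and `local_dir` should point at your ComfyUI installation:

```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="phazei/HunyuanVideo-Foley",  # assumed repo id; adjust if needed
    filename="hunyuanvideo_foley_fp8_e4m3fn.safetensors",
    local_dir="ComfyUI/models/foley",
)
```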
**Typical VRAM Usage (5s audio, 50 steps):**
- Baseline (BF16): ~10-12 GB
- With FP8 quantization: ~8-10 GB
- Perfect for RTX 3080/4070 Ti and similar GPUs
### Other Frameworks
The FP8 weights can be used with any framework that supports automatic upcasting of FP8 to FP16/BF16 during computation. The quantized weights maintain compatibility with the original model architecture.
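A minimal loading-and-upcasting sketch along those lines, assuming PyTorch 2.1+ (for native `float8_e4m3fn` support) and a recent `safetensors`; the path is a placeholder for wherever you saved the file:

```python
import torch
from safetensors.torch import load_file

# Load the FP8 checkpoint, then upcast FP8 tensors to BF16 so any
# framework that computes in BF16 can use them directly.
state_dict = load_file("hunyuanvideo_foley_fp8_e4m3fn.safetensors")
state_dict = {
    k: v.to(torch.bfloat16) if v.dtype == torch.float8_e4m3fn else v
    for k, v in state_dict.items()
}
# state_dict now matches the original BF16 architecture and can be
# passed to model.load_state_dict(...).
```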
## Files
- `hunyuanvideo_foley_fp8_e4m3fn.safetensors` - Main model weights in FP8 format
## Performance Notes
- **Quality**: Maintains comparable audio generation quality to the original model
- **Speed**: Quantization adds minimal overhead; generation speed still depends on the compute precision, since FP8 weights are upcast for the actual math
- **Memory**: Significant VRAM reduction makes the model accessible on consumer GPUs
- **Compatibility**: Drop-in replacement for the original model weights
## Original Model
This quantization is based on [tencent/HunyuanVideo-Foley](https://huggingface.co/tencent/HunyuanVideo-Foley). Please refer to the original repository for:
- Model architecture details
- Training information
- License terms
- Citation information
## Technical Details
The quantization uses a conservative approach that only converts transformer block weights while preserving precision-sensitive components:
- **Converted**: Attention and FFN layer weights in transformer blocks
- **Preserved**: Normalization layers, embeddings, projections, and bias terms
This selective quantization strategy maintains model quality while maximizing memory savings.
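A minimal sketch of that filtering logic; the key-name patterns below are assumptions about the checkpoint layout, not the exact names used in this repo:

```python
import torch

def quantize_transformer_weights(
    state_dict: dict[str, torch.Tensor],
) -> dict[str, torch.Tensor]:
    """Cast eligible transformer-block weights to FP8; leave the rest alone."""
    out = {}
    for name, tensor in state_dict.items():
        is_block_weight = (
            "transformer_blocks" in name   # inside a transformer block (assumed key)
            and name.endswith(".weight")   # weights only, never biases
            and tensor.ndim == 2           # linear (attention/FFN) layers, not norms
            and "norm" not in name         # keep normalization in full precision
        )
        out[name] = tensor.to(torch.float8_e4m3fn) if is_block_weight else tensor
    return out
```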