# Chain-of-Zoom 4-bit Complete Pipeline Usage

## 🚀 Quick Start

```bash
# Install requirements
pip install transformers accelerate bitsandbytes torch diffusers
```

```python
# Load VLM component
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration, Qwen2VLProcessor
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load quantized VLM
vlm_model = Qwen2VLForConditionalGeneration.from_pretrained(
    "humbleakh/qwen2.5-vl-3b-4bit-chain-of-zoom",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
vlm_processor = Qwen2VLProcessor.from_pretrained(
    "humbleakh/qwen2.5-vl-3b-4bit-chain-of-zoom",
    trust_remote_code=True
)

# Load other components from their respective repos...
```

## 📋 Components

- **VLM**: [humbleakh/qwen2.5-vl-3b-4bit-chain-of-zoom](https://huggingface.co/humbleakh/qwen2.5-vl-3b-4bit-chain-of-zoom)
- **Diffusion**: [humbleakh/stable-diffusion-3-4bit-chain-of-zoom](https://huggingface.co/humbleakh/stable-diffusion-3-4bit-chain-of-zoom)
- **RAM**: [humbleakh/ram-swin-large-4bit-chain-of-zoom](https://huggingface.co/humbleakh/ram-swin-large-4bit-chain-of-zoom)

## 💾 Memory Usage

- **Original**: ~12 GB VRAM
- **Quantized**: ~3 GB VRAM
- **Reduction**: 75%
- **Compatible**: Google Colab T4 GPU

## 🎯 Implementation

See the complete notebook for the full Chain-of-Zoom implementation with quantized models. Minimal sketches for loading the remaining components and running the zoom loop follow below.
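The quantized diffusion component can be loaded through the bitsandbytes integration in `diffusers`. The sketch below assumes that `humbleakh/stable-diffusion-3-4bit-chain-of-zoom` follows the standard Stable Diffusion 3 pipeline layout with a `transformer` subfolder; adjust the subfolder and classes to the actual repo contents if they differ. Loading the RAM tagging model is not shown, since it uses its own loader rather than the `transformers` API.

```python
# Sketch: load the diffusion component in 4-bit with diffusers' bitsandbytes support.
# Assumption: the repo uses the standard SD3 pipeline layout ("transformer" subfolder).
import torch
from diffusers import BitsAndBytesConfig as DiffusersBnbConfig
from diffusers import SD3Transformer2DModel, StableDiffusion3Pipeline

sd3_bnb_config = DiffusersBnbConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize only the heavy transformer; the remaining pipeline modules stay in bf16.
sd3_transformer = SD3Transformer2DModel.from_pretrained(
    "humbleakh/stable-diffusion-3-4bit-chain-of-zoom",
    subfolder="transformer",
    quantization_config=sd3_bnb_config,
    torch_dtype=torch.bfloat16,
)
sd3_pipe = StableDiffusion3Pipeline.from_pretrained(
    "humbleakh/stable-diffusion-3-4bit-chain-of-zoom",
    transformer=sd3_transformer,
    torch_dtype=torch.bfloat16,
)

# Offload idle modules to CPU to stay within a T4's 16 GB of VRAM.
sd3_pipe.enable_model_cpu_offload()
```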
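To verify the quantized footprint on your own GPU after loading the components, a quick look at PyTorch's CUDA memory statistics is enough; exact numbers will vary with driver and library versions.

```python
# Check post-load VRAM usage on the current CUDA device.
import torch

if torch.cuda.is_available():
    allocated_gb = torch.cuda.memory_allocated() / 1024**3
    reserved_gb = torch.cuda.memory_reserved() / 1024**3
    print(f"allocated: {allocated_gb:.2f} GiB, reserved: {reserved_gb:.2f} GiB")
```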
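As an illustration of the overall flow, here is a minimal sketch of the recursive zoom loop. `generate_prompt` and `super_resolve` are hypothetical helpers standing in for the VLM prompt extraction and the diffusion super-resolution steps in the notebook, and the crop-and-upscale logic is a simplified approximation of the actual pipeline.

```python
# Minimal sketch of the recursive zoom loop (hypothetical helpers, not the notebook code).
from PIL import Image

def center_crop_zoom(image: Image.Image, zoom: float = 2.0) -> Image.Image:
    """Crop the central 1/zoom region and resize it back to the original size."""
    w, h = image.size
    cw, ch = int(w / zoom), int(h / zoom)
    left, top = (w - cw) // 2, (h - ch) // 2
    crop = image.crop((left, top, left + cw, top + ch))
    return crop.resize((w, h), Image.BICUBIC)

def chain_of_zoom(image: Image.Image, steps: int = 4, zoom_per_step: float = 2.0):
    """Alternate between zooming into the image and re-synthesizing detail."""
    outputs = []
    current = image
    for _ in range(steps):
        current = center_crop_zoom(current, zoom_per_step)  # zoom into the center
        prompt = generate_prompt(current)                   # VLM describes the crop (hypothetical helper)
        current = super_resolve(current, prompt)            # diffusion restores detail (hypothetical helper)
        outputs.append(current)
    return outputs
```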