# ๐Ÿ—„๏ธ Persistent Storage Setup for HuggingFace Spaces ## ๐ŸŽฏ **Problem Solved: Model Storage** This setup prevents reloading models from the LinguaCustodia repository each time by using HuggingFace Spaces persistent storage. ## ๐Ÿ“‹ **Step-by-Step Setup** ### **1. Enable Persistent Storage in Your Space** 1. **Go to your Space**: https://huggingface.co/spaces/jeanbaptdzd/linguacustodia-financial-api 2. **Click "Settings" tab** 3. **Scroll to "Storage" section** 4. **Select a storage tier** (recommended: 1GB or 5GB) 5. **Click "Save"** ### **2. Update Your Space Files** Replace your current `app.py` with the persistent storage version: ```bash # Copy the persistent storage app cp persistent_storage_app.py app.py ``` ### **3. Key Changes Made** #### **Environment Variable Setup:** ```python # CRITICAL: Set HF_HOME to persistent storage directory os.environ["HF_HOME"] = "/data/.huggingface" ``` #### **Pipeline with Cache Directory:** ```python pipe = pipeline( "text-generation", model=model_id, token=hf_token_lc, dtype=torch_dtype, device_map="auto", trust_remote_code=True, # CRITICAL: Use persistent storage cache cache_dir=os.environ["HF_HOME"] ) ``` #### **Storage Monitoring:** ```python def get_storage_info() -> Dict[str, Any]: """Get information about persistent storage usage.""" # Returns storage status, cache size, writable status ``` ## ๐Ÿ”ง **How It Works** ### **First Load (Cold Start):** 1. Model downloads from LinguaCustodia repository 2. Model files cached to `/data/.huggingface/` 3. Takes ~2-3 minutes (same as before) ### **Subsequent Loads (Warm Start):** 1. Model loads from local cache (`/data/.huggingface/`) 2. **Much faster** - typically 30-60 seconds 3. 
No network download needed ## ๐Ÿ“Š **Storage Information** The app now provides storage information via `/health` endpoint: ```json { "status": "healthy", "model_loaded": true, "storage_info": { "hf_home": "/data/.huggingface", "data_dir_exists": true, "data_dir_writable": true, "hf_cache_dir_exists": true, "hf_cache_dir_writable": true, "cache_size_mb": 1234.5 } } ``` ## ๐Ÿš€ **Deployment Steps** ### **1. Update Space Files** ```bash # Upload these files to your Space: - app.py (use persistent_storage_app.py as base) - requirements.txt (same as before) - Dockerfile (same as before) - README.md (same as before) ``` ### **2. Enable Storage** - Go to Space Settings - Enable persistent storage (1GB minimum) - Save settings ### **3. Deploy** - Space will rebuild automatically - First load will be slow (downloading model) - Subsequent loads will be fast (using cache) ## ๐Ÿงช **Testing** ### **Test Storage Setup:** ```bash # Check health endpoint for storage info curl https://huggingface.co/spaces/jeanbaptdzd/linguacustodia-financial-api/health ``` ### **Test Model Loading Speed:** 1. **First request**: Will be slow (downloading model) 2. **Second request**: Should be much faster (using cache) ## ๐Ÿ’ก **Benefits** - โœ… **Faster startup** after first load - โœ… **Reduced bandwidth** usage - โœ… **Better reliability** (no network dependency for model loading) - โœ… **Cost savings** (faster inference = less compute time) - โœ… **Storage monitoring** (see cache size and status) ## ๐Ÿšจ **Important Notes** - **Storage costs**: ~$0.10/GB/month - **Cache size**: ~1-2GB for 8B models - **First load**: Still takes 2-3 minutes (downloading) - **Subsequent loads**: 30-60 seconds (from cache) ## ๐Ÿ”— **Files to Update** 1. **`app.py`** - Use `persistent_storage_app.py` as base 2. **Space Settings** - Enable persistent storage 3. **Test scripts** - Update URLs if needed --- **๐ŸŽฏ Result**: Models will be cached locally, dramatically reducing load times after the first deployment!
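
The storage-monitoring helper is only stubbed above. As a minimal sketch (not the actual code in `persistent_storage_app.py`, which may differ), an implementation matching the field names in the `/health` example could look like this:

```python
import os
from pathlib import Path
from typing import Any, Dict


def get_storage_info() -> Dict[str, Any]:
    """Report persistent-storage status for the /health endpoint."""
    hf_home = os.environ.get("HF_HOME", "/data/.huggingface")
    data_dir = "/data"

    # Sum the size of every file under the cache directory, in MB.
    cache_size_mb = 0.0
    if os.path.isdir(hf_home):
        cache_size_mb = sum(
            f.stat().st_size for f in Path(hf_home).rglob("*") if f.is_file()
        ) / (1024 * 1024)

    return {
        "hf_home": hf_home,
        "data_dir_exists": os.path.isdir(data_dir),
        "data_dir_writable": os.access(data_dir, os.W_OK),
        "hf_cache_dir_exists": os.path.isdir(hf_home),
        "hf_cache_dir_writable": os.access(hf_home, os.W_OK),
        "cache_size_mb": round(cache_size_mb, 1),
    }
```

Walking the tree with `Path.rglob` keeps the helper dependency-free; on a multi-GB cache this scan takes a moment, so a production version might cache the result between health checks.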