# ๐Ÿ—„๏ธ Persistent Storage Setup for HuggingFace Spaces ## ๐ŸŽฏ **Problem Solved: Model Storage** This setup prevents reloading models from the LinguaCustodia repository each time by using HuggingFace Spaces persistent storage. ## ๐Ÿ“‹ **Step-by-Step Setup** ### **1. Enable Persistent Storage in Your Space** 1. **Go to your Space**: https://huggingface.co/spaces/jeanbaptdzd/linguacustodia-financial-api 2. **Click "Settings" tab** 3. **Scroll to "Storage" section** 4. **Select a storage tier** (recommended: 1GB or 5GB) 5. **Click "Save"** ### **2. Update Your Space Files** Replace your current `app.py` with the persistent storage version: ```bash # Copy the persistent storage app cp persistent_storage_app.py app.py ``` ### **3. Key Changes Made** #### **Environment Variable Setup:** ```python # CRITICAL: Set HF_HOME to persistent storage directory os.environ["HF_HOME"] = "/data/.huggingface" ``` #### **Pipeline with Cache Directory:** ```python pipe = pipeline( "text-generation", model=model_id, token=hf_token_lc, dtype=torch_dtype, device_map="auto", trust_remote_code=True, # CRITICAL: Use persistent storage cache cache_dir=os.environ["HF_HOME"] ) ``` #### **Storage Monitoring:** ```python def get_storage_info() -> Dict[str, Any]: """Get information about persistent storage usage.""" # Returns storage status, cache size, writable status ``` ## ๐Ÿ”ง **How It Works** ### **First Load (Cold Start):** 1. Model downloads from LinguaCustodia repository 2. Model files cached to `/data/.huggingface/` 3. Takes ~2-3 minutes (same as before) ### **Subsequent Loads (Warm Start):** 1. Model loads from local cache (`/data/.huggingface/`) 2. **Much faster** - typically 30-60 seconds 3. 
No network download needed ## ๐Ÿ“Š **Storage Information** The app now provides storage information via `/health` endpoint: ```json { "status": "healthy", "model_loaded": true, "storage_info": { "hf_home": "/data/.huggingface", "data_dir_exists": true, "data_dir_writable": true, "hf_cache_dir_exists": true, "hf_cache_dir_writable": true, "cache_size_mb": 1234.5 } } ``` ## ๐Ÿš€ **Deployment Steps** ### **1. Update Space Files** ```bash # Upload these files to your Space: - app.py (use persistent_storage_app.py as base) - requirements.txt (same as before) - Dockerfile (same as before) - README.md (same as before) ``` ### **2. Enable Storage** - Go to Space Settings - Enable persistent storage (1GB minimum) - Save settings ### **3. Deploy** - Space will rebuild automatically - First load will be slow (downloading model) - Subsequent loads will be fast (using cache) ## ๐Ÿงช **Testing** ### **Test Storage Setup:** ```bash # Check health endpoint for storage info curl https://huggingface.co/spaces/jeanbaptdzd/linguacustodia-financial-api/health ``` ### **Test Model Loading Speed:** 1. **First request**: Will be slow (downloading model) 2. **Second request**: Should be much faster (using cache) ## ๐Ÿ’ก **Benefits** - โœ… **Faster startup** after first load - โœ… **Reduced bandwidth** usage - โœ… **Better reliability** (no network dependency for model loading) - โœ… **Cost savings** (faster inference = less compute time) - โœ… **Storage monitoring** (see cache size and status) ## ๐Ÿšจ **Important Notes** - **Storage costs**: ~$0.10/GB/month - **Cache size**: ~1-2GB for 8B models - **First load**: Still takes 2-3 minutes (downloading) - **Subsequent loads**: 30-60 seconds (from cache) ## ๐Ÿ”— **Files to Update** 1. **`app.py`** - Use `persistent_storage_app.py` as base 2. **Space Settings** - Enable persistent storage 3. **Test scripts** - Update URLs if needed --- **๐ŸŽฏ Result**: Models will be cached locally, dramatically reducing load times after the first deployment!
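
The storage-monitoring helper is only stubbed above. As a minimal sketch (not the actual code in `persistent_storage_app.py`, which may differ), an implementation matching the field names in the `/health` example could look like this:

```python
import os
from pathlib import Path
from typing import Any, Dict


def get_storage_info() -> Dict[str, Any]:
    """Report persistent-storage status for the /health endpoint."""
    hf_home = os.environ.get("HF_HOME", "/data/.huggingface")
    data_dir = "/data"

    # Sum the size of every file under the cache directory, in MB.
    cache_size_mb = 0.0
    if os.path.isdir(hf_home):
        cache_size_mb = sum(
            f.stat().st_size for f in Path(hf_home).rglob("*") if f.is_file()
        ) / (1024 * 1024)

    return {
        "hf_home": hf_home,
        "data_dir_exists": os.path.isdir(data_dir),
        "data_dir_writable": os.access(data_dir, os.W_OK),
        "hf_cache_dir_exists": os.path.isdir(hf_home),
        "hf_cache_dir_writable": os.access(hf_home, os.W_OK),
        "cache_size_mb": round(cache_size_mb, 1),
    }
```

Walking the tree with `Path.rglob` keeps the helper dependency-free; on a multi-GB cache this scan takes a moment, so a production version might cache the result between health checks.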