# 🚀 Deployment Guide - VedaMD Enhanced

## Pre-Deployment Checklist

Before deploying to production, ensure all items are completed:

### Critical Security ✅

- [x] Groq API key regenerated (old key removed)
- [x] API key stored in HF Spaces secrets (not in code)
- [x] CORS configuration restricted to known domains
- [x] Input validation implemented
- [x] Rate limiting enabled
- [x] Prompt injection detection active

### Code Quality ✅

- [x] LLM model updated (llama-3.3-70b-versatile)
- [x] Resource leaks fixed (httpx client cleanup)
- [x] Test suite created and passing
- [ ] All tests passing locally
- [ ] Code reviewed

### Documentation ✅

- [x] SECURITY_SETUP.md created
- [x] .env.example created
- [x] Test documentation complete
- [x] This deployment guide

### Optional Improvements ⚠️

- [ ] Vector store rebuilt with Clinical ModernBERT (768d)
- [ ] Monitoring and observability setup
- [ ] CI/CD pipeline configured
- [ ] Performance benchmarks established

---

## Deployment to Hugging Face Spaces

### Step 1: Configure Secrets

1. Go to your Hugging Face Space
2. Click the **Settings** tab
3. Navigate to **Repository secrets**
4. Add the following secrets:

| Secret Name | Description | Required |
|-------------|-------------|----------|
| `GROQ_API_KEY` | Your Groq API key | Yes |
| `ALLOWED_ORIGINS` | Comma-separated list of allowed origins | No |

Example `ALLOWED_ORIGINS` value:

```
https://your-space.hf.space,https://yourdomain.com
```

### Step 2: Update Repository

1. **Commit your changes**:
   ```bash
   # Confirm .env is listed in .gitignore so the API key is never committed
   git add .
   git commit -m "feat: Update to llama-3.3, add security features and tests"
   ```

2. **Push to Hugging Face Spaces**:
   ```bash
   git push origin main
   ```

### Step 3: Verify Deployment

1. **Check Build Logs**:
   - Go to your Space
   - Click the **Logs** tab
   - Watch for successful initialization messages:
     ```
     🏥 Initializing VedaMD Enhanced for Hugging Face Spaces...
     ✅ Enhanced Medical RAG system ready!
     ```

2. **Test the Application**:
   - Open your Space URL
   - Try a test query: "What is preeclampsia?"
   - Verify that sources and citations appear
   - Check the response time (should be <10 seconds)

3. **Monitor for Errors**:
   - Watch the logs for any warnings or errors
   - Check that the API key is loaded correctly
   - Verify that the model is `llama-3.3-70b-versatile`

### Step 4: Post-Deployment Validation

Run through this checklist:

- [ ] Application loads without errors
- [ ] Test queries return proper responses
- [ ] Citations are displayed correctly
- [ ] Medical verification is working
- [ ] Response times are acceptable (<10s)
- [ ] No API key errors in logs
- [ ] No resource leak warnings

---

## Local Development Setup

### Prerequisites

- Python 3.8+
- pip
- Git
- Groq API key

### Installation

1. **Clone the repository**:
   ```bash
   git clone
   cd "SL Clinical Assistant"
   ```

2. **Create a virtual environment**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

4. **Configure the environment**:
   ```bash
   cp .env.example .env
   # Edit .env and add your GROQ_API_KEY
   ```

5. **Run the tests**:
   ```bash
   pip install pytest pytest-cov
   pytest
   ```

6. **Start the application**:
   ```bash
   python app.py
   ```

7. **Access locally**: open your browser to `http://localhost:7860`

---

## Production Configuration

### Environment Variables

| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `GROQ_API_KEY` | Groq API authentication key | - | Yes |
| `ALLOWED_ORIGINS` | CORS allowed origins (comma-separated) | localhost + Netlify | No |

### Resource Requirements

**Minimum (Hugging Face Spaces)**:

- CPU: 2 vCPUs
- RAM: 8 GB
- Storage: 5 GB
- Python: 3.8+

**Recommended**:

- CPU: 4 vCPUs
- RAM: 16 GB
- Storage: 10 GB

### Dependencies

Key dependencies and versions:

```
gradio==4.44.1          # Web interface
groq>=0.5.0             # LLM API client
sentence-transformers   # Embeddings
torch>=2.0.0            # ML framework
faiss-cpu>=1.7.0        # Vector search
```

The full list is in `requirements.txt`.

---

## Monitoring & Maintenance

### Health Checks

**Automated checks to implement**:

1. API endpoint availability
2. Response time monitoring
3. Error rate tracking
4. API key validity
5. Vector store accessibility

### Logs to Monitor

Watch for these log patterns:

**Success indicators**:

```
✅ Enhanced Medical RAG system ready!
✅ HTTP client connection closed
✅ Groq API connection successful
```

**Warning indicators**:

```
⚠️ CORS allows all origins (*)
⚠️ Error closing HTTP client
```

**Error indicators**:

```
❌ Failed to initialize system
❌ Groq API connection failed
❌ GROQ_API_KEY not found
```

### Cost Monitoring

**Groq API usage**:

- Track API calls per day
- Monitor token usage
- Set up billing alerts

**Estimated costs** (with llama-3.3-70b-versatile):

- Input: $0.59 per 1M tokens
- Output: $0.79 per 1M tokens

Average query: ~5,000 input + 500 output tokens

**Cost per query**: ~$0.0033 (5,000 × $0.59/1M ≈ $0.00295 input, plus 500 × $0.79/1M ≈ $0.0004 output)

For 1,000 queries/day: ~$3.35/day, or roughly $100/month (30 days)

### Performance Metrics

**Target metrics**:

- Query latency: <10 seconds (p95)
- Availability: >99%
- Error rate: <1%
- Verification success: >95%

**To track**:

- Average response time
- Queries per hour
- Error types and frequency
- User satisfaction (if feedback is enabled)

---

## Troubleshooting

### Common Issues

#### 1. API Key Error

**Symptom**: `GROQ_API_KEY not found in environment variables`

**Solution**:

1. Verify the secret is set in HF Spaces Settings
2. Restart the Space
3. Check for typos in the secret name

#### 2. Model Deprecation Error

**Symptom**: `Model not found` or `Invalid model ID`

**Solution**:

- The code now uses `llama-3.3-70b-versatile` (a production model)
- If the error persists, check the [Groq Model Documentation](https://console.groq.com/docs/models)

#### 3. Slow Response Times

**Symptom**: Queries taking >30 seconds

**Possible causes**:

1. Vector store loading issue
2. Network latency to the Groq API
3. Large number of concurrent requests

**Solutions**:

- Check Space resources
- Verify the vector store is loaded correctly
- Consider increasing the `max_threads` limit

#### 4. Memory Errors

**Symptom**: Out-of-memory errors in logs

**Solutions**:

1. Upgrade to a larger Space tier
2. Reduce `max_threads` in `app.py`
3. Check for resource leaks (the httpx client leak has already been fixed)

#### 5. CORS Errors (Frontend)

**Symptom**: Frontend can't connect to the API

**Solution**:

- Add the frontend domain to `ALLOWED_ORIGINS`
- Update the CORS settings in `src/enhanced_backend_api.py`

---

## Rollback Procedure

If issues arise post-deployment:

1. **Immediate rollback**:
   ```bash
   # Create a new commit that undoes the most recent commit
   git revert HEAD
   git push origin main
   ```

2. **Or reset to a specific commit**:
   ```bash
   git reset --hard
   # Force-pushing rewrites the remote history; use with care
   git push origin main --force
   ```

3. **Verify the rollback**:
   - Check that the Space rebuilds successfully
   - Test with a known good query
   - Monitor the logs for stability

---

## Security Best Practices

### API Key Management

- ✅ Never commit API keys to git
- ✅ Use HF Spaces secrets for production
- ✅ Rotate keys every 90 days
- ✅ Monitor API usage for anomalies

### Input Sanitization

- ✅ Max query length: 2000 characters
- ✅ Prompt injection detection enabled
- ✅ Empty query rejection
- ✅ Special character handling

### Access Control

- Consider adding authentication for production
- Rate limit per user/IP if possible
- Log all queries for audit purposes
- Implement usage quotas

### Compliance

For medical applications:

- Ensure HIPAA compliance if handling PHI
- Implement audit logging
- Document data retention policies
- Review with the legal/compliance team

---

## Support & Escalation

### Issue Priority Levels

**P0 - Critical** (Response: Immediate):

- Application down
- API key compromised
- Data breach

**P1 - High** (Response: <4 hours):

- Elevated error rates
- Slow response times
- Verification failures

**P2 - Medium** (Response: <24 hours):

- Minor bugs
- UI issues
- Non-critical errors

**P3 - Low** (Response: <1 week):

- Feature requests
- Documentation updates
- Performance optimizations

### Escalation Path

1. Check logs and error messages
2. Review the troubleshooting guide
3. Check Groq API status
4. Review recent code changes
5. Escalate to the development team

---

## Maintenance Schedule

### Daily

- Monitor error logs
- Check API usage/costs
- Verify application health

### Weekly

- Review performance metrics
- Check for deprecated dependencies
- Back up configuration

### Monthly

- Update dependencies
- Review security patches
- Analyze usage patterns
- Performance optimization review

### Quarterly

- Rotate API keys
- Security audit
- Load testing
- Documentation update

---

## Future Enhancements

Planned improvements (in priority order):

1. **Vector Store Rebuild** (High Priority)
   - Rebuild with full Clinical ModernBERT (768d)
   - Expected: 10-15% accuracy improvement

2. **Monitoring Dashboard** (High Priority)
   - Grafana/Prometheus integration
   - Real-time metrics
   - Alerting system

3. **CI/CD Pipeline** (Medium Priority)
   - Automated testing
   - Deployment automation
   - Rollback capabilities

4. **Multi-language Support** (Medium Priority)
   - Sinhala language support
   - Tamil language support
   - Translation pipeline

5. **User Authentication** (Low Priority)
   - User accounts
   - Usage tracking
   - Personalized history

---

## Version History

| Version | Date | Changes |
|---------|------|---------|
| 2.0.0 | 2025-10-22 | Security fixes, llama-3.3 update, test suite |
| 1.0.0 | 2025-XX-XX | Initial production deployment |

---

## Contact & Resources

- **Documentation**: See README.md
- **Security**: See SECURITY_SETUP.md
- **Tests**: See tests/README.md
- **Groq Docs**: https://console.groq.com/docs
- **HF Spaces**: https://huggingface.co/docs/hub/spaces

**Last Updated**: 2025-10-22