Spaces:
Sleeping
A newer version of the Gradio SDK is available:
6.0.0
π Deployment Guide - VedaMD Enhanced
Pre-Deployment Checklist
Before deploying to production, ensure all items are completed:
Critical Security β
- Groq API key regenerated (old key removed)
- API key stored in HF Spaces secrets (not in code)
- CORS configuration restricted to known domains
- Input validation implemented
- Rate limiting enabled
- Prompt injection detection active
Code Quality β
- LLM model updated (llama-3.3-70b-versatile)
- Resource leaks fixed (httpx client cleanup)
- Test suite created and passing
- All tests passing locally
- Code reviewed
Documentation β
- SECURITY_SETUP.md created
- .env.example created
- Test documentation complete
- This deployment guide
Optional Improvements β οΈ
- Vector store rebuilt with Clinical ModernBERT (768d)
- Monitoring and observability setup
- CI/CD pipeline configured
- Performance benchmarks established
Deployment to Hugging Face Spaces
Step 1: Configure Secrets
- Go to your Hugging Face Space
- Click Settings tab
- Navigate to Repository secrets
- Add the following secrets:
| Secret Name | Description | Required |
|---|---|---|
GROQ_API_KEY |
Your Groq API key | Yes |
ALLOWED_ORIGINS |
Comma-separated allowed domains (optional) | No |
Example ALLOWED_ORIGINS:
https://your-space.hf.space,https://yourdomain.com
Step 2: Update Repository
Commit your changes:
git add . git commit -m "feat: Update to llama-3.3, add security features and tests"Push to Hugging Face Spaces:
git push origin main
Step 3: Verify Deployment
Check Build Logs:
- Go to your Space
- Click Logs tab
- Watch for successful initialization messages:
π₯ Initializing VedaMD Enhanced for Hugging Face Spaces... β Enhanced Medical RAG system ready!
Test the Application:
- Open your Space URL
- Try a test query: "What is preeclampsia?"
- Verify sources and citations appear
- Check response time (should be <10 seconds)
Monitor for Errors:
- Watch logs for any warnings or errors
- Check API key is loaded correctly
- Verify model is llama-3.3-70b-versatile
Step 4: Post-Deployment Validation
Run through this checklist:
- Application loads without errors
- Test queries return proper responses
- Citations are displayed correctly
- Medical verification is working
- Response times are acceptable (<10s)
- No API key errors in logs
- No resource leak warnings
Local Development Setup
Prerequisites
- Python 3.8+
- pip
- Git
- Groq API key
Installation
Clone the repository:
git clone <your-repo-url> cd "SL Clinical Assistant"Create virtual environment:
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activateInstall dependencies:
pip install -r requirements.txtConfigure environment:
cp .env.example .env # Edit .env and add your GROQ_API_KEYRun tests:
pip install pytest pytest-cov pytestStart application:
python app.pyAccess locally: Open browser to:
http://localhost:7860
Production Configuration
Environment Variables
| Variable | Description | Default | Required |
|---|---|---|---|
GROQ_API_KEY |
Groq API authentication key | - | Yes |
ALLOWED_ORIGINS |
CORS allowed origins (comma-separated) | localhost + netlify | No |
Resource Requirements
Minimum (Hugging Face Spaces):
- CPU: 2 vCPUs
- RAM: 8GB
- Storage: 5GB
- Python: 3.8+
Recommended:
- CPU: 4 vCPUs
- RAM: 16GB
- Storage: 10GB
Dependencies
Key dependencies and versions:
gradio==4.44.1 # Web interface
groq>=0.5.0 # LLM API client
sentence-transformers # Embeddings
torch>=2.0.0 # ML framework
faiss-cpu>=1.7.0 # Vector search
Full list in requirements.txt
Monitoring & Maintenance
Health Checks
Automated checks to implement:
- API endpoint availability
- Response time monitoring
- Error rate tracking
- API key validity
- Vector store accessibility
Logs to Monitor
Watch for these log patterns:
Success indicators:
β
Enhanced Medical RAG system ready!
β
HTTP client connection closed
β
Groq API connection successful
Warning indicators:
β οΈ CORS allows all origins (*)
β οΈ Error closing HTTP client
Error indicators:
β Failed to initialize system
β Groq API connection failed
β GROQ_API_KEY not found
Cost Monitoring
Groq API Usage:
- Track API calls per day
- Monitor token usage
- Set up billing alerts
Estimated costs (with llama-3.3-70b-versatile):
- Input: $0.59 per 1M tokens
- Output: $0.79 per 1M tokens
Average query: ~5,000 input + 500 output tokens Cost per query: ~$0.004
For 1,000 queries/day: ~$4/day = $120/month
Performance Metrics
Target metrics:
- Query latency: <10 seconds (p95)
- Availability: >99%
- Error rate: <1%
- Verification success: >95%
To track:
- Average response time
- Queries per hour
- Error types and frequency
- User satisfaction (if feedback enabled)
Troubleshooting
Common Issues
1. API Key Error
Symptom: GROQ_API_KEY not found in environment variables
Solution:
- Verify secret is set in HF Spaces Settings
- Restart the Space
- Check for typos in secret name
2. Model Deprecation Error
Symptom: Model not found or Invalid model ID
Solution:
- Code updated to use
llama-3.3-70b-versatile(production model) - If error persists, check Groq Model Documentation
3. Slow Response Times
Symptom: Queries taking >30 seconds
Possible causes:
- Vector store loading issue
- Network latency to Groq API
- Large number of concurrent requests
Solutions:
- Check Space resources
- Verify vector store is loaded correctly
- Consider increasing max_threads limit
4. Memory Errors
Symptom: Out of memory errors in logs
Solutions:
- Upgrade to larger Space tier
- Reduce max_threads in app.py
- Check for resource leaks (should be fixed)
5. CORS Errors (Frontend)
Symptom: Frontend can't connect to API
Solution:
- Add frontend domain to ALLOWED_ORIGINS
- Update
src/enhanced_backend_api.pyCORS settings
Rollback Procedure
If issues arise post-deployment:
Immediate rollback:
# Revert to previous commit git revert HEAD git push origin mainOr reset to specific commit:
git reset --hard <previous-working-commit> git push origin main --forceVerify rollback:
- Check Space rebuilds successfully
- Test with known good query
- Monitor logs for stability
Security Best Practices
API Key Management
- β Never commit API keys to git
- β Use HF Spaces secrets for production
- β Rotate keys every 90 days
- β Monitor API usage for anomalies
Input Sanitization
- β Max query length: 2000 characters
- β Prompt injection detection enabled
- β Empty query rejection
- β Special character handling
Access Control
- Consider adding authentication for production
- Rate limit per user/IP if possible
- Log all queries for audit purposes
- Implement usage quotas
Compliance
For medical applications:
- Ensure HIPAA compliance if handling PHI
- Implement audit logging
- Document data retention policies
- Review with legal/compliance team
Support & Escalation
Issue Priority Levels
P0 - Critical (Response: Immediate):
- Application down
- API key compromised
- Data breach
P1 - High (Response: <4 hours):
- Elevated error rates
- Slow response times
- Verification failures
P2 - Medium (Response: <24 hours):
- Minor bugs
- UI issues
- Non-critical errors
P3 - Low (Response: <1 week):
- Feature requests
- Documentation updates
- Performance optimizations
Escalation Path
- Check logs and error messages
- Review troubleshooting guide
- Check Groq API status
- Review recent code changes
- Escalate to development team
Maintenance Schedule
Daily
- Monitor error logs
- Check API usage/costs
- Verify application health
Weekly
- Review performance metrics
- Check for deprecated dependencies
- Backup configuration
Monthly
- Update dependencies
- Review security patches
- Analyze usage patterns
- Performance optimization review
Quarterly
- Rotate API keys
- Security audit
- Load testing
- Documentation update
Future Enhancements
Planned improvements (priority order):
Vector Store Rebuild (High Priority)
- Rebuild with full Clinical ModernBERT (768d)
- Expected: 10-15% accuracy improvement
Monitoring Dashboard (High Priority)
- Grafana/Prometheus integration
- Real-time metrics
- Alerting system
CI/CD Pipeline (Medium Priority)
- Automated testing
- Deployment automation
- Rollback capabilities
Multi-language Support (Medium Priority)
- Sinhala language support
- Tamil language support
- Translation pipeline
User Authentication (Low Priority)
- User accounts
- Usage tracking
- Personalized history
Version History
| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2025-10-22 | Security fixes, llama-3.3 update, test suite |
| 1.0.0 | 2025-XX-XX | Initial production deployment |
Contact & Resources
- Documentation: See README.md
- Security: See SECURITY_SETUP.md
- Tests: See tests/README.md
- Groq Docs: https://console.groq.com/docs
- HF Spaces: https://huggingface.co/docs/hub/spaces
Last Updated: 2025-10-22