
🚀 Deployment Guide - VedaMD Enhanced

Pre-Deployment Checklist

Before deploying to production, ensure all items are completed:

Critical Security ✅

Groq API key regenerated (old key removed)
API key stored in HF Spaces secrets (not in code)
CORS configuration restricted to known domains
Input validation implemented
Rate limiting enabled
Prompt injection detection active

Code Quality ✅

LLM model updated (llama-3.3-70b-versatile)
Resource leaks fixed (httpx client cleanup)
Test suite created and passing
All tests passing locally
Code reviewed

Documentation ✅

SECURITY_SETUP.md created
.env.example created
Test documentation complete
This deployment guide

Optional Improvements ⚠️

Vector store rebuilt with Clinical ModernBERT (768d)
Monitoring and observability setup
CI/CD pipeline configured
Performance benchmarks established

Deployment to Hugging Face Spaces

Step 1: Configure Secrets

1. Go to your Hugging Face Space.
2. Click the Settings tab.
3. Navigate to Repository secrets.
4. Add the following secrets:

| Secret Name | Description | Required |
|---|---|---|
| `GROQ_API_KEY` | Your Groq API key | Yes |
| `ALLOWED_ORIGINS` | Comma-separated list of allowed domains | No |

Example `ALLOWED_ORIGINS` value:

```
https://your-space.hf.space,https://yourdomain.com
```
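This guide does not show how the backend reads `ALLOWED_ORIGINS`. A minimal sketch of turning the comma-separated value into a clean list of origins, assuming a hypothetical `parse_allowed_origins` helper and a placeholder default (the real default, described below only as "localhost + netlify", lives in the backend code):

```python
from __future__ import annotations

import os

# Hypothetical fallback. Per this guide, the real default covers localhost and a
# Netlify frontend whose URL is not given here.
DEFAULT_ORIGINS = ("http://localhost:7860",)


def parse_allowed_origins(raw: str | None, default: tuple[str, ...] = DEFAULT_ORIGINS) -> list[str]:
    """Turn a comma-separated ALLOWED_ORIGINS value into a list of origins."""
    if raw is None or not raw.strip():
        return list(default)
    # Trim spaces and trailing slashes (browsers send origins without one)
    # and drop empty entries such as the gap in "a,,b".
    origins = [item.strip().rstrip("/") for item in raw.split(",")]
    return [origin for origin in origins if origin]


if __name__ == "__main__":
    allowed = parse_allowed_origins(os.environ.get("ALLOWED_ORIGINS"))
    if "*" in allowed:
        print("Warning: CORS allows all origins (*)")
    print(allowed)
```

If src/enhanced_backend_api.py uses FastAPI/Starlette, a list like this is the kind of value passed as `allow_origins` to `CORSMiddleware`; check the actual file before wiring it in.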

Step 2: Update Repository

Commit your changes:

```
git add .
git commit -m "feat: Update to llama-3.3, add security features and tests"
```

Push to Hugging Face Spaces:
```
git push origin main
```

Step 3: Verify Deployment

Check Build Logs:
- Go to your Space
- Click the Logs tab
- Watch for successful initialization messages:

```
🏥 Initializing VedaMD Enhanced for Hugging Face Spaces...
✅ Enhanced Medical RAG system ready!
```

Test the Application:
- Open your Space URL
- Try a test query: "What is preeclampsia?"
- Verify that sources and citations appear
- Check the response time (should be under 10 seconds)

Monitor for Errors:
- Watch the logs for any warnings or errors
- Check that the API key is loaded correctly
- Verify that the model is llama-3.3-70b-versatile

Step 4: Post-Deployment Validation

Run through this checklist:

Application loads without errors
Test queries return proper responses
Citations are displayed correctly
Medical verification is working
Response times are acceptable (<10s)
No API key errors in logs
No resource leak warnings

Local Development Setup

Prerequisites

Python 3.8+
pip
Git
Groq API key

Installation

Clone the repository:

```
git clone <your-repo-url>
cd "SL Clinical Assistant"
```

Create virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Configure environment:

```
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
```

Run tests:
```
pip install pytest pytest-cov
pytest
```
Start application:
```
python app.py
```
Access locally: open http://localhost:7860 in your browser.

Production Configuration

Environment Variables

| Variable | Description | Default | Required |
|---|---|---|---|
| `GROQ_API_KEY` | Groq API authentication key | - | Yes |
| `ALLOWED_ORIGINS` | CORS allowed origins (comma-separated) | localhost + Netlify | No |

Resource Requirements

Minimum (Hugging Face Spaces):

CPU: 2 vCPUs
RAM: 8GB
Storage: 5GB
Python: 3.8+

Recommended:

CPU: 4 vCPUs
RAM: 16GB
Storage: 10GB

Dependencies

Key dependencies and versions:

```
gradio==4.44.1         # Web interface
groq>=0.5.0            # LLM API client
sentence-transformers  # Embeddings
torch>=2.0.0           # ML framework
faiss-cpu>=1.7.0       # Vector search
```

Full list in requirements.txt

Monitoring & Maintenance

Health Checks

Automated checks to implement:

API endpoint availability
Response time monitoring
Error rate tracking
API key validity
Vector store accessibility
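As a starting point for the first two checks (endpoint availability and response time), here is a minimal standard-library sketch. The `check_health` and `http_probe` names are hypothetical, and the 10-second budget is taken from the latency target in this guide; the probe is injectable so the logic can be tested without network access.

```python
from __future__ import annotations

import time
import urllib.request
from typing import Callable, Optional

# Assumed latency budget, matching the <10 s target in this guide.
LATENCY_BUDGET_S = 10.0


def http_probe(url: str, timeout: float = 15.0) -> int:
    """GET `url` and return the HTTP status code (raises on 4xx/5xx or timeout)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_health(url: str, probe: Callable[[str], int] = http_probe) -> dict:
    """Run one availability + latency check and report the result as a dict."""
    start = time.monotonic()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        status = probe(url)
    except Exception as exc:  # URLError, HTTPError, socket timeouts, ...
        error = str(exc)
    latency = time.monotonic() - start
    return {
        "url": url,
        "status": status,
        "latency_s": round(latency, 3),
        "healthy": status == 200 and latency < LATENCY_BUDGET_S,
        "error": error,
    }
```

Usage: `check_health("https://your-space.hf.space")`, run on a schedule (cron, CI job, or an external monitor). A Space that has gone to sleep may fail the latency budget on the first request while it restarts.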

Logs to Monitor

Watch for these log patterns:

Success indicators:

✅ Enhanced Medical RAG system ready!
✅ HTTP client connection closed
✅ Groq API connection successful

Warning indicators:

⚠️ CORS allows all origins (*)
⚠️ Error closing HTTP client

Error indicators:

❌ Failed to initialize system
❌ Groq API connection failed
❌ GROQ_API_KEY not found
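To scan a saved copy of the logs for these patterns, a small classifier can count each category. This sketch matches on the message text without the leading emoji (emoji variation selectors make exact matching brittle); the `classify` and `summarize` names are hypothetical.

```python
from __future__ import annotations

from typing import Dict, Iterable, Optional

# Message texts from the lists above, without the leading emoji.
PATTERNS = {
    "success": (
        "Enhanced Medical RAG system ready!",
        "HTTP client connection closed",
        "Groq API connection successful",
    ),
    "warning": (
        "CORS allows all origins (*)",
        "Error closing HTTP client",
    ),
    "error": (
        "Failed to initialize system",
        "Groq API connection failed",
        "GROQ_API_KEY not found",
    ),
}


def classify(line: str) -> Optional[str]:
    """Return 'success', 'warning', 'error', or None for an unmatched line."""
    for level, messages in PATTERNS.items():
        if any(message in line for message in messages):
            return level
    return None


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many lines fall into each category."""
    counts = {level: 0 for level in PATTERNS}
    for line in lines:
        level = classify(line)
        if level is not None:
            counts[level] += 1
    return counts
```

Usage (the file name is hypothetical): `with open("space.log", encoding="utf-8") as fh: print(summarize(fh))`.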

Cost Monitoring

Groq API Usage:

Track API calls per day
Monitor token usage
Set up billing alerts

Estimated costs (with llama-3.3-70b-versatile):

Input: $0.59 per 1M tokens
Output: $0.79 per 1M tokens

Average query: ~5,000 input + 500 output tokens
Cost per query: ~$0.0033 (5,000 × $0.59/1M + 500 × $0.79/1M ≈ $0.00295 + $0.00040)

For 1,000 queries/day: ~$3.35/day, or about $100/month
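The same arithmetic as a small function, so the estimate can be re-run with your own token counts. The prices are the ones quoted above and may have changed; the 30-day month is an assumption.

```python
# Prices quoted in this guide for llama-3.3-70b-versatile (USD per 1M tokens).
# Check Groq's current pricing before relying on them.
INPUT_PRICE_PER_M = 0.59
OUTPUT_PRICE_PER_M = 0.79


def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def monthly_cost(queries_per_day: int, input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Cost in USD for `days` days at a constant daily query volume."""
    return cost_per_query(input_tokens, output_tokens) * queries_per_day * days


if __name__ == "__main__":
    print(f"per query: ${cost_per_query(5_000, 500):.4f}")       # per query: $0.0033
    print(f"per month: ${monthly_cost(1_000, 5_000, 500):.2f}")  # per month: $100.35
```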

Performance Metrics

Target metrics:

Query latency: <10 seconds (p95)
Availability: >99%
Error rate: <1%
Verification success: >95%

To track:

Average response time
Queries per hour
Error types and frequency
User satisfaction (if feedback enabled)
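To check the p95 latency and error-rate targets against recorded request data, one option is the nearest-rank percentile shown here. This guide does not say how latencies and errors are collected, so the function names and inputs below are assumptions.

```python
from __future__ import annotations

import math
from typing import Sequence

# Targets from this guide: p95 latency < 10 s, error rate < 1%.
TARGET_P95_S = 10.0
TARGET_ERROR_RATE = 0.01


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize_requests(latencies_s: Sequence[float], errors: int) -> dict:
    """`latencies_s` has one entry per request; `errors` counts failed requests."""
    if not latencies_s:
        raise ValueError("no requests recorded")
    total = len(latencies_s)
    return {
        "p95_s": percentile(latencies_s, 95),
        "mean_s": sum(latencies_s) / total,
        "error_rate": errors / total,
    }


def meets_targets(summary: dict) -> bool:
    """True when both the latency and the error-rate targets are met."""
    return summary["p95_s"] < TARGET_P95_S and summary["error_rate"] < TARGET_ERROR_RATE
```

Nearest-rank always returns an observed value, which keeps the p95 easy to trace back to a real request; interpolating methods (as in NumPy's default) can give slightly different numbers on small samples.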

Troubleshooting

Common Issues

1. API Key Error

Symptom: GROQ_API_KEY not found in environment variables

Solution:

Verify the secret is set in the HF Spaces settings
Restart the Space
Check the secret name for typos (it must be exactly GROQ_API_KEY)
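To make this failure obvious at startup instead of on the first query, the app can check for the key before creating the Groq client. A minimal sketch using a hypothetical `require_env` helper (the app's actual check may differ); for local runs it assumes `.env` has already been loaded into the process environment.

```python
import os


def require_env(name: str = "GROQ_API_KEY") -> str:
    """Return the named environment variable or fail fast with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} not found in environment variables. "
            "Set it in Settings > Repository secrets on Hugging Face Spaces, "
            "or in .env for local development."
        )
    return value
```

Usage: call `api_key = require_env()` once at startup and pass the result to the client (the `groq` package's `Groq` class accepts an `api_key` argument).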

2. Model Deprecation Error

Symptom: Model not found or Invalid model ID

Solution:

The code now uses llama-3.3-70b-versatile (a Groq production model)
If the error persists, check the Groq model documentation for current model IDs

3. Slow Response Times

Symptom: Queries taking >30 seconds

Possible causes:

Vector store loading issue
Network latency to Groq API
Large number of concurrent requests

Solutions:

Check Space resources
Verify vector store is loaded correctly
Consider increasing max_threads limit

4. Memory Errors

Symptom: Out of memory errors in logs

Solutions:

Upgrade to larger Space tier
Reduce max_threads in app.py
Check for resource leaks (the known httpx client leak has already been fixed)

5. CORS Errors (Frontend)

Symptom: Frontend can't connect to API

Solution:

Add the frontend domain to ALLOWED_ORIGINS
Update the CORS settings in src/enhanced_backend_api.py

Rollback Procedure

If issues arise post-deployment:

Immediate rollback:

```
# Undo the latest commit with a new revert commit
git revert HEAD
git push origin main
```

Or reset to a specific commit:

```
git reset --hard <previous-working-commit>
git push origin main --force
```

Verify rollback:
- Check Space rebuilds successfully
- Test with known good query
- Monitor logs for stability

Security Best Practices

API Key Management

✅ Never commit API keys to git
✅ Use HF Spaces secrets for production
✅ Rotate keys every 90 days
✅ Monitor API usage for anomalies

Input Sanitization

✅ Max query length: 2000 characters
✅ Prompt injection detection enabled
✅ Empty query rejection
✅ Special character handling
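The validation code itself is not shown in this guide. The following is an illustrative sketch of the four rules, assuming NFC normalization plus control-character stripping as the "special character handling" step and a few hand-written regular expressions as a stand-in for the real prompt-injection detector. `validate_query`, `InvalidQuery`, and `INJECTION_PATTERNS` are hypothetical names.

```python
from __future__ import annotations

import re
import unicodedata

MAX_QUERY_CHARS = 2000  # limit stated in this guide

# Illustrative heuristics only; not the app's actual detector.
INJECTION_PATTERNS = (
    re.compile(r"\b(ignore|disregard)\s+(all\s+|any\s+)?(previous|prior|above)\s+instructions\b", re.IGNORECASE),
    re.compile(r"\b(reveal|print|show)\s+(your|the)\s+system\s+prompt\b", re.IGNORECASE),
)


class InvalidQuery(ValueError):
    """Raised when a query fails validation; the message is safe to show users."""


def validate_query(raw: str) -> str:
    """Return a cleaned query or raise InvalidQuery."""
    # Special characters: normalize to NFC and drop control characters
    # (Unicode category Cc) except newline and tab.
    text = unicodedata.normalize("NFC", raw)
    text = "".join(ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc")
    text = text.strip()
    if not text:
        raise InvalidQuery("Please enter a question.")
    if len(text) > MAX_QUERY_CHARS:
        raise InvalidQuery(f"Questions are limited to {MAX_QUERY_CHARS} characters.")
    if any(pattern.search(text) for pattern in INJECTION_PATTERNS):
        raise InvalidQuery("This question could not be processed.")
    return text
```

Only category Cc is removed, so format characters such as the zero-width joiner used in Sinhala text survive; NFC (rather than NFKC) is used so that units like "mg/m²" are not rewritten.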

Access Control

Consider adding authentication for production
Rate limit per user/IP if possible
Log all queries for audit purposes
Implement usage quotas
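For per-user/IP rate limiting, a sliding-window limiter is one simple option. This in-memory sketch assumes a single process (state is lost on restart and not shared between replicas, and idle keys are never evicted); the class name and default limits are hypothetical.

```python
from __future__ import annotations

import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each client key."""

    def __init__(self, limit: int = 10, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()  # handlers may run on several threads

    def allow(self, client_key: str) -> bool:
        """Record a request and return True, or return False if over the limit."""
        with self._lock:
            now = self.clock()
            hits = self._hits[client_key]
            # Drop timestamps that have slid out of the window.
            while hits and now - hits[0] >= self.window_s:
                hits.popleft()
            if len(hits) >= self.limit:
                return False
            hits.append(now)
            return True
```

In a Gradio handler, the client address is available through a `gr.Request` parameter (`request.client.host`); behind the Spaces proxy this may not be the end user's IP, so verify what it contains before keying limits on it.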

Compliance

For medical applications:

Ensure HIPAA compliance if handling PHI
Implement audit logging
Document data retention policies
Review with legal/compliance team
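One way to implement the audit logging listed above is an append-only JSON Lines file. This sketch stores a SHA-256 hash and the length of each query instead of the raw text, a design choice to keep PHI out of the log itself, not a compliance guarantee. The field names and the `audit.jsonl` path are assumptions; a Space's local disk is not persistent by default, so production audit logs would need persistent or external storage.

```python
from __future__ import annotations

import hashlib
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit.jsonl")  # hypothetical path; see note on persistence


def audit_record(query: str, client_key: str, status: str, latency_s: float) -> dict:
    """Build one audit entry; the query is stored as a hash, not as text."""
    return {
        "ts": time.time(),  # Unix timestamp (seconds)
        "query_sha256": hashlib.sha256(query.encode("utf-8")).hexdigest(),
        "query_chars": len(query),
        "client": client_key,
        "status": status,
        "latency_s": round(latency_s, 3),
    }


def write_audit(record: dict, path: Path = AUDIT_LOG) -> None:
    """Append the record as one JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```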

Support & Escalation

Issue Priority Levels

P0 - Critical (Response: Immediate):

Application down
API key compromised
Data breach

P1 - High (Response: <4 hours):

Elevated error rates
Slow response times
Verification failures

P2 - Medium (Response: <24 hours):

Minor bugs
UI issues
Non-critical errors

P3 - Low (Response: <1 week):

Feature requests
Documentation updates
Performance optimizations

Escalation Path

Check logs and error messages
Review troubleshooting guide
Check Groq API status
Review recent code changes
Escalate to development team

Maintenance Schedule

Daily

Monitor error logs
Check API usage/costs
Verify application health

Weekly

Review performance metrics
Check for deprecated dependencies
Backup configuration

Monthly

Update dependencies
Review security patches
Analyze usage patterns
Performance optimization review

Quarterly

Rotate API keys
Security audit
Load testing
Documentation update

Future Enhancements

Planned improvements (priority order):

Vector Store Rebuild (High Priority)
- Rebuild with full Clinical ModernBERT (768d)
- Expected: 10-15% accuracy improvement
Monitoring Dashboard (High Priority)
- Grafana/Prometheus integration
- Real-time metrics
- Alerting system
CI/CD Pipeline (Medium Priority)
- Automated testing
- Deployment automation
- Rollback capabilities
Multi-language Support (Medium Priority)
- Sinhala language support
- Tamil language support
- Translation pipeline
User Authentication (Low Priority)
- User accounts
- Usage tracking
- Personalized history

Version History

| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2025-10-22 | Security fixes, llama-3.3 update, test suite |
| 1.0.0 | 2025-XX-XX | Initial production deployment |

Contact & Resources

Documentation: See README.md
Security: See SECURITY_SETUP.md
Tests: See tests/README.md
Groq Docs: https://console.groq.com/docs
HF Spaces: https://huggingface.co/docs/hub/spaces

Last Updated: 2025-10-22