🚀 Deployment Guide - VedaMD Enhanced

Pre-Deployment Checklist

Before deploying to production, ensure all items are completed:

Critical Security ✅

  • Groq API key regenerated (old key removed)
  • API key stored in HF Spaces secrets (not in code)
  • CORS configuration restricted to known domains
  • Input validation implemented
  • Rate limiting enabled
  • Prompt injection detection active

Code Quality ✅

  • LLM model updated (llama-3.3-70b-versatile)
  • Resource leaks fixed (httpx client cleanup)
  • Test suite created and passing
  • All tests passing locally
  • Code reviewed

Documentation ✅

  • SECURITY_SETUP.md created
  • .env.example created
  • Test documentation complete
  • This deployment guide

Optional Improvements ⚠️

  • Vector store rebuilt with Clinical ModernBERT (768d)
  • Monitoring and observability setup
  • CI/CD pipeline configured
  • Performance benchmarks established

Deployment to Hugging Face Spaces

Step 1: Configure Secrets

  1. Go to your Hugging Face Space
  2. Click Settings tab
  3. Navigate to Repository secrets
  4. Add the following secrets:

| Secret Name | Description | Required |
|---|---|---|
| GROQ_API_KEY | Your Groq API key | Yes |
| ALLOWED_ORIGINS | Comma-separated allowed domains (optional) | No |

Example ALLOWED_ORIGINS:

https://your-space.hf.space,https://yourdomain.com
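
As a minimal sketch, a comma-separated `ALLOWED_ORIGINS` value like the one above could be parsed into a list as follows. The function name `parse_allowed_origins` and the localhost fallback are assumptions made for illustration; the backend's actual parsing and defaults may differ.

```python
from typing import List, Optional

# Hypothetical fallback used only in this sketch; the real default may differ.
DEFAULT_ORIGINS = ["http://localhost:7860"]


def parse_allowed_origins(raw: Optional[str]) -> List[str]:
    """Split a comma-separated origin string, dropping blanks and trailing slashes."""
    parts = (raw or "").split(",")
    origins = [part.strip().rstrip("/") for part in parts if part.strip()]
    # Fall back to the default list when nothing usable was configured.
    return origins or list(DEFAULT_ORIGINS)
```

Calling `parse_allowed_origins(os.environ.get("ALLOWED_ORIGINS"))` at startup would yield a list suitable for passing to a CORS configuration.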

Step 2: Update Repository

  1. Commit your changes:

    git add .
    git commit -m "feat: Update to llama-3.3, add security features and tests"
    
  2. Push to Hugging Face Spaces:

    git push origin main
    

Step 3: Verify Deployment

  1. Check Build Logs:

    • Go to your Space
    • Click Logs tab
    • Watch for successful initialization messages:
      🏥 Initializing VedaMD Enhanced for Hugging Face Spaces...
      ✅ Enhanced Medical RAG system ready!
      
  2. Test the Application:

    • Open your Space URL
    • Try a test query: "What is preeclampsia?"
    • Verify sources and citations appear
    • Check response time (should be <10 seconds)
  3. Monitor for Errors:

    • Watch logs for any warnings or errors
    • Check API key is loaded correctly
    • Verify model is llama-3.3-70b-versatile

Step 4: Post-Deployment Validation

Run through this checklist:

  • Application loads without errors
  • Test queries return proper responses
  • Citations are displayed correctly
  • Medical verification is working
  • Response times are acceptable (<10s)
  • No API key errors in logs
  • No resource leak warnings

Local Development Setup

Prerequisites

  • Python 3.8+
  • pip
  • Git
  • Groq API key

Installation

  1. Clone the repository:

    git clone <your-repo-url>
    cd "SL Clinical Assistant"
    
  2. Create virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Configure environment:

    cp .env.example .env
    # Edit .env and add your GROQ_API_KEY
    
  5. Run tests:

    pip install pytest pytest-cov
    pytest
    
  6. Start application:

    python app.py
    
  7. Access locally: open http://localhost:7860 in your browser


Production Configuration

Environment Variables

| Variable | Description | Default | Required |
|---|---|---|---|
| GROQ_API_KEY | Groq API authentication key | - | Yes |
| ALLOWED_ORIGINS | CORS allowed origins (comma-separated) | localhost + netlify | No |

Resource Requirements

Minimum (Hugging Face Spaces):

  • CPU: 2 vCPUs
  • RAM: 8GB
  • Storage: 5GB
  • Python: 3.8+

Recommended:

  • CPU: 4 vCPUs
  • RAM: 16GB
  • Storage: 10GB

Dependencies

Key dependencies and versions:

gradio==4.44.1         # Web interface
groq>=0.5.0            # LLM API client
sentence-transformers  # Embeddings
torch>=2.0.0          # ML framework
faiss-cpu>=1.7.0      # Vector search

Full list in requirements.txt


Monitoring & Maintenance

Health Checks

Automated checks to implement:

  1. API endpoint availability
  2. Response time monitoring
  3. Error rate tracking
  4. API key validity
  5. Vector store accessibility
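
One way to structure the checks listed above is a small runner that executes named check functions and reports a status for each. This is a sketch only: `run_health_checks` and `is_healthy` are hypothetical names, and the individual checks (endpoint ping, API key validity, vector store load) would have to be supplied by the application.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run each named check; a check that raises is reported, not propagated."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:
            # Keep going so one broken check cannot hide the others.
            results[name] = f"error: {exc}"
    return results


def is_healthy(results: Dict[str, str]) -> bool:
    """The system counts as healthy only if every check reported 'ok'."""
    return all(status == "ok" for status in results.values())
```

Each check is a zero-argument callable returning `True` or `False`, so new checks can be added without changing the runner.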

Logs to Monitor

Watch for these log patterns:

Success indicators:

✅ Enhanced Medical RAG system ready!
✅ HTTP client connection closed
✅ Groq API connection successful

Warning indicators:

⚠️ CORS allows all origins (*)
⚠️ Error closing HTTP client

Error indicators:

❌ Failed to initialize system
❌ Groq API connection failed
❌ GROQ_API_KEY not found
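
The log patterns listed above can be scanned for automatically. The following sketch matches on the message text rather than the emoji prefix, so it also works if the emoji is stripped or altered by a log viewer. The names `classify_log_line` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Message fragments taken from the log patterns listed above.
PATTERNS = {
    "success": (
        "Enhanced Medical RAG system ready!",
        "HTTP client connection closed",
        "Groq API connection successful",
    ),
    "warning": (
        "CORS allows all origins (*)",
        "Error closing HTTP client",
    ),
    "error": (
        "Failed to initialize system",
        "Groq API connection failed",
        "GROQ_API_KEY not found",
    ),
}


def classify_log_line(line: str) -> str:
    """Return 'error', 'warning', 'success', or 'other' for one log line."""
    # Check the most severe category first.
    for level in ("error", "warning", "success"):
        if any(fragment in line for fragment in PATTERNS[level]):
            return level
    return "other"


def summarize(lines: Iterable[str]) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(classify_log_line(line) for line in lines)
```

A non-zero `error` count in the summary of the startup logs would be a reasonable trigger for the rollback procedure.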

Cost Monitoring

Groq API Usage:

  • Track API calls per day
  • Monitor token usage
  • Set up billing alerts

Estimated costs (with llama-3.3-70b-versatile):

  • Input: $0.59 per 1M tokens
  • Output: $0.79 per 1M tokens

Average query: ~5,000 input + 500 output tokens

Cost per query: ~$0.0033 (about $0.004 when rounded up)

For 1,000 queries/day: ~$3.35 to $4/day, or roughly $100 to $120/month
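
The estimate above can be reproduced with a few lines of arithmetic, which makes it easy to re-run when prices or traffic change. The per-token prices are the ones quoted in this guide; the function names and the 30-day month are assumptions for illustration.

```python
INPUT_PRICE_PER_M = 0.59   # USD per 1M input tokens (figure quoted above)
OUTPUT_PRICE_PER_M = 0.79  # USD per 1M output tokens (figure quoted above)


def cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one query with the given token counts."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def monthly_cost(queries_per_day: int, per_query: float, days: int = 30) -> float:
    """Projected cost in USD over `days` days at a constant query rate."""
    return queries_per_day * per_query * days
```

With unrounded arithmetic, `cost_per_query(5000, 500)` works out to about $0.0033, and 1,000 such queries per day to about $100 per 30-day month, so the rounded figures in this guide are a slightly conservative upper estimate.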

Performance Metrics

Target metrics:

  • Query latency: <10 seconds (p95)
  • Availability: >99%
  • Error rate: <1%
  • Verification success: >95%

To track:

  • Average response time
  • Queries per hour
  • Error types and frequency
  • User satisfaction (if feedback enabled)
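
As a sketch of how the latency and error-rate targets above could be checked from collected samples, the following uses the nearest-rank method for the 95th percentile. The function names are hypothetical, and how the samples are collected is left to the monitoring setup.

```python
from typing import Sequence


def p95(latencies_s: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of latency samples, in seconds."""
    if not latencies_s:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_s)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def error_rate(total: int, errors: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def meets_targets(latencies_s: Sequence[float], total: int, errors: int) -> bool:
    """Check the p95 latency (<10 s) and error-rate (<1%) targets listed above."""
    return p95(latencies_s) < 10.0 and error_rate(total, errors) < 0.01
```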

Troubleshooting

Common Issues

1. API Key Error

Symptom: GROQ_API_KEY not found in environment variables

Solution:

  1. Verify secret is set in HF Spaces Settings
  2. Restart the Space
  3. Check for typos in secret name
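
A fail-fast startup check makes this error easier to diagnose. The sketch below raises with the message from the symptom above and lists similarly named variables, which helps catch typos in the secret name. `load_groq_api_key` is a hypothetical helper, not necessarily how the application loads its key.

```python
import os
from typing import Mapping, Optional


def load_groq_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise with a hint about similarly named variables."""
    env = os.environ if env is None else env
    key = (env.get("GROQ_API_KEY") or "").strip()
    if not key:
        # Look for near-miss names such as GROQ_APIKEY to flag likely typos.
        similar = sorted(n for n in env if "GROQ" in n.upper() and n != "GROQ_API_KEY")
        hint = f" (similar names set: {similar})" if similar else ""
        raise RuntimeError("GROQ_API_KEY not found in environment variables" + hint)
    return key
```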

2. Model Deprecation Error

Symptom: Model not found or Invalid model ID

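Solution:

  1. Confirm the configured model ID is a currently supported Groq model (this guide targets llama-3.3-70b-versatile)
  2. Update the model ID if needed and redeploy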

3. Slow Response Times

Symptom: Queries taking >30 seconds

Possible causes:

  1. Vector store loading issue
  2. Network latency to Groq API
  3. Large number of concurrent requests

Solutions:

  • Check Space resources
  • Verify vector store is loaded correctly
  • Consider increasing max_threads limit

4. Memory Errors

Symptom: Out of memory errors in logs

Solutions:

  1. Upgrade to larger Space tier
  2. Reduce max_threads in app.py
  3. Check for resource leaks (should be fixed)

5. CORS Errors (Frontend)

Symptom: Frontend can't connect to API

Solution:

  • Add frontend domain to ALLOWED_ORIGINS
  • Update src/enhanced_backend_api.py CORS settings

Rollback Procedure

If issues arise post-deployment:

  1. Immediate rollback:

    # Revert to previous commit
    git revert HEAD
    git push origin main
    
  2. Or reset to specific commit:

    # Rewrites remote history; prefer the revert above on shared branches
    git reset --hard <previous-working-commit>
    git push origin main --force
    
  3. Verify rollback:

    • Check Space rebuilds successfully
    • Test with known good query
    • Monitor logs for stability

Security Best Practices

API Key Management

  • ✅ Never commit API keys to git
  • ✅ Use HF Spaces secrets for production
  • ✅ Rotate keys every 90 days
  • ✅ Monitor API usage for anomalies
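
The 90-day rotation rule can be checked mechanically if the last rotation date is recorded somewhere. A minimal sketch, assuming that date is available (where it is stored is not specified by this guide, and `rotation_due` is a hypothetical name):

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)  # rotation interval recommended above


def rotation_due(last_rotated: date, today: date) -> bool:
    """True once at least 90 days have passed since the last rotation."""
    return today - last_rotated >= ROTATION_PERIOD
```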

Input Sanitization

  • ✅ Max query length: 2000 characters
  • ✅ Prompt injection detection enabled
  • ✅ Empty query rejection
  • ✅ Special character handling
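
The rules above could be combined into a single validation function along the following lines. This is a sketch under stated assumptions, not the project's implementation: `validate_query` is a hypothetical name, and the two injection patterns are illustrative examples only; a real detector would need a broader, tested list.

```python
import re

MAX_QUERY_LENGTH = 2000  # characters, per the limit above

# Illustrative phrases only (assumption); not the project's actual pattern list.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]


def validate_query(query: str) -> str:
    """Return the cleaned query or raise ValueError describing the rejection."""
    # Drop control characters, keeping newlines and tabs.
    cleaned = "".join(ch for ch in query if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise ValueError("Query is empty")
    if len(cleaned) > MAX_QUERY_LENGTH:
        raise ValueError(f"Query exceeds {MAX_QUERY_LENGTH} characters")
    if any(pattern.search(cleaned) for pattern in INJECTION_PATTERNS):
        raise ValueError("Query rejected by prompt injection check")
    return cleaned
```

Raising `ValueError` with a specific message lets the caller return a clear error to the user without sending the query to the LLM.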

Access Control

  • Consider adding authentication for production
  • Rate limit per user/IP if possible
  • Log all queries for audit purposes
  • Implement usage quotas
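
Per-user or per-IP rate limiting can be sketched with a sliding window over recent request timestamps. The class below is an in-memory illustration with hypothetical names; it is not shared across processes, so a multi-worker deployment would need an external store.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each client key."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # client key -> recent request timestamps

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

The client key can be a user ID or an IP address; the optional `now` argument exists so the behavior can be tested without waiting in real time.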

Compliance

For medical applications:

  • Ensure HIPAA compliance if handling PHI
  • Implement audit logging
  • Document data retention policies
  • Review with legal/compliance team

Support & Escalation

Issue Priority Levels

P0 - Critical (Response: Immediate):

  • Application down
  • API key compromised
  • Data breach

P1 - High (Response: <4 hours):

  • Elevated error rates
  • Slow response times
  • Verification failures

P2 - Medium (Response: <24 hours):

  • Minor bugs
  • UI issues
  • Non-critical errors

P3 - Low (Response: <1 week):

  • Feature requests
  • Documentation updates
  • Performance optimizations

Escalation Path

  1. Check logs and error messages
  2. Review troubleshooting guide
  3. Check Groq API status
  4. Review recent code changes
  5. Escalate to development team

Maintenance Schedule

Daily

  • Monitor error logs
  • Check API usage/costs
  • Verify application health

Weekly

  • Review performance metrics
  • Check for deprecated dependencies
  • Backup configuration

Monthly

  • Update dependencies
  • Review security patches
  • Analyze usage patterns
  • Performance optimization review

Quarterly

  • Rotate API keys
  • Security audit
  • Load testing
  • Documentation update

Future Enhancements

Planned improvements (priority order):

  1. Vector Store Rebuild (High Priority)

    • Rebuild with full Clinical ModernBERT (768d)
    • Expected: 10-15% accuracy improvement
  2. Monitoring Dashboard (High Priority)

    • Grafana/Prometheus integration
    • Real-time metrics
    • Alerting system
  3. CI/CD Pipeline (Medium Priority)

    • Automated testing
    • Deployment automation
    • Rollback capabilities
  4. Multi-language Support (Medium Priority)

    • Sinhala language support
    • Tamil language support
    • Translation pipeline
  5. User Authentication (Low Priority)

    • User accounts
    • Usage tracking
    • Personalized history

Version History

| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2025-10-22 | Security fixes, llama-3.3 update, test suite |
| 1.0.0 | 2025-XX-XX | Initial production deployment |

Contact & Resources

Last Updated: 2025-10-22