VedaMD Project Structure
Clean, organized codebase for production deployment
Last updated: October 23, 2025
Directory Structure
SL Clinical Assistant/
├── app.py                              # Gradio interface (HF Spaces entry point)
├── requirements.txt                    # Python dependencies
├── .env.example                        # Environment variable template
├── .gitignore                          # Git ignore rules
│
├── src/                                # Core application code
│   ├── __init__.py
│   ├── enhanced_groq_medical_rag.py    # Main RAG system (Cerebras-powered)
│   ├── enhanced_backend_api.py         # FastAPI backend for frontend
│   ├── simple_vector_store.py          # Vector store loader
│   ├── vector_store_compatibility.py   # Compatibility wrapper (temporary)
│   ├── enhanced_medical_context.py     # Medical context enhancement
│   └── medical_response_verifier.py    # Response verification & safety
│
├── scripts/                            # Automation scripts
│   ├── build_vector_store.py           # Build complete vector store from PDFs
│   └── add_document.py                 # Add single document incrementally
│
├── frontend/                           # Next.js frontend (separate deployment)
│   ├── src/
│   │   ├── app/
│   │   ├── components/
│   │   └── lib/
│   │       └── api.ts                  # API client (FastAPI + Gradio support)
│   ├── public/
│   ├── package.json
│   └── .env.local.example
│
├── data/                               # Data files (local only, not in git)
│   ├── guidelines/                     # Source PDF files (moved from Obs/)
│   ├── vector_store/                   # Built vector store (FAISS + metadata)
│   │   ├── faiss_index.bin
│   │   ├── documents.json
│   │   ├── metadata.json
│   │   ├── config.json
│   │   └── backups/                    # Automatic backups
│   └── processed/                      # Processed documents (optional)
│
├── docs/                               # Documentation index
│   └── README.md                       # Documentation directory index
│
├── archive/                            # Old/deprecated files (not in git)
│   ├── old_scripts/                    # batch_ocr_pipeline.py, convert_pdf.py
│   └── old_docs/                       # output.md, cleanup_plan.md, etc.
│
├── test_pdfs/                          # Test files (not in git)
├── test_vector_store/                  # Test vector store (not in git)
│
└── Documentation Files                 # Root-level docs
    ├── README.md                       # Main project README
    ├── PIPELINE_GUIDE.md               # Document pipeline usage guide
    ├── LOCAL_TESTING_GUIDE.md          # Local development guide
    ├── IMPROVEMENT_PLAN.md             # Project roadmap
    ├── DEPLOYMENT.md                   # Deployment instructions
    ├── SECURITY_SETUP.md               # Security configuration
    ├── CEREBRAS_MIGRATION_GUIDE.md     # Cerebras migration details
    ├── QUICK_START_CEREBRAS.md         # Cerebras quickstart
    ├── PRODUCTION_READINESS_REPORT.md  # Production assessment
    ├── CHANGES_SUMMARY.md              # Summary of changes
    └── CEREBRAS_SUMMARY.md             # Cerebras integration summary
Core Files
Application Entry Points
| File | Purpose | Deployment |
|---|---|---|
| app.py | Gradio interface | Hugging Face Spaces |
| src/enhanced_backend_api.py | FastAPI REST API | Hugging Face Spaces (port 7862) |
| frontend/ | Next.js frontend | Netlify / Vercel |
RAG System
| File | Purpose | Key Features |
|---|---|---|
| src/enhanced_groq_medical_rag.py | Main RAG orchestrator | Cerebras integration, multi-stage retrieval, medical safety |
| src/simple_vector_store.py | Vector store loader | HF Hub download, FAISS search |
| src/enhanced_medical_context.py | Medical context enhancement | Entity extraction, relevance scoring |
| src/medical_response_verifier.py | Response verification | Claim validation, source traceability |
Automation Scripts
| Script | Purpose | Usage |
|---|---|---|
| scripts/build_vector_store.py | Build complete vector store | python scripts/build_vector_store.py --input-dir ./data/guidelines --output-dir ./data/vector_store --upload |
| scripts/add_document.py | Add single document | python scripts/add_document.py --file new.pdf --vector-store-dir ./data/vector_store --upload |
Startup Scripts
| Script | Purpose |
|---|---|
| run_backend.sh | Start FastAPI backend (port 7862) |
| run_frontend.sh | Start Next.js frontend (port 3000) |
| kill_backend.sh | Stop backend processes |
Data Files
Vector Store Files (data/vector_store/)
Generated by build_vector_store.py:
| File | Purpose | Format |
|---|---|---|
| faiss_index.bin | FAISS vector index | Binary |
| documents.json | Document chunks | JSON array of strings |
| metadata.json | Document metadata | JSON array of objects |
| config.json | Build configuration | JSON object |
| build_log.json | Build information | JSON object |
Metadata Structure:
{
  "source": "guideline.pdf",
  "section": "Management",
  "chunk_id": 0,
  "chunk_size": 1000,
  "file_hash": "a3f2c9d8...",
  "extraction_method": "pymupdf",
  "total_pages": 15,
  "citation": "SLCOG Guidelines 2025",
  "category": "Obstetrics",
  "processed_at": "2025-10-23T15:08:30.273544"
}
Configuration Files
Environment Variables
.env (local development):
CEREBRAS_API_KEY=csk_your_key_here
HF_TOKEN=hf_your_token_here # For uploading vector store
Hugging Face Spaces Secrets:
CEREBRAS_API_KEY # Required
HF_TOKEN # Optional (for vector store upload)
ALLOWED_ORIGINS # Optional (CORS, comma-separated)
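The required/optional split above could be enforced at startup along the following lines. This is a minimal sketch, not the project's actual configuration code: `Settings` and `load_settings` are hypothetical names, and only the three variable names and the comma-separated format of ALLOWED_ORIGINS come from this document.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Settings:
    cerebras_api_key: str
    hf_token: Optional[str] = None
    allowed_origins: List[str] = field(default_factory=list)


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    api_key = env.get("CEREBRAS_API_KEY", "").strip()
    if not api_key:
        # Fail fast: the key is listed as required.
        raise RuntimeError("CEREBRAS_API_KEY is required but not set")

    # ALLOWED_ORIGINS is comma-separated; drop blanks and surrounding spaces.
    origins = [
        origin.strip()
        for origin in env.get("ALLOWED_ORIGINS", "").split(",")
        if origin.strip()
    ]
    return Settings(
        cerebras_api_key=api_key,
        hf_token=env.get("HF_TOKEN") or None,
        allowed_origins=origins,
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to exercise in tests.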
Requirements
requirements.txt - Python dependencies:
- cerebras-cloud-sdk - Cerebras API client
- gradio - Web interface
- fastapi - REST API
- sentence-transformers - Embeddings
- faiss-cpu - Vector search
- huggingface-hub - Model/data hosting
- PyMuPDF, pdfplumber - PDF extraction
Git Ignore Strategy
Ignored (Local Only)
- data/guidelines/ - Source PDFs
- data/vector_store/ - Built vector store
- archive/ - Old files
- test_pdfs/, test_vector_store/ - Test files
- frontend/ - Separate deployment
- .env - Local environment variables
- *.log - Log files
Committed (Version Control)
- src/ - Application code
- scripts/ - Automation scripts
- app.py - Gradio entry point
- requirements.txt - Dependencies
- .env.example - Environment template
- *.md - Documentation
Workflow
Development Workflow
Add new guideline:
cp ~/Downloads/new_guideline.pdf data/guidelines/
Update vector store:
python scripts/add_document.py \
  --file data/guidelines/new_guideline.pdf \
  --citation "SLCOG Guidelines 2025" \
  --vector-store-dir ./data/vector_store
Test locally:
# Terminal 1: Start backend
./run_backend.sh
# Terminal 2: Start frontend
./run_frontend.sh
# Or just test Gradio
python app.py
Deploy to production:
# Upload vector store to HF Hub
python scripts/build_vector_store.py \
  --input-dir ./data/guidelines \
  --output-dir ./data/vector_store \
  --upload --repo-id sniro23/VedaMD-Vector-Store
# Push code to HF Spaces
git add src/ app.py requirements.txt
git commit -m "Update: Add new guidelines"
git push origin main
Production Deployment
Backend (Hugging Face Spaces):
- Gradio interface: Automatic from app.py
- FastAPI API: Runs on port 7862
- Vector store: Downloaded from HF Hub on startup
- Secrets: Set in HF Spaces settings
Frontend (Netlify):
- Build: cd frontend && npm run build
- Deploy: Automatic from GitHub
- Environment: NEXT_PUBLIC_API_URL=https://sniro23-vedamd-enhanced.hf.space
Migration Notes
From Old Structure
Moved:
- Obs/*.pdf → data/guidelines/*.pdf
- Vector store logic remains in src/
Archived:
- batch_ocr_pipeline.py → archive/old_scripts/
- convert_pdf.py → archive/old_scripts/
- output*.md → archive/old_docs/
- cleanup_plan.md → archive/old_docs/
Created New:
- scripts/ - Automation scripts
- data/ - Data directory structure
- docs/ - Documentation index
- archive/ - Old files
Key Improvements
Before Cleanup
SL Clinical Assistant/
├── app.py
├── src/
├── Obs/                      # Unclear name
├── batch_ocr_pipeline.py     # Old script at root
├── convert_pdf.py            # Old script at root
├── output.md                 # Temporary file
├── output_new.md             # Temporary file
└── 15+ .md files at root     # Disorganized docs
After Cleanup
SL Clinical Assistant/
├── app.py                    # Clear entry point
├── src/                      # Core code
├── scripts/                  # Automation scripts
├── data/                     # Data files
│   ├── guidelines/           # Clear purpose
│   └── vector_store/         # Clear purpose
├── docs/                     # Documentation index
├── archive/                  # Old files preserved
└── Documentation files       # Organized at root
Best Practices
Code Organization
- Core Logic: Keep in src/
- Automation: Keep in scripts/
- Data: Keep in data/ (gitignored)
- Tests: Keep in tests/ (if created)
Documentation
- User Guides: Root level (PIPELINE_GUIDE.md, etc.)
- Technical Docs: Root level (DEPLOYMENT.md, etc.)
- Code Docs: Inline docstrings in Python files
- Index: docs/README.md for navigation
Data Management
- Source Data: data/guidelines/
- Processed Data: data/vector_store/
- Backups: Automatic in data/vector_store/backups/
- Test Data: test_pdfs/, test_vector_store/
Version Control
- Commit Code: src/, scripts/, app.py
- Ignore Data: data/, archive/, test_*/
- Commit Docs: All .md files
- Templates: .env.example, not .env
Quick Reference
Common Commands
# Build vector store from scratch
python scripts/build_vector_store.py --input-dir ./data/guidelines --output-dir ./data/vector_store
# Add single document
python scripts/add_document.py --file new.pdf --vector-store-dir ./data/vector_store
# Start backend
./run_backend.sh
# Start frontend
./run_frontend.sh
# Test Gradio interface
python app.py
# Upload to HF Hub
python scripts/build_vector_store.py ... --upload --repo-id sniro23/VedaMD-Vector-Store
Important Paths
- PDFs: data/guidelines/
- Vector Store: data/vector_store/
- RAG System: src/enhanced_groq_medical_rag.py
- API: src/enhanced_backend_api.py
- Scripts: scripts/
- Docs: Root level + docs/README.md
Clean codebase = Maintainable codebase = Production-ready codebase