πŸ—οΈ LinguaCustodia API Architecture

πŸ“‹ Overview

This document describes the clean, scalable architecture for the LinguaCustodia Financial AI API, designed to support multiple models and inference providers (HuggingFace, Scaleway, Koyeb).

🎯 Design Principles

  1. Configuration Pattern: Centralized configuration management
  2. Provider Abstraction: Support multiple inference providers
  3. Model Registry: Easy model switching and management
  4. Separation of Concerns: Clear module boundaries
  5. Solid Logging: Structured, contextual logging
  6. Testability: Easy to test and maintain

πŸ“ Project Structure

LLM-Pro-Fin-Inference/
β”œβ”€β”€ config/                      # Configuration module
β”‚   β”œβ”€β”€ __init__.py             # Exports all configs
β”‚   β”œβ”€β”€ base_config.py          # Base application config
β”‚   β”œβ”€β”€ model_configs.py        # Model-specific configs
β”‚   β”œβ”€β”€ provider_configs.py     # Provider-specific configs
β”‚   └── logging_config.py       # Logging setup
β”‚
β”œβ”€β”€ core/                        # Core business logic
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ storage_manager.py      # Storage abstraction
β”‚   β”œβ”€β”€ model_loader.py         # Model loading abstraction
β”‚   └── inference_engine.py     # Inference abstraction
β”‚
β”œβ”€β”€ providers/                   # Provider implementations
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ base_provider.py        # Abstract base class
β”‚   β”œβ”€β”€ huggingface_provider.py # HF implementation
β”‚   β”œβ”€β”€ scaleway_provider.py    # Scaleway implementation
β”‚   └── koyeb_provider.py       # Koyeb implementation
β”‚
β”œβ”€β”€ api/                         # API layer
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ app.py                  # FastAPI application
β”‚   β”œβ”€β”€ routes.py               # API routes
β”‚   └── models.py               # Pydantic models
β”‚
β”œβ”€β”€ utils/                       # Utilities
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── helpers.py              # Helper functions
β”‚
β”œβ”€β”€ tests/                       # Tests (keep existing)
β”‚   β”œβ”€β”€ test_api.py
β”‚   β”œβ”€β”€ test_model_loading.py
β”‚   └── ...
β”‚
β”œβ”€β”€ docs/                        # Documentation
β”‚   β”œβ”€β”€ ARCHITECTURE.md         # This file
β”‚   β”œβ”€β”€ API_REFERENCE.md        # API documentation
β”‚   └── DEPLOYMENT.md           # Deployment guide
β”‚
β”œβ”€β”€ app.py                       # Main entry point
β”œβ”€β”€ requirements.txt             # Dependencies
β”œβ”€β”€ .env.example                 # Environment template
└── README.md                    # Project overview

πŸ”§ Configuration Pattern

Base Configuration (config/base_config.py)

Purpose: Provides foundational settings and defaults for the entire application.

Features:

  • API settings (host, port, CORS)
  • Storage configuration
  • Logging configuration
  • Environment variable loading
  • Provider selection

Usage:

from config import BaseConfig

config = BaseConfig.from_env()
print(config.to_dict())
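
The exact fields of BaseConfig are not fixed by this document; the following is a minimal sketch of what config/base_config.py could look like, assuming a dataclass with environment-variable fallbacks (all field and variable names below are illustrative):

import os
from dataclasses import asdict, dataclass

@dataclass
class BaseConfig:
    host: str = "0.0.0.0"
    port: int = 8000
    log_level: str = "INFO"
    provider: str = "huggingface"
    storage_path: str = "/data/models"

    @classmethod
    def from_env(cls) -> "BaseConfig":
        # Each field falls back to its default when the variable is unset.
        return cls(
            host=os.getenv("API_HOST", "0.0.0.0"),
            port=int(os.getenv("API_PORT", "8000")),
            log_level=os.getenv("LOG_LEVEL", "INFO"),
            provider=os.getenv("INFERENCE_PROVIDER", "huggingface"),
            storage_path=os.getenv("STORAGE_PATH", "/data/models"),
        )

    def to_dict(self) -> dict:
        return asdict(self)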

Model Configurations (config/model_configs.py)

Purpose: Defines model-specific parameters and generation settings.

Features:

  • Model registry for all LinguaCustodia models
  • Generation configurations per model
  • Memory requirements
  • Hardware recommendations

Usage:

from config import get_model_config, list_available_models

# List available models
models = list_available_models()  # ['llama3.1-8b', 'qwen3-8b', ...]

# Get specific model config
config = get_model_config('llama3.1-8b')
print(config.generation_config.temperature)
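
A plausible shape for the registry behind these helpers; the dataclass fields and repo ids below are placeholders for illustration, not the actual LinguaCustodia configuration:

from dataclasses import dataclass, field

@dataclass
class GenerationConfig:
    temperature: float = 0.7
    top_p: float = 0.95
    max_new_tokens: int = 512

@dataclass
class ModelConfig:
    model_id: str   # HuggingFace repo id (values below are placeholders)
    memory_gb: int  # approximate VRAM needed to load the model
    generation_config: GenerationConfig = field(default_factory=GenerationConfig)

_MODEL_REGISTRY = {
    "llama3.1-8b": ModelConfig("LinguaCustodia/llama3.1-8b", memory_gb=16),
    "qwen3-8b": ModelConfig("LinguaCustodia/qwen3-8b", memory_gb=16),
}

def list_available_models() -> list:
    return sorted(_MODEL_REGISTRY)

def get_model_config(name: str) -> ModelConfig:
    if name not in _MODEL_REGISTRY:
        raise ValueError(f"Unknown model '{name}'; available: {list_available_models()}")
    return _MODEL_REGISTRY[name]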

Provider Configurations (config/provider_configs.py)

Purpose: Defines provider-specific settings for different inference platforms.

Features:

  • Provider registry (HuggingFace, Scaleway, Koyeb)
  • API endpoints and authentication
  • Provider capabilities (streaming, batching)
  • Rate limiting and timeouts

Usage:

from config import get_provider_config

provider = get_provider_config('huggingface')
print(provider.api_endpoint)
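
One way the provider registry could be laid out. The capability flags are assumptions, and the Scaleway and Koyeb endpoints are placeholders (only the HuggingFace Inference API URL shown is a real public endpoint):

from dataclasses import dataclass
from enum import Enum

class ProviderType(str, Enum):
    HUGGINGFACE = "huggingface"
    SCALEWAY = "scaleway"
    KOYEB = "koyeb"

@dataclass
class ProviderConfig:
    name: ProviderType
    api_endpoint: str
    supports_streaming: bool = False
    timeout_seconds: int = 60

_PROVIDER_REGISTRY = {
    ProviderType.HUGGINGFACE: ProviderConfig(
        ProviderType.HUGGINGFACE,
        "https://api-inference.huggingface.co",
        supports_streaming=True,
    ),
    # Placeholder endpoints -- substitute the real deployment URLs.
    ProviderType.SCALEWAY: ProviderConfig(ProviderType.SCALEWAY, "https://scaleway.example"),
    ProviderType.KOYEB: ProviderConfig(ProviderType.KOYEB, "https://koyeb.example"),
}

def get_provider_config(name) -> ProviderConfig:
    return _PROVIDER_REGISTRY[ProviderType(name)]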

Logging Configuration (config/logging_config.py)

Purpose: Provides structured, contextual logging.

Features:

  • Colored console output
  • JSON structured logs
  • File rotation
  • Context managers for extra fields
  • Multiple log levels

Usage:

from config import setup_logging, get_logger, LogContext

# Setup logging (once at startup)
setup_logging(log_level="INFO", log_to_file=True)

# Get logger in any module
logger = get_logger(__name__)
logger.info("Starting application")

# Add context to logs
with LogContext(logger, user_id="123", request_id="abc"):
    logger.info("Processing request")

🎨 Benefits of This Architecture

1. Multi-Provider Support

  • Easy to switch between HuggingFace, Scaleway, Koyeb
  • Consistent interface across providers (see the sketch below)
  • Provider-specific optimizations
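
The consistent interface can be pinned down with an abstract base class. A sketch of what providers/base_provider.py might define, with illustrative method names:

from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Contract every inference provider must satisfy."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        """Run inference and return the generated text."""

    @abstractmethod
    def health_check(self) -> bool:
        """Return True when the backend is reachable."""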

2. Model Flexibility

  • Easy to add new models
  • Centralized model configurations
  • Model-specific generation parameters

3. Maintainability

  • Clear separation of concerns
  • Small, focused modules
  • Easy to test and debug

4. Scalability

  • Provider abstraction decouples the API from any single backend, so capacity can grow by adding providers
  • Configuration-driven behavior
  • Easy to add new features

5. Production-Ready

  • Proper logging and monitoring
  • Error handling and retries
  • Configuration management

πŸ“¦ Files to Keep

Core Application Files

βœ… app.py                    # Main entry point
βœ… requirements.txt          # Dependencies
βœ… .env.example             # Environment template
βœ… README.md                # Project documentation
βœ… Dockerfile               # Docker configuration

Test Files (All in tests/ directory)

βœ… test_api.py
βœ… test_model_loading.py
βœ… test_private_access.py
βœ… comprehensive_test.py
βœ… test_response_quality.py

Documentation Files

βœ… PROJECT_RULES.md
βœ… MODEL_PARAMETERS_GUIDE.md
βœ… PERSISTENT_STORAGE_SETUP.md
βœ… DOCKER_SPACE_DEPLOYMENT.md

πŸ—‘οΈ Files to Remove

Redundant/Old Implementation Files

❌ space_app.py                    # Old Space app
❌ space_app_with_storage.py       # Old storage app
❌ persistent_storage_app.py       # Old storage app
❌ memory_efficient_app.py         # Old optimized app
❌ respectful_linguacustodia_config.py  # Old config
❌ storage_enabled_respectful_app.py    # Refactored version
❌ app_refactored.py               # Intermediate refactor

Test Files to Relocate (move into tests/)

❌ test_app_locally.py            # Move to tests/
❌ test_fallback_locally.py       # Move to tests/
❌ test_storage_detection.py      # Move to tests/
❌ test_storage_setup.py          # Move to tests/
❌ test_private_endpoint.py       # Move to tests/

Investigation/Temporary Files

❌ investigate_model_configs.py   # One-time investigation
❌ evaluate_remote_models.py      # Development script
❌ verify_*.py                    # All verification scripts

Analysis/Documentation (Archive)

❌ LINGUACUSTODIA_INFERENCE_ANALYSIS.md  # Archive to docs/archive/

πŸš€ Migration Plan

Phase 1: Configuration Layer βœ…

  • Create config module structure
  • Implement base config
  • Implement model configs
  • Implement provider configs
  • Implement logging config

Phase 2: Core Layer (Next)

  • Implement StorageManager
  • Implement ModelLoader
  • Implement InferenceEngine (all three classes sketched below)
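
An illustrative outline of the three Phase 2 classes, consistent with how Example 1 below uses them; the bodies are stubs, not the real implementation:

class StorageManager:
    """Resolves where model weights live (e.g. a persistent /data volume)."""
    def __init__(self, config):
        self.cache_dir = config.storage_path

    def model_path(self, model_id: str) -> str:
        return f"{self.cache_dir}/{model_id}"

class ModelLoader:
    """Loads a model lazily on first use."""
    def __init__(self, config, model_config):
        self.config = config
        self.model_config = model_config
        self.model = None  # real code would load weights via transformers here

class InferenceEngine:
    """Thin wrapper that applies the model's generation config."""
    def __init__(self, loader):
        self.loader = loader

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        raise NotImplementedError("Phase 2 work item")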

Phase 3: Provider Layer

  • Implement BaseProvider
  • Implement HuggingFaceProvider
  • Implement ScalewayProvider (stub)
  • Implement KoyebProvider (stub)

Phase 4: API Layer

  • Refactor FastAPI app
  • Implement routes module (see the sketch below)
  • Update Pydantic models
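
A minimal sketch of the refactored API layer, assuming FastAPI with a Pydantic request model; the route and field names here are illustrative, not the final API surface:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="LinguaCustodia Financial AI API")

class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 150

@app.post("/generate")
def generate(request: GenerateRequest) -> dict:
    # Real route would delegate to InferenceEngine.generate(); stubbed here.
    return {"completion": "", "model": "stub"}

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}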

Phase 5: Cleanup

  • Move test files to tests/
  • Remove redundant files
  • Update documentation
  • Update deployment configs

πŸ“ Usage Examples

Example 1: Basic Usage

from config import BaseConfig, get_model_config, setup_logging
from core import StorageManager, ModelLoader, InferenceEngine

# Setup
config = BaseConfig.from_env()
setup_logging(config.log_level)
model_config = get_model_config('llama3.1-8b')

# Initialize
storage = StorageManager(config)
loader = ModelLoader(config, model_config)
engine = InferenceEngine(loader)

# Inference
result = engine.generate("What is SFCR?", max_tokens=150)
print(result)

Example 2: Provider Switching

from config import BaseConfig, ProviderType

# HuggingFace (local)
config = BaseConfig(provider=ProviderType.HUGGINGFACE)

# Scaleway (cloud)
config = BaseConfig(provider=ProviderType.SCALEWAY)

# Koyeb (cloud)
config = BaseConfig(provider=ProviderType.KOYEB)

Example 3: Model Switching

from config import get_model_config

# Load different models
llama_config = get_model_config('llama3.1-8b')
qwen_config = get_model_config('qwen3-8b')
gemma_config = get_model_config('gemma3-12b')

🎯 Next Steps

  1. Review this architecture - Ensure it meets your needs
  2. Implement core layer - StorageManager, ModelLoader, InferenceEngine
  3. Implement provider layer - Start with HuggingFaceProvider
  4. Refactor API layer - Update FastAPI app
  5. Clean up files - Remove redundant files
  6. Update tests - Test new architecture
  7. Deploy - Test in production

πŸ“ž Questions?

This architecture provides:

  • βœ… Configuration pattern for flexibility
  • βœ… Multi-provider support (HF, Scaleway, Koyeb)
  • βœ… Solid logging implementation
  • βœ… Clean, maintainable code structure
  • βœ… A codebase that is easy to extend and test

Ready to proceed with Phase 2 (Core Layer)?