# 🏗️ LinguaCustodia API Architecture
## 📋 Overview

This document describes the clean, scalable architecture for the LinguaCustodia Financial AI API, designed to support multiple models and multiple inference providers (HuggingFace, Scaleway, Koyeb).
## 🎯 Design Principles

- **Configuration Pattern**: Centralized configuration management
- **Provider Abstraction**: Support for multiple inference providers
- **Model Registry**: Easy model switching and management
- **Separation of Concerns**: Clear module boundaries
- **Solid Logging**: Structured, contextual logging
- **Testability**: Easy to test and maintain
## 📁 Project Structure

```
LLM-Pro-Fin-Inference/
├── config/                      # Configuration module
│   ├── __init__.py              # Exports all configs
│   ├── base_config.py           # Base application config
│   ├── model_configs.py         # Model-specific configs
│   ├── provider_configs.py      # Provider-specific configs
│   └── logging_config.py        # Logging setup
│
├── core/                        # Core business logic
│   ├── __init__.py
│   ├── storage_manager.py       # Storage abstraction
│   ├── model_loader.py          # Model loading abstraction
│   └── inference_engine.py      # Inference abstraction
│
├── providers/                   # Provider implementations
│   ├── __init__.py
│   ├── base_provider.py         # Abstract base class
│   ├── huggingface_provider.py  # HF implementation
│   ├── scaleway_provider.py     # Scaleway implementation
│   └── koyeb_provider.py        # Koyeb implementation
│
├── api/                         # API layer
│   ├── __init__.py
│   ├── app.py                   # FastAPI application
│   ├── routes.py                # API routes
│   └── models.py                # Pydantic models
│
├── utils/                       # Utilities
│   ├── __init__.py
│   └── helpers.py               # Helper functions
│
├── tests/                       # Tests (keep existing)
│   ├── test_api.py
│   ├── test_model_loading.py
│   └── ...
│
├── docs/                        # Documentation
│   ├── ARCHITECTURE.md          # This file
│   ├── API_REFERENCE.md         # API documentation
│   └── DEPLOYMENT.md            # Deployment guide
│
├── app.py                       # Main entry point
├── requirements.txt             # Dependencies
├── .env.example                 # Environment template
└── README.md                    # Project overview
```
## 🔧 Configuration Pattern

### Base Configuration (`config/base_config.py`)

**Purpose**: Provides foundational settings and defaults for the entire application.

**Features**:
- API settings (host, port, CORS)
- Storage configuration
- Logging configuration
- Environment variable loading
- Provider selection

**Usage**:
```python
from config import BaseConfig

config = BaseConfig.from_env()
print(config.to_dict())
```
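For illustration, here is a minimal sketch of what `config/base_config.py` could contain. The field names, environment variable names, and defaults below are assumptions, not the actual implementation:

```python
# config/base_config.py (illustrative sketch; field and env-var names are assumptions)
import os
from dataclasses import dataclass, asdict


@dataclass
class BaseConfig:
    host: str = "0.0.0.0"
    port: int = 8000
    cors_origins: str = "*"
    storage_path: str = "/data/models"
    log_level: str = "INFO"
    provider: str = "huggingface"

    @classmethod
    def from_env(cls) -> "BaseConfig":
        """Build a config from environment variables, falling back to class defaults."""
        return cls(
            host=os.getenv("API_HOST", cls.host),
            port=int(os.getenv("API_PORT", str(cls.port))),
            storage_path=os.getenv("STORAGE_PATH", cls.storage_path),
            log_level=os.getenv("LOG_LEVEL", cls.log_level),
            provider=os.getenv("PROVIDER", cls.provider),
        )

    def to_dict(self) -> dict:
        return asdict(self)
```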
### Model Configurations (`config/model_configs.py`)

**Purpose**: Defines model-specific parameters and generation settings.

**Features**:
- Model registry for all LinguaCustodia models
- Generation configurations per model
- Memory requirements
- Hardware recommendations

**Usage**:
```python
from config import get_model_config, list_available_models

# List available models
models = list_available_models()  # ['llama3.1-8b', 'qwen3-8b', ...]

# Get a specific model config
config = get_model_config('llama3.1-8b')
print(config.generation_config.temperature)
```
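A hedged sketch of how the model registry might be structured; the repo IDs, memory figures, and generation defaults here are illustrative assumptions:

```python
# config/model_configs.py (illustrative sketch; repo IDs and values are assumptions)
from dataclasses import dataclass, field


@dataclass
class GenerationConfig:
    temperature: float = 0.7
    top_p: float = 0.9
    max_new_tokens: int = 512


@dataclass
class ModelConfig:
    model_id: str      # HuggingFace repo ID (hypothetical values below)
    min_memory_gb: int  # rough memory requirement
    generation_config: GenerationConfig = field(default_factory=GenerationConfig)


_MODEL_REGISTRY = {
    "llama3.1-8b": ModelConfig(model_id="LinguaCustodia/llama3.1-8b-fin", min_memory_gb=16),
    "qwen3-8b": ModelConfig(model_id="LinguaCustodia/qwen3-8b-fin", min_memory_gb=16),
}


def list_available_models() -> list[str]:
    return list(_MODEL_REGISTRY)


def get_model_config(name: str) -> ModelConfig:
    try:
        return _MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown model '{name}'. Available: {list(_MODEL_REGISTRY)}")
```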
### Provider Configurations (`config/provider_configs.py`)

**Purpose**: Defines provider-specific settings for different inference platforms.

**Features**:
- Provider registry (HuggingFace, Scaleway, Koyeb)
- API endpoints and authentication
- Provider capabilities (streaming, batching)
- Rate limiting and timeouts

**Usage**:
```python
from config import get_provider_config

provider = get_provider_config('huggingface')
print(provider.api_endpoint)
```
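One possible shape for the provider registry. The capability flags are placeholders, and the Scaleway and Koyeb endpoints are deliberately fake `.example` hosts, not real URLs:

```python
# config/provider_configs.py (illustrative sketch; endpoints and flags are assumptions)
from dataclasses import dataclass


@dataclass
class ProviderConfig:
    name: str
    api_endpoint: str
    supports_streaming: bool = False
    request_timeout_s: int = 120


_PROVIDER_REGISTRY = {
    "huggingface": ProviderConfig("huggingface", "https://api-inference.huggingface.co", True),
    "scaleway": ProviderConfig("scaleway", "https://inference.scaleway.example", False),
    "koyeb": ProviderConfig("koyeb", "https://inference.koyeb.example", False),
}


def get_provider_config(name: str) -> ProviderConfig:
    return _PROVIDER_REGISTRY[name]
```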
### Logging Configuration (`config/logging_config.py`)

**Purpose**: Provides structured, contextual logging.

**Features**:
- Colored console output
- JSON structured logs
- File rotation
- Context managers for extra fields
- Multiple log levels

**Usage**:
```python
from config import setup_logging, get_logger, LogContext

# Set up logging (once at startup)
setup_logging(log_level="INFO", log_to_file=True)

# Get a logger in any module
logger = get_logger(__name__)
logger.info("Starting application")

# Add context to logs
with LogContext(logger, user_id="123", request_id="abc"):
    logger.info("Processing request")
```
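One way `LogContext` could be implemented on top of the standard `logging` module. This is a sketch of the idea, not necessarily the project's actual code:

```python
# config/logging_config.py (illustrative sketch of LogContext; the real module may differ)
import logging


class LogContext:
    """Temporarily attach extra fields (e.g. request_id) to every record a logger emits."""

    def __init__(self, logger: logging.Logger, **fields):
        self.logger = logger
        self.fields = fields
        self._filter = None

    def __enter__(self):
        def add_fields(record: logging.LogRecord) -> bool:
            # Stamp the extra fields onto each record; a JSON formatter can then emit them.
            for key, value in self.fields.items():
                setattr(record, key, value)
            return True

        self._filter = add_fields
        self.logger.addFilter(self._filter)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.logger.removeFilter(self._filter)
```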
## 🎨 Benefits of This Architecture

### 1. Multi-Provider Support
- Easy to switch between HuggingFace, Scaleway, and Koyeb
- Consistent interface across providers (sketched below)
- Room for provider-specific optimizations
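A "consistent interface" usually means an abstract base class. A minimal sketch of what `providers/base_provider.py` might define; the method names are assumptions:

```python
# providers/base_provider.py (illustrative sketch; method names are assumptions)
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Common interface every inference provider must implement."""

    @abstractmethod
    def load_model(self, model_id: str) -> None:
        """Prepare the model on this provider (download, warm up, etc.)."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        """Run inference and return the generated text."""

    def health_check(self) -> bool:
        """Providers may override this with endpoint-specific checks."""
        return True
```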
### 2. Model Flexibility
- Easy to add new models
- Centralized model configurations
- Model-specific generation parameters

### 3. Maintainability
- Clear separation of concerns
- Small, focused modules
- Easy to test and debug

### 4. Scalability
- Provider abstraction allows horizontal scaling
- Configuration-driven behavior
- Easy to add new features

### 5. Production-Ready
- Proper logging and monitoring
- Error handling and retries (see the sketch after this list)
- Configuration management
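For the "error handling and retries" point, a minimal retry helper is sketched below. It is purely illustrative (a library such as `tenacity` could serve the same purpose), and `utils/helpers.py` is only an assumed home for it:

```python
# utils/helpers.py (illustrative retry helper; not the actual implementation)
import logging
import time

logger = logging.getLogger(__name__)


def with_retries(fn, attempts: int = 3, backoff_s: float = 1.0):
    """Call fn(), retrying on failure with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts; let the caller handle it
            logger.warning("Attempt %d/%d failed: %s; retrying in %.1fs",
                           attempt, attempts, exc, backoff_s)
            time.sleep(backoff_s)
            backoff_s *= 2
```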
## 📦 Files to Keep

### Core Application Files
- ✅ `app.py`: Main entry point
- ✅ `requirements.txt`: Dependencies
- ✅ `.env.example`: Environment template
- ✅ `README.md`: Project documentation
- ✅ `Dockerfile`: Docker configuration

### Test Files (all in the tests/ directory)
- ✅ `test_api.py`
- ✅ `test_model_loading.py`
- ✅ `test_private_access.py`
- ✅ `comprehensive_test.py`
- ✅ `test_response_quality.py`

### Documentation Files
- ✅ `PROJECT_RULES.md`
- ✅ `MODEL_PARAMETERS_GUIDE.md`
- ✅ `PERSISTENT_STORAGE_SETUP.md`
- ✅ `DOCKER_SPACE_DEPLOYMENT.md`
## 🗑️ Files to Remove

### Redundant/Old Implementation Files
- ❌ `space_app.py`: Old Space app
- ❌ `space_app_with_storage.py`: Old storage app
- ❌ `persistent_storage_app.py`: Old storage app
- ❌ `memory_efficient_app.py`: Old optimized app
- ❌ `respectful_linguacustodia_config.py`: Old config
- ❌ `storage_enabled_respectful_app.py`: Refactored version
- ❌ `app_refactored.py`: Intermediate refactor

### Test Files to Organize/Remove
- ❌ `test_app_locally.py`: Move to tests/
- ❌ `test_fallback_locally.py`: Move to tests/
- ❌ `test_storage_detection.py`: Move to tests/
- ❌ `test_storage_setup.py`: Move to tests/
- ❌ `test_private_endpoint.py`: Move to tests/

### Investigation/Temporary Files
- ❌ `investigate_model_configs.py`: One-time investigation
- ❌ `evaluate_remote_models.py`: Development script
- ❌ `verify_*.py`: All verification scripts

### Analysis/Documentation (Archive)
- ❌ `LINGUACUSTODIA_INFERENCE_ANALYSIS.md`: Archive to docs/archive/
## 🚀 Migration Plan

### Phase 1: Configuration Layer ✅
- Create config module structure
- Implement base config
- Implement model configs
- Implement provider configs
- Implement logging config

### Phase 2: Core Layer (next)
- Implement StorageManager
- Implement ModelLoader
- Implement InferenceEngine

### Phase 3: Provider Layer
- Implement BaseProvider
- Implement HuggingFaceProvider
- Implement ScalewayProvider (stub)
- Implement KoyebProvider (stub)

### Phase 4: API Layer
- Refactor the FastAPI app
- Implement the routes module
- Update the Pydantic models (see the sketch below)
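As a rough sketch of where Phase 4 is headed, the Pydantic-model/route split could look like the following; all names below are assumptions, and the handler is a stub rather than the real wiring:

```python
# api/models.py + api/routes.py (illustrative sketch; names are assumptions)
from fastapi import APIRouter, FastAPI
from pydantic import BaseModel

router = APIRouter()


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 150


class GenerateResponse(BaseModel):
    text: str


@router.post("/generate", response_model=GenerateResponse)
def generate(req: GenerateRequest) -> GenerateResponse:
    # In the real app, an InferenceEngine would be injected (e.g. via app state).
    return GenerateResponse(text=f"(stub response for: {req.prompt[:40]})")


app = FastAPI(title="LinguaCustodia Financial AI API")
app.include_router(router)
```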
### Phase 5: Cleanup
- Move test files to tests/
- Remove redundant files
- Update documentation
- Update deployment configs
## 📝 Usage Examples

### Example 1: Basic Usage
```python
from config import BaseConfig, get_model_config, setup_logging
from core import StorageManager, ModelLoader, InferenceEngine

# Setup
config = BaseConfig.from_env()
setup_logging(config.log_level)
model_config = get_model_config('llama3.1-8b')

# Initialize
storage = StorageManager(config)
loader = ModelLoader(config, model_config)
engine = InferenceEngine(loader)

# Inference
result = engine.generate("What is SFCR?", max_tokens=150)
print(result)
```

### Example 2: Provider Switching
```python
from config import BaseConfig, ProviderType

# HuggingFace (local)
config = BaseConfig(provider=ProviderType.HUGGINGFACE)

# Scaleway (cloud)
config = BaseConfig(provider=ProviderType.SCALEWAY)

# Koyeb (cloud)
config = BaseConfig(provider=ProviderType.KOYEB)
```

### Example 3: Model Switching
```python
from config import get_model_config

# Load configs for different models
llama_config = get_model_config('llama3.1-8b')
qwen_config = get_model_config('qwen3-8b')
gemma_config = get_model_config('gemma3-12b')
```
## 🎯 Next Steps

1. **Review this architecture**: ensure it meets your needs
2. **Implement the core layer**: StorageManager, ModelLoader, InferenceEngine
3. **Implement the provider layer**: start with HuggingFaceProvider
4. **Refactor the API layer**: update the FastAPI app
5. **Clean up files**: remove redundant files
6. **Update tests**: cover the new architecture
7. **Deploy**: test in production
## 💬 Questions?

This architecture provides:
- ✅ A configuration pattern for flexibility
- ✅ Multi-provider support (HF, Scaleway, Koyeb)
- ✅ A solid logging implementation
- ✅ A clean, maintainable code structure
- ✅ Easy extension and testing

Ready to proceed with Phase 2 (Core Layer)?