# 🏗️ LinguaCustodia API Architecture

## 📋 Overview

This document describes the clean, scalable architecture for the LinguaCustodia Financial AI API, designed to support multiple models and inference providers (HuggingFace, Scaleway, Koyeb).

## 🎯 Design Principles

1. **Configuration Pattern**: Centralized configuration management
2. **Provider Abstraction**: Support multiple inference providers
3. **Model Registry**: Easy model switching and management
4. **Separation of Concerns**: Clear module boundaries
5. **Solid Logging**: Structured, contextual logging
6. **Testability**: Easy to test and maintain

## 📁 Project Structure

```
LLM-Pro-Fin-Inference/
├── config/                      # Configuration module
│   ├── __init__.py              # Exports all configs
│   ├── base_config.py           # Base application config
│   ├── model_configs.py         # Model-specific configs
│   ├── provider_configs.py      # Provider-specific configs
│   └── logging_config.py        # Logging setup
│
├── core/                        # Core business logic
│   ├── __init__.py
│   ├── storage_manager.py       # Storage abstraction
│   ├── model_loader.py          # Model loading abstraction
│   └── inference_engine.py      # Inference abstraction
│
├── providers/                   # Provider implementations
│   ├── __init__.py
│   ├── base_provider.py         # Abstract base class
│   ├── huggingface_provider.py  # HF implementation
│   ├── scaleway_provider.py     # Scaleway implementation
│   └── koyeb_provider.py        # Koyeb implementation
│
├── api/                         # API layer
│   ├── __init__.py
│   ├── app.py                   # FastAPI application
│   ├── routes.py                # API routes
│   └── models.py                # Pydantic models
│
├── utils/                       # Utilities
│   ├── __init__.py
│   └── helpers.py               # Helper functions
│
├── tests/                       # Tests (keep existing)
│   ├── test_api.py
│   ├── test_model_loading.py
│   └── ...
│
├── docs/                        # Documentation
│   ├── ARCHITECTURE.md          # This file
│   ├── API_REFERENCE.md         # API documentation
│   └── DEPLOYMENT.md            # Deployment guide
│
├── app.py                       # Main entry point
├── requirements.txt             # Dependencies
├── .env.example                 # Environment template
└── README.md                    # Project overview
```

## 🔧 Configuration Pattern

### Base Configuration (`config/base_config.py`)

**Purpose**: Provides foundational settings and defaults for the entire application.

**Features**:
- API settings (host, port, CORS)
- Storage configuration
- Logging configuration
- Environment variable loading
- Provider selection

**Usage**:
```python
from config import BaseConfig

config = BaseConfig.from_env()
print(config.to_dict())
```

### Model Configurations (`config/model_configs.py`)

**Purpose**: Defines model-specific parameters and generation settings.

**Features**:
- Model registry for all LinguaCustodia models
- Generation configurations per model
- Memory requirements
- Hardware recommendations

**Usage**:
```python
from config import get_model_config, list_available_models

# List available models
models = list_available_models()  # ['llama3.1-8b', 'qwen3-8b', ...]

# Get a specific model config
config = get_model_config('llama3.1-8b')
print(config.generation_config.temperature)
```

### Provider Configurations (`config/provider_configs.py`)

**Purpose**: Defines provider-specific settings for different inference platforms.

**Features**:
- Provider registry (HuggingFace, Scaleway, Koyeb)
- API endpoints and authentication
- Provider capabilities (streaming, batching)
- Rate limiting and timeouts

**Usage**:
```python
from config import get_provider_config

provider = get_provider_config('huggingface')
print(provider.api_endpoint)
```

### Logging Configuration (`config/logging_config.py`)

**Purpose**: Provides structured, contextual logging.
**Features**:
- Colored console output
- JSON structured logs
- File rotation
- Context managers for extra fields
- Multiple log levels

**Usage**:
```python
from config import setup_logging, get_logger, LogContext

# Setup logging (once at startup)
setup_logging(log_level="INFO", log_to_file=True)

# Get a logger in any module
logger = get_logger(__name__)
logger.info("Starting application")

# Add context to logs
with LogContext(logger, user_id="123", request_id="abc"):
    logger.info("Processing request")
```

## 🎨 Benefits of This Architecture

### 1. **Multi-Provider Support**
- Easy to switch between HuggingFace, Scaleway, and Koyeb
- Consistent interface across providers
- Provider-specific optimizations

### 2. **Model Flexibility**
- Easy to add new models
- Centralized model configurations
- Model-specific generation parameters

### 3. **Maintainability**
- Clear separation of concerns
- Small, focused modules
- Easy to test and debug

### 4. **Scalability**
- Provider abstraction allows horizontal scaling
- Configuration-driven behavior
- Easy to add new features

### 5. **Production-Ready**
- Proper logging and monitoring
- Error handling and retries
- Configuration management

## 📦 Files to Keep

### Core Application Files
```
✅ app.py             # Main entry point
✅ requirements.txt   # Dependencies
✅ .env.example       # Environment template
✅ README.md          # Project documentation
✅ Dockerfile         # Docker configuration
```

### Test Files (All in tests/ directory)
```
✅ test_api.py
✅ test_model_loading.py
✅ test_private_access.py
✅ comprehensive_test.py
✅ test_response_quality.py
```

### Documentation Files
```
✅ PROJECT_RULES.md
✅ MODEL_PARAMETERS_GUIDE.md
✅ PERSISTENT_STORAGE_SETUP.md
✅ DOCKER_SPACE_DEPLOYMENT.md
```

## 🗑️ Files to Remove

### Redundant/Old Implementation Files
```
❌ space_app.py                         # Old Space app
❌ space_app_with_storage.py            # Old storage app
❌ persistent_storage_app.py            # Old storage app
❌ memory_efficient_app.py              # Old optimized app
❌ respectful_linguacustodia_config.py  # Old config
❌ storage_enabled_respectful_app.py    # Refactored version
❌ app_refactored.py                    # Intermediate refactor
```

### Test Files to Organize/Remove
```
❌ test_app_locally.py        # Move to tests/
❌ test_fallback_locally.py   # Move to tests/
❌ test_storage_detection.py  # Move to tests/
❌ test_storage_setup.py      # Move to tests/
❌ test_private_endpoint.py   # Move to tests/
```

### Investigation/Temporary Files
```
❌ investigate_model_configs.py  # One-time investigation
❌ evaluate_remote_models.py     # Development script
❌ verify_*.py                   # All verification scripts
```

### Analysis/Documentation (Archive)
```
❌ LINGUACUSTODIA_INFERENCE_ANALYSIS.md  # Archive to docs/archive/
```

## 🚀 Migration Plan

### Phase 1: Configuration Layer ✅
- [x] Create config module structure
- [x] Implement base config
- [x] Implement model configs
- [x] Implement provider configs
- [x] Implement logging config

### Phase 2: Core Layer (Next)
- [ ] Implement StorageManager
- [ ] Implement ModelLoader
- [ ] Implement InferenceEngine

### Phase 3: Provider Layer
- [ ] Implement BaseProvider
- [ ] Implement HuggingFaceProvider
- [ ] Implement ScalewayProvider (stub)
- [ ] Implement KoyebProvider (stub)

### Phase 4: API Layer
- [ ] Refactor FastAPI app
- [ ] Implement routes module
- [ ] Update Pydantic models

### Phase 5: Cleanup
- [ ] Move test files to tests/
- [ ] Remove redundant files
- [ ] Update documentation
- [ ] Update deployment configs

## 📝 Usage Examples

### Example 1: Basic Usage
```python
from config import BaseConfig, get_model_config, setup_logging
from core import StorageManager, ModelLoader, InferenceEngine

# Setup
config = BaseConfig.from_env()
setup_logging(config.log_level)
model_config = get_model_config('llama3.1-8b')

# Initialize
storage = StorageManager(config)
loader = ModelLoader(config, model_config)
engine = InferenceEngine(loader)

# Inference
result = engine.generate("What is SFCR?", max_tokens=150)
print(result)
```

### Example 2: Provider Switching
```python
from config import BaseConfig, ProviderType

# HuggingFace (local)
config = BaseConfig(provider=ProviderType.HUGGINGFACE)

# Scaleway (cloud)
config = BaseConfig(provider=ProviderType.SCALEWAY)

# Koyeb (cloud)
config = BaseConfig(provider=ProviderType.KOYEB)
```

### Example 3: Model Switching
```python
from config import get_model_config

# Load different models
llama_config = get_model_config('llama3.1-8b')
qwen_config = get_model_config('qwen3-8b')
gemma_config = get_model_config('gemma3-12b')
```

## 🎯 Next Steps

1. **Review this architecture** - Ensure it meets your needs
2. **Implement the core layer** - StorageManager, ModelLoader, InferenceEngine
3. **Implement the provider layer** - Start with HuggingFaceProvider
4. **Refactor the API layer** - Update the FastAPI app
5. **Clean up files** - Remove redundant files
6. **Update tests** - Test the new architecture
7. **Deploy** - Test in production

## 📞 Questions?
This architecture provides:

- ✅ Configuration pattern for flexibility
- ✅ Multi-provider support (HF, Scaleway, Koyeb)
- ✅ Solid logging implementation
- ✅ Clean, maintainable code structure
- ✅ Easy extension and testing

Ready to proceed with Phase 2 (Core Layer)?
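As a starting point for Phase 2, the core-layer contract described above (ModelLoader feeding an InferenceEngine with a single `generate()` entry point) could be sketched as follows. This is a minimal sketch under assumptions: `DummyConfig` and the placeholder `generate` body are hypothetical stand-ins, not the final implementation, and only the class names come from the structure above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class DummyConfig:
    """Hypothetical stand-in for BaseConfig + model config."""
    model_name: str = "llama3.1-8b"
    storage_path: str = "/data/models"


class BaseModelLoader(ABC):
    """Abstract contract so providers can swap in their own loaders."""

    @abstractmethod
    def load(self) -> object:
        ...


class ModelLoader(BaseModelLoader):
    """Resolves a model from storage and exposes it to the engine."""

    def __init__(self, config: DummyConfig):
        self.config = config
        self.model = None

    def load(self) -> object:
        # A real implementation would download/load weights here;
        # this placeholder just records that loading happened.
        self.model = f"<loaded:{self.config.model_name}>"
        return self.model


class InferenceEngine:
    """Wraps a loaded model behind a single generate() entry point."""

    def __init__(self, loader: BaseModelLoader):
        self.loader = loader

    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        # Lazy-load on first call so startup stays cheap.
        if getattr(self.loader, "model", None) is None:
            self.loader.load()
        # Placeholder: echo the prompt instead of real token generation.
        return f"[{self.loader.config.model_name}] {prompt[:max_tokens]}"


engine = InferenceEngine(ModelLoader(DummyConfig()))
print(engine.generate("What is SFCR?"))
```

The design choice worth noting is that `InferenceEngine` depends only on the abstract `BaseModelLoader`, so Phase 3 providers can supply their own loader implementations without touching the engine.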