# 🏗️ LinguaCustodia API Architecture

## 📋 Overview

This document describes the clean, scalable architecture for the LinguaCustodia Financial AI API, designed to support multiple models and inference providers (HuggingFace, Scaleway, Koyeb).

## 🎯 Design Principles

1. **Configuration Pattern**: Centralized configuration management
2. **Provider Abstraction**: Support multiple inference providers
3. **Model Registry**: Easy model switching and management
4. **Separation of Concerns**: Clear module boundaries
5. **Solid Logging**: Structured, contextual logging
6. **Testability**: Easy to test and maintain

## 📁 Project Structure

```
LLM-Pro-Fin-Inference/
├── config/                      # Configuration module
│   ├── __init__.py              # Exports all configs
│   ├── base_config.py           # Base application config
│   ├── model_configs.py         # Model-specific configs
│   ├── provider_configs.py      # Provider-specific configs
│   └── logging_config.py        # Logging setup
│
├── core/                        # Core business logic
│   ├── __init__.py
│   ├── storage_manager.py       # Storage abstraction
│   ├── model_loader.py          # Model loading abstraction
│   └── inference_engine.py      # Inference abstraction
│
├── providers/                   # Provider implementations
│   ├── __init__.py
│   ├── base_provider.py         # Abstract base class
│   ├── huggingface_provider.py  # HF implementation
│   ├── scaleway_provider.py     # Scaleway implementation
│   └── koyeb_provider.py        # Koyeb implementation
│
├── api/                         # API layer
│   ├── __init__.py
│   ├── app.py                   # FastAPI application
│   ├── routes.py                # API routes
│   └── models.py                # Pydantic models
│
├── utils/                       # Utilities
│   ├── __init__.py
│   └── helpers.py               # Helper functions
│
├── tests/                       # Tests (keep existing)
│   ├── test_api.py
│   ├── test_model_loading.py
│   └── ...
│
├── docs/                        # Documentation
│   ├── ARCHITECTURE.md          # This file
│   ├── API_REFERENCE.md         # API documentation
│   └── DEPLOYMENT.md            # Deployment guide
│
├── app.py                       # Main entry point
├── requirements.txt             # Dependencies
├── .env.example                 # Environment template
└── README.md                    # Project overview
```

## 🔧 Configuration Pattern

### Base Configuration (`config/base_config.py`)

**Purpose**: Provides foundational settings and defaults for the entire application.

**Features**:
- API settings (host, port, CORS)
- Storage configuration
- Logging configuration
- Environment variable loading
- Provider selection

**Usage**:
```python
from config import BaseConfig

config = BaseConfig.from_env()
print(config.to_dict())
```

### Model Configurations (`config/model_configs.py`)

**Purpose**: Defines model-specific parameters and generation settings.

**Features**:
- Model registry for all LinguaCustodia models
- Generation configurations per model
- Memory requirements
- Hardware recommendations

**Usage**:
```python
from config import get_model_config, list_available_models

# List available models
models = list_available_models()  # ['llama3.1-8b', 'qwen3-8b', ...]

# Get a specific model config
config = get_model_config('llama3.1-8b')
print(config.generation_config.temperature)
```

### Provider Configurations (`config/provider_configs.py`)

**Purpose**: Defines provider-specific settings for different inference platforms.

**Features**:
- Provider registry (HuggingFace, Scaleway, Koyeb)
- API endpoints and authentication
- Provider capabilities (streaming, batching)
- Rate limiting and timeouts

**Usage**:
```python
from config import get_provider_config

provider = get_provider_config('huggingface')
print(provider.api_endpoint)
```

### Logging Configuration (`config/logging_config.py`)

**Purpose**: Provides structured, contextual logging.
**Features**:
- Colored console output
- JSON structured logs
- File rotation
- Context managers for extra fields
- Multiple log levels

**Usage**:
```python
from config import setup_logging, get_logger, LogContext

# Setup logging (once at startup)
setup_logging(log_level="INFO", log_to_file=True)

# Get a logger in any module
logger = get_logger(__name__)
logger.info("Starting application")

# Add context to logs
with LogContext(logger, user_id="123", request_id="abc"):
    logger.info("Processing request")
```

## 🎨 Benefits of This Architecture

### 1. **Multi-Provider Support**
- Easy to switch between HuggingFace, Scaleway, and Koyeb
- Consistent interface across providers
- Provider-specific optimizations

### 2. **Model Flexibility**
- Easy to add new models
- Centralized model configurations
- Model-specific generation parameters

### 3. **Maintainability**
- Clear separation of concerns
- Small, focused modules
- Easy to test and debug

### 4. **Scalability**
- Provider abstraction allows horizontal scaling
- Configuration-driven behavior
- Easy to add new features

### 5. **Production-Ready**
- Proper logging and monitoring
- Error handling and retries
- Configuration management

## 📦 Files to Keep

### Core Application Files
```
✅ app.py             # Main entry point
✅ requirements.txt   # Dependencies
✅ .env.example       # Environment template
✅ README.md          # Project documentation
✅ Dockerfile         # Docker configuration
```

### Test Files (All in tests/ directory)
```
✅ test_api.py
✅ test_model_loading.py
✅ test_private_access.py
✅ comprehensive_test.py
✅ test_response_quality.py
```

### Documentation Files
```
✅ PROJECT_RULES.md
✅ MODEL_PARAMETERS_GUIDE.md
✅ PERSISTENT_STORAGE_SETUP.md
✅ DOCKER_SPACE_DEPLOYMENT.md
```

## 🗑️ Files to Remove

### Redundant/Old Implementation Files
```
❌ space_app.py                         # Old Space app
❌ space_app_with_storage.py            # Old storage app
❌ persistent_storage_app.py            # Old storage app
❌ memory_efficient_app.py              # Old optimized app
❌ respectful_linguacustodia_config.py  # Old config
❌ storage_enabled_respectful_app.py    # Refactored version
❌ app_refactored.py                    # Intermediate refactor
```

### Test Files to Organize/Remove
```
❌ test_app_locally.py        # Move to tests/
❌ test_fallback_locally.py   # Move to tests/
❌ test_storage_detection.py  # Move to tests/
❌ test_storage_setup.py      # Move to tests/
❌ test_private_endpoint.py   # Move to tests/
```

### Investigation/Temporary Files
```
❌ investigate_model_configs.py  # One-time investigation
❌ evaluate_remote_models.py     # Development script
❌ verify_*.py                   # All verification scripts
```

### Analysis/Documentation (Archive)
```
❌ LINGUACUSTODIA_INFERENCE_ANALYSIS.md  # Archive to docs/archive/
```

## 🚀 Migration Plan

### Phase 1: Configuration Layer ✅
- [x] Create config module structure
- [x] Implement base config
- [x] Implement model configs
- [x] Implement provider configs
- [x] Implement logging config

### Phase 2: Core Layer (Next)
- [ ] Implement StorageManager
- [ ] Implement ModelLoader
- [ ] Implement InferenceEngine

### Phase 3: Provider Layer
- [ ] Implement BaseProvider
- [ ] Implement HuggingFaceProvider
- [ ] Implement ScalewayProvider (stub)
- [ ] Implement KoyebProvider (stub)

### Phase 4: API Layer
- [ ] Refactor FastAPI app
- [ ] Implement routes module
- [ ] Update Pydantic models

### Phase 5: Cleanup
- [ ] Move test files to tests/
- [ ] Remove redundant files
- [ ] Update documentation
- [ ] Update deployment configs

## 📝 Usage Examples

### Example 1: Basic Usage
```python
from config import BaseConfig, get_model_config, setup_logging
from core import StorageManager, ModelLoader, InferenceEngine

# Setup
config = BaseConfig.from_env()
setup_logging(config.log_level)
model_config = get_model_config('llama3.1-8b')

# Initialize
storage = StorageManager(config)
loader = ModelLoader(config, model_config)
engine = InferenceEngine(loader)

# Inference
result = engine.generate("What is SFCR?", max_tokens=150)
print(result)
```

### Example 2: Provider Switching
```python
from config import BaseConfig, ProviderType

# HuggingFace (local)
config = BaseConfig(provider=ProviderType.HUGGINGFACE)

# Scaleway (cloud)
config = BaseConfig(provider=ProviderType.SCALEWAY)

# Koyeb (cloud)
config = BaseConfig(provider=ProviderType.KOYEB)
```

### Example 3: Model Switching
```python
from config import get_model_config

# Load different models
llama_config = get_model_config('llama3.1-8b')
qwen_config = get_model_config('qwen3-8b')
gemma_config = get_model_config('gemma3-12b')
```

## 🎯 Next Steps

1. **Review this architecture** - Ensure it meets your needs
2. **Implement the core layer** - StorageManager, ModelLoader, InferenceEngine
3. **Implement the provider layer** - Start with HuggingFaceProvider
4. **Refactor the API layer** - Update the FastAPI app
5. **Clean up files** - Remove redundant files
6. **Update tests** - Test the new architecture
7. **Deploy** - Test in production

## 📞 Questions?
This architecture provides:

- ✅ Configuration pattern for flexibility
- ✅ Multi-provider support (HF, Scaleway, Koyeb)
- ✅ Solid logging implementation
- ✅ Clean, maintainable code structure
- ✅ Easy extension and testing

Ready to proceed with Phase 2 (Core Layer)?
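As a starting point for Phase 2, the core-layer contract described above (ModelLoader feeding an InferenceEngine with a single `generate()` entry point) could be sketched as follows. This is a minimal sketch under assumptions: `DummyConfig` and the placeholder `generate` body are hypothetical stand-ins, not the final implementation, and only the class names come from the structure above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class DummyConfig:
    """Hypothetical stand-in for BaseConfig + model config."""
    model_name: str = "llama3.1-8b"
    storage_path: str = "/data/models"


class BaseModelLoader(ABC):
    """Abstract contract so providers can swap in their own loaders."""

    @abstractmethod
    def load(self) -> object:
        ...


class ModelLoader(BaseModelLoader):
    """Resolves a model from storage and exposes it to the engine."""

    def __init__(self, config: DummyConfig):
        self.config = config
        self.model = None

    def load(self) -> object:
        # A real implementation would download/load weights here;
        # this placeholder just records that loading happened.
        self.model = f"<loaded:{self.config.model_name}>"
        return self.model


class InferenceEngine:
    """Wraps a loaded model behind a single generate() entry point."""

    def __init__(self, loader: BaseModelLoader):
        self.loader = loader

    def generate(self, prompt: str, max_tokens: int = 150) -> str:
        # Lazy-load on first call so startup stays cheap.
        if getattr(self.loader, "model", None) is None:
            self.loader.load()
        # Placeholder: echo the prompt instead of real token generation.
        return f"[{self.loader.config.model_name}] {prompt[:max_tokens]}"


engine = InferenceEngine(ModelLoader(DummyConfig()))
print(engine.generate("What is SFCR?"))
```

The design choice worth noting is that `InferenceEngine` depends only on the abstract `BaseModelLoader`, so Phase 3 providers can supply their own loader implementations without touching the engine.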