---
title: LinguaCustodia Financial AI API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# LinguaCustodia Financial AI API
A production-ready FastAPI application for financial AI inference using LinguaCustodia models.
## Features
- **Multiple Models**: Support for Llama 3.1, Qwen 3, Gemma 3, and Fin-Pythia models
- **FastAPI**: High-performance API with automatic documentation
- **Persistent Storage**: Models cached for faster restarts
- **GPU Support**: Automatic GPU detection and optimization
- **Health Monitoring**: Built-in health checks and diagnostics
## API Endpoints
- `GET /` - API information and status
- `GET /health` - Health check with model and GPU status
- `GET /models` - List available models and configurations
- `POST /inference` - Run inference with the loaded model
- `GET /docs` - Interactive API documentation
- `GET /diagnose-imports` - Diagnose import issues
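Hugging Face serves Spaces directly at a `<owner>-<space>.hf.space` hostname, so the endpoints above can be reached outside the web UI. A small helper for building those URLs (a sketch assuming that standard hostname pattern; `space_endpoint` is an illustrative name, not part of this API):

```python
def space_endpoint(owner: str, space: str, path: str) -> str:
    """Build the direct URL for a Space endpoint.

    Spaces are served at https://<owner>-<space>.hf.space, with characters
    that are invalid in hostnames ("_", ".", "/") normalized to "-".
    """
    host = f"{owner}-{space}".replace("_", "-").replace(".", "-").replace("/", "-")
    return f"https://{host}.hf.space{path}"
```

For example, `space_endpoint("jeanbaptdzd", "linguacustodia-financial-api", "/health")` yields the health-check URL for this Space.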
## Usage
### Inference Request
```bash
curl -X POST "https://jeanbaptdzd-linguacustodia-financial-api.hf.space/inference" \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is SFCR in insurance regulation?",
"max_new_tokens": 150,
"temperature": 0.6
}'
```
### Response
```json
{
"response": "SFCR (Solvency and Financial Condition Report) is a regulatory requirement...",
"model_used": "LinguaCustodia/llama3.1-8b-fin-v0.3",
"success": true,
"tokens_generated": 45,
"generation_params": {
"max_new_tokens": 150,
"temperature": 0.6,
"eos_token_id": [128001, 128008, 128009],
"early_stopping": false,
"min_length": 50
}
}
```
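The same request can be made from Python with only the standard library. This is a sketch: the `API_URL` below assumes the standard `<owner>-<space>.hf.space` hostname pattern, and the helper names are illustrative:

```python
import json
import urllib.request

# Assumed direct Space URL (standard <owner>-<space>.hf.space pattern).
API_URL = "https://jeanbaptdzd-linguacustodia-financial-api.hf.space/inference"

def build_request(prompt: str, max_new_tokens: int = 150,
                  temperature: float = 0.6) -> urllib.request.Request:
    """Build the POST request equivalent to the curl example."""
    body = json.dumps({
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )

def run_inference(prompt: str) -> dict:
    """POST the prompt and decode the JSON response shown above."""
    with urllib.request.urlopen(build_request(prompt), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`run_inference("What is SFCR in insurance regulation?")` then returns the parsed response dict, including `response`, `model_used`, and `generation_params`.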
## Environment Variables
The following environment variables need to be set in the Space settings:
- `HF_TOKEN_LC`: HuggingFace token for LinguaCustodia models (required)
- `MODEL_NAME`: Model to use (default: "llama3.1-8b")
- `APP_PORT`: Application port (default: 7860)
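The application's startup logic for these variables can be sketched as follows (`load_config` is an illustrative helper, not the actual code; the defaults mirror the table above):

```python
import os

def load_config(env=os.environ) -> dict:
    """Read the Space configuration from environment variables."""
    return {
        "hf_token": env["HF_TOKEN_LC"],          # required: raises KeyError if unset
        "model_name": env.get("MODEL_NAME", "llama3.1-8b"),
        "app_port": int(env.get("APP_PORT", "7860")),
    }
```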
## Models Available
### ✅ L40 GPU Compatible Models
- **llama3.1-8b**: Llama 3.1 8B Financial (16GB RAM, 8GB VRAM) - ✅ **Recommended**
- **qwen3-8b**: Qwen 3 8B Financial (16GB RAM, 8GB VRAM) - ✅ **Recommended**
- **fin-pythia-1.4b**: Fin-Pythia 1.4B Financial (3GB RAM, 2GB VRAM) - ✅ Works
### ❌ L40 GPU Incompatible Models
- **gemma3-12b**: Gemma 3 12B Financial (32GB RAM, 12GB VRAM) - ❌ **Too large for L40**
- **llama3.1-70b**: Llama 3.1 70B Financial (140GB RAM, 80GB VRAM) - ❌ **Too large for L40**
⚠️ **Important**: Gemma 3 12B and Llama 3.1 70B are too large for the L40 GPU (48GB VRAM) with vLLM; they will fail during KV cache initialization. Use the 8B models for optimal performance.
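The RAM figures above line up with fp16 weight sizes (2 bytes per parameter), which gives a quick back-of-envelope check for whether a model fits (`fp16_weight_gb` is an illustrative helper, not part of this API):

```python
def fp16_weight_gb(n_params: float) -> float:
    """Approximate fp16 weight footprint: 2 bytes per parameter, 1 GB = 1e9 bytes."""
    return n_params * 2 / 1e9
```

Note this counts weights only: vLLM additionally preallocates the KV cache on top of the weights, which is why a model can fail to load even when its raw weights would fit in VRAM.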
## Architecture
This API uses a hybrid architecture that works in both local development and cloud deployment environments:
- **Clean Architecture**: Uses Pydantic models and proper separation of concerns
- **Embedded Fallback**: Falls back to embedded configuration when imports fail
- **Persistent Storage**: Models are cached in persistent storage for faster restarts
- **GPU Optimization**: Automatic GPU detection and memory management
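The embedded-fallback pattern can be sketched as below; `app_config` and `MODEL_CONFIGS` are hypothetical names for illustration, not the actual module layout:

```python
# If the full configuration module imports cleanly, use it; otherwise fall
# back to a minimal embedded configuration so the API can still boot.
try:
    from app_config import MODEL_CONFIGS  # hypothetical full config module
except ImportError:
    MODEL_CONFIGS = {
        "llama3.1-8b": {"repo": "LinguaCustodia/llama3.1-8b-fin-v0.3"},
    }
```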
## Development
For local development, see the main [README.md](README.md) file.
## License
MIT License - see LICENSE file for details.