---
title: LinguaCustodia Financial AI API
emoji: 🦙
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# LinguaCustodia Financial AI API
A production-ready FastAPI application for financial AI inference using LinguaCustodia models.
## Features
- Multiple Models: Support for Llama 3.1, Qwen 3, Gemma 3, and Fin-Pythia models
- FastAPI: High-performance API with automatic documentation
- Persistent Storage: Models cached for faster restarts
- GPU Support: Automatic GPU detection and optimization
- Health Monitoring: Built-in health checks and diagnostics
## API Endpoints
- `GET /` - API information and status
- `GET /health` - Health check with model and GPU status
- `GET /models` - List available models and configurations
- `POST /inference` - Run inference with the loaded model
- `GET /docs` - Interactive API documentation
- `GET /diagnose-imports` - Diagnose import issues
## Usage
### Inference Request
```bash
curl -X POST "https://jeanbaptdzd-linguacustodia-financial-api.hf.space/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is SFCR in insurance regulation?",
    "max_new_tokens": 150,
    "temperature": 0.6
  }'
```

Note that API requests go to the Space's direct `*.hf.space` host, not the `huggingface.co/spaces/...` page URL.
### Response
```json
{
  "response": "SFCR (Solvency and Financial Condition Report) is a regulatory requirement...",
  "model_used": "LinguaCustodia/llama3.1-8b-fin-v0.3",
  "success": true,
  "tokens_generated": 45,
  "generation_params": {
    "max_new_tokens": 150,
    "temperature": 0.6,
    "eos_token_id": [128001, 128008, 128009],
    "early_stopping": false,
    "min_length": 50
  }
}
```
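The same calls can be made programmatically. Below is a minimal Python client sketch, assuming the `requests` package and the direct Space URL from the curl example; the function names are illustrative, while the request and response fields follow the documented `/inference` contract:

```python
import requests

# Direct Space URL (see the curl example above).
BASE_URL = "https://jeanbaptdzd-linguacustodia-financial-api.hf.space"

def health() -> dict:
    """Check model and GPU status before sending prompts."""
    resp = requests.get(f"{BASE_URL}/health", timeout=30)
    resp.raise_for_status()
    return resp.json()

def ask(prompt: str, max_new_tokens: int = 150, temperature: float = 0.6) -> str:
    """Run inference and return the generated text."""
    payload = {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }
    resp = requests.post(f"{BASE_URL}/inference", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(health())
    print(ask("What is SFCR in insurance regulation?"))
```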
## Environment Variables
The following environment variables need to be set in the Space settings:
- `HF_TOKEN_LC`: HuggingFace token for LinguaCustodia models (required)
- `MODEL_NAME`: Model to use (default: `llama3.1-8b`)
- `APP_PORT`: Application port (default: `7860`)
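As a sketch of how the application can consume these at startup (the variable names and defaults are the documented ones; the snippet itself is illustrative):

```python
import os

# Required: token for pulling the gated LinguaCustodia models from the Hub.
HF_TOKEN_LC = os.environ["HF_TOKEN_LC"]

# Optional, with the documented defaults.
MODEL_NAME = os.getenv("MODEL_NAME", "llama3.1-8b")
APP_PORT = int(os.getenv("APP_PORT", "7860"))
```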
## Models Available
### ✅ L40 GPU Compatible Models
- `llama3.1-8b`: Llama 3.1 8B Financial (16GB RAM, 8GB VRAM) - ✅ Recommended
- `qwen3-8b`: Qwen 3 8B Financial (16GB RAM, 8GB VRAM) - ✅ Recommended
- `fin-pythia-1.4b`: Fin-Pythia 1.4B Financial (3GB RAM, 2GB VRAM) - ✅ Works
### ❌ L40 GPU Incompatible Models
- `gemma3-12b`: Gemma 3 12B Financial (32GB RAM, 12GB VRAM) - ❌ Too large for L40
- `llama3.1-70b`: Llama 3.1 70B Financial (140GB RAM, 80GB VRAM) - ❌ Too large for L40
⚠️ **Important**: Gemma 3 12B and Llama 3.1 70B models are too large for an L40 GPU (48GB VRAM) with vLLM. They will fail during KV cache initialization. Use the 8B models for optimal performance.
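One way to avoid that crash is to validate the configured model against these lists before loading anything. A minimal sketch, with hypothetical helper and set names:

```python
# Compatibility sets mirror the lists above; vLLM KV-cache overhead, not raw
# weight size, is what rules out the larger models on a 48GB L40.
L40_COMPATIBLE = {"llama3.1-8b", "qwen3-8b", "fin-pythia-1.4b"}
L40_INCOMPATIBLE = {"gemma3-12b", "llama3.1-70b"}

def validate_model_choice(model_name: str) -> None:
    """Fail fast instead of letting vLLM crash during KV cache initialization."""
    if model_name in L40_INCOMPATIBLE:
        raise ValueError(
            f"{model_name} does not fit on an L40 (48GB VRAM) with vLLM; "
            f"choose one of: {', '.join(sorted(L40_COMPATIBLE))}"
        )
```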
## Architecture
This API uses a hybrid architecture that works in both local development and cloud deployment environments:
- Clean Architecture: Uses Pydantic models and proper separation of concerns
- Embedded Fallback: Falls back to embedded configuration when imports fail (see the sketch after this list)
- Persistent Storage: Models are cached in persistent storage for faster restarts
- GPU Optimization: Automatic GPU detection and memory management
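The embedded-fallback item above usually reduces to a guarded import. A minimal sketch, with a hypothetical module path and the documented request fields:

```python
try:
    # Preferred path: the clean, package-local configuration module
    # (module path is hypothetical).
    from app.config import InferenceRequest
except ImportError:
    # Embedded fallback so the Space still boots when the package layout
    # differs (e.g. a flat-file deployment on HF Spaces).
    from pydantic import BaseModel

    class InferenceRequest(BaseModel):
        prompt: str
        max_new_tokens: int = 150
        temperature: float = 0.6
```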
## Development
For local development, see the main `README.md` file.
## License
MIT License - see the `LICENSE` file for details.