# Context Length Testing for LinguaCustodia v1.0 Models

## Summary

I changed the context length configurations for the LinguaCustodia v1.0 models based on assumptions about their base models. These assumptions still need to be verified against the actual model configurations.

## Changes Made

### Current Configuration (Needs Verification):

- **Llama 3.1 8B**: 128K context ✅ (assumed based on Llama 3.1 specs)
- **Llama 3.1 70B**: 128K context ✅ (assumed based on Llama 3.1 specs)
- **Qwen 3 8B**: 32K context ❓ (assumed, needs verification)
- **Qwen 3 32B**: 32K context ❓ (assumed, needs verification)
- **Gemma 3 12B**: 8K context ❓ (assumed, needs verification)

### Files Modified:

1. **`app_config.py`**: Added `model_max_length` to the tokenizer configs
2. **`app.py`**:
   - Updated `get_vllm_config_for_model()` with model-specific context length logic
   - Added a `/test/model-configs` endpoint that reports the actual configurations (illustrative sketches of both appear at the end of this document)
3. **`scaleway_deployment.py`**: Updated the environment variables for each model size

## Testing Plan

### Phase 1: Verify Actual Context Lengths

**Option A: Using HuggingFace Space (Recommended)**

1. Deploy the updated app to the HuggingFace Space
2. Call the `/test/model-configs` endpoint
3. Compare the actual context lengths against the expected ones (a small verification helper is sketched at the end of this document)

**Option B: Using Test Scripts**

1. Run `test_lingua_models.py` on a cloud platform (HF or Scaleway)
2. Review the results to verify the actual context lengths

### Phase 2: Deploy and Test

**HuggingFace Space:**

```bash
# app.py now exposes the /test/model-configs endpoint.
# Once deployed, test with:
bash test_hf_endpoint.sh

# Or manually:
curl https://jeanbaptdzd-linguacustodia-financial-api.hf.space/test/model-configs | python3 -m json.tool
```

**Scaleway:**

```bash
# Deploy with the updated configurations
python scaleway_deployment.py

# Test the endpoint
curl https://your-scaleway-endpoint.com/test/model-configs
```

## Next Steps

1. ✅ Added the test endpoint to `app.py`
2. ✅ Created the test scripts
3. ⏳ Deploy to the HuggingFace Space
4. ⏳ Test the `/test/model-configs` endpoint
5. ⏳ Verify the actual context lengths
6. ⏳ Fix any incorrect configurations
7. ⏳ Deploy to Scaleway for production testing

## Expected Results

The `/test/model-configs` endpoint should return:

```json
{
  "test_results": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": {
      "context_length": ACTUAL_VALUE,
      "model_type": "llama",
      "architectures": ["LlamaForCausalLM"],
      "config_available": true
    },
    ...
  },
  "expected_contexts": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": 128000,
    "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
    "LinguaCustodia/qwen3-32b-fin-v1.0": 32768,
    "LinguaCustodia/llama3.1-70b-fin-v1.0": 128000,
    "LinguaCustodia/gemma3-12b-fin-v1.0": 8192
  }
}
```

## Important Note

**Cloud-Only Testing**: Per project rules, testing cannot be done locally (the local machine is too weak). All testing must run on:

- HuggingFace Spaces (L40 GPU)
- Scaleway (L40S/A100/H100 GPUs)

## Files to Deploy

**Essential files for HuggingFace:**

- `app.py` (with the test endpoint)
- `Dockerfile`
- `requirements.txt` or `requirements-hf.txt`
- `.env` with `HF_TOKEN_LC`

**Essential files for Scaleway:**

- `app.py`
- `scaleway_deployment.py`
- `Dockerfile.scaleway`
- `.env` with the Scaleway credentials and `HF_TOKEN_LC`
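## Illustrative Sketches

For reference, here is a minimal sketch of what the `/test/model-configs` endpoint in `app.py` could look like. It assumes FastAPI and `transformers.AutoConfig`; the `EXPECTED_CONTEXTS` mapping mirrors the values in this document, but the loading logic and response field names are assumptions, not the actual implementation.

```python
# Hypothetical sketch of the /test/model-configs endpoint (app.py).
# Model IDs and expected values mirror this document; loading configs
# via transformers.AutoConfig is an assumption about the implementation.
import os

from fastapi import FastAPI
from transformers import AutoConfig

app = FastAPI()

EXPECTED_CONTEXTS = {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": 128000,
    "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
    "LinguaCustodia/qwen3-32b-fin-v1.0": 32768,
    "LinguaCustodia/llama3.1-70b-fin-v1.0": 128000,
    "LinguaCustodia/gemma3-12b-fin-v1.0": 8192,
}

@app.get("/test/model-configs")
def test_model_configs():
    results = {}
    for model_id in EXPECTED_CONTEXTS:
        try:
            # Fetches only config.json, not the weights, so this is cheap.
            config = AutoConfig.from_pretrained(
                model_id, token=os.environ.get("HF_TOKEN_LC")
            )
            results[model_id] = {
                "context_length": getattr(config, "max_position_embeddings", None),
                "model_type": config.model_type,
                "architectures": getattr(config, "architectures", None) or [],
                "config_available": True,
            }
        except Exception as exc:  # gated repo, bad token, network error, etc.
            results[model_id] = {"config_available": False, "error": str(exc)}
    return {"test_results": results, "expected_contexts": EXPECTED_CONTEXTS}
```

Because only `config.json` is downloaded, this check is cheap enough to run inside the Space without loading any model weights.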
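Similarly, a hedged sketch of the model-specific context length logic in `get_vllm_config_for_model()`. The per-family context lengths come from the table above; matching the family as a substring of the repo name, and the specific vLLM engine arguments (`max_model_len`, `gpu_memory_utilization`), are assumptions about how the function is wired.

```python
# Hypothetical sketch of get_vllm_config_for_model() (app.py).
# Context lengths per family come from this document's table; the
# substring matching and the 0.90 GPU utilization are assumptions.
DEFAULT_CONTEXTS = {
    "llama3.1": 128000,  # assumed from Llama 3.1 specs
    "qwen3": 32768,      # assumed, pending verification
    "gemma3": 8192,      # assumed, pending verification
}

def get_vllm_config_for_model(model_id: str) -> dict:
    """Return vLLM engine arguments with a model-specific context length."""
    max_len = 8192  # conservative fallback for unrecognized model families
    for family, context_length in DEFAULT_CONTEXTS.items():
        if family in model_id.lower():
            max_len = context_length
            break
    return {
        "model": model_id,
        "max_model_len": max_len,        # vLLM's context-length argument
        "gpu_memory_utilization": 0.90,  # assumed default, not verified
    }
```

For example, `get_vllm_config_for_model("LinguaCustodia/qwen3-8b-fin-v1.0")` would return `max_model_len: 32768` under these assumptions.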
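Finally, a hypothetical Phase 1 helper that calls the deployed endpoint and diffs actual versus expected context lengths. It relies only on the response shape shown under Expected Results; the script name and base URL handling are illustrative.

```python
# Hypothetical Phase 1 verifier: fetch /test/model-configs from the deployed
# HuggingFace Space and flag mismatches. Adjust BASE_URL for Scaleway.
import requests

BASE_URL = "https://jeanbaptdzd-linguacustodia-financial-api.hf.space"

def verify_context_lengths() -> bool:
    data = requests.get(f"{BASE_URL}/test/model-configs", timeout=60).json()
    all_ok = True
    for model_id, expected in data["expected_contexts"].items():
        actual = data["test_results"].get(model_id, {}).get("context_length")
        status = "OK" if actual == expected else "MISMATCH"
        if actual != expected:
            all_ok = False
        print(f"{status}: {model_id}: expected={expected}, actual={actual}")
    return all_ok

if __name__ == "__main__":
    raise SystemExit(0 if verify_context_lengths() else 1)
```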