# Context Length Testing for LinguaCustodia v1.0 Models
## Summary
I updated the context length configurations for the LinguaCustodia v1.0 models based on assumptions about their base models. These assumptions still need to be verified against the actual model configurations.
## Changes Made
**Current Configuration (Needs Verification):**
- Llama 3.1 8B: 128K context (assumed based on Llama 3.1 specs)
- Llama 3.1 70B: 128K context (assumed based on Llama 3.1 specs)
- Qwen 3 8B: 32K context (assumed, needs verification)
- Qwen 3 32B: 32K context (assumed, needs verification)
- Gemma 3 12B: 8K context (assumed, needs verification)
**Files Modified:**
- `app_config.py`: Added `model_max_length` to tokenizer configs
- `app.py`:
  - Updated `get_vllm_config_for_model()` with model-specific context length logic (see the sketch after this list)
  - Added a `/test/model-configs` endpoint to test actual configurations
- `scaleway_deployment.py`: Updated environment variables for each model size
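
For reference, below is a minimal sketch of what the model-specific logic in `get_vllm_config_for_model()` could look like. The context values mirror the assumed numbers above; the dictionary name, the fallback value, and the exact return shape are illustrative assumptions, not the actual contents of `app.py`.

```python
# Sketch only: assumed per-model context lengths, pending verification.
ASSUMED_CONTEXT_LENGTHS = {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": 128000,
    "LinguaCustodia/llama3.1-70b-fin-v1.0": 128000,
    "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
    "LinguaCustodia/qwen3-32b-fin-v1.0": 32768,
    "LinguaCustodia/gemma3-12b-fin-v1.0": 8192,
}

def get_vllm_config_for_model(model_id: str) -> dict:
    """Return vLLM engine kwargs with a model-specific max_model_len."""
    # Fall back to a conservative 8K window for models not in the table.
    max_len = ASSUMED_CONTEXT_LENGTHS.get(model_id, 8192)
    return {
        "model": model_id,
        "max_model_len": max_len,
    }
```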
## Testing Plan
### Phase 1: Verify Actual Context Lengths
**Option A: Using HuggingFace Space (Recommended)**
- Deploy the updated app to the HuggingFace Space
- Call the `/test/model-configs` endpoint
- Compare actual vs. expected context lengths
**Option B: Using Test Scripts**
- Run `test_lingua_models.py` on a cloud platform (HF or Scaleway); see the sketch below
- Review the results to verify actual context lengths
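
The script itself is not reproduced here, but the core check is simple: read each model's `config.json` from the Hub and report the real context window. A sketch, assuming `transformers` is installed and `HF_TOKEN_LC` grants access to the LinguaCustodia repos:

```python
import os
from transformers import AutoConfig

MODELS = [
    "LinguaCustodia/llama3.1-8b-fin-v1.0",
    "LinguaCustodia/qwen3-8b-fin-v1.0",
    "LinguaCustodia/qwen3-32b-fin-v1.0",
    "LinguaCustodia/llama3.1-70b-fin-v1.0",
    "LinguaCustodia/gemma3-12b-fin-v1.0",
]

token = os.environ["HF_TOKEN_LC"]
for model_id in MODELS:
    config = AutoConfig.from_pretrained(model_id, token=token)
    # max_position_embeddings is the usual source of truth for context length.
    print(model_id, getattr(config, "max_position_embeddings", "unknown"))
```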
### Phase 2: Deploy and Test
**HuggingFace Space:**
```bash
# The app.py now has a /test/model-configs endpoint.
# Once deployed, test with:
bash test_hf_endpoint.sh

# Or manually:
curl https://jeanbaptdzd-linguacustodia-financial-api.hf.space/test/model-configs | python3 -m json.tool
```
**Scaleway:**
```bash
# Deploy with the updated configurations
python scaleway_deployment.py

# Test the endpoint
curl https://your-scaleway-endpoint.com/test/model-configs
```
## Next Steps
- ✅ Added test endpoint to `app.py`
- ✅ Created test scripts
- ⏳ Deploy to HuggingFace Space
- ⏳ Test the `/test/model-configs` endpoint
- ⏳ Verify actual context lengths
- ⏳ Fix any incorrect configurations
- ⏳ Deploy to Scaleway for production testing
## Expected Results
The `/test/model-configs` endpoint should return:
```json
{
  "test_results": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": {
      "context_length": ACTUAL_VALUE,
      "model_type": "llama",
      "architectures": ["LlamaForCausalLM"],
      "config_available": true
    },
    ...
  },
  "expected_contexts": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": 128000,
    "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
    "LinguaCustodia/qwen3-32b-fin-v1.0": 32768,
    "LinguaCustodia/llama3.1-70b-fin-v1.0": 128000,
    "LinguaCustodia/gemma3-12b-fin-v1.0": 8192
  }
}
```
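
Once the endpoint is live, the two maps can be diffed programmatically rather than by eye. A sketch, assuming the response shape above and the Space URL from Phase 2:

```python
import requests

URL = "https://jeanbaptdzd-linguacustodia-financial-api.hf.space/test/model-configs"

data = requests.get(URL, timeout=60).json()
for model_id, expected in data["expected_contexts"].items():
    actual = data["test_results"].get(model_id, {}).get("context_length")
    status = "OK" if actual == expected else "MISMATCH"
    print(f"{status}: {model_id} expected={expected} actual={actual}")
```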
## Important Note
**Cloud-Only Testing:** Per project rules, local testing is not possible (the local machine is underpowered). All testing must be done on:
- HuggingFace Spaces (L40 GPU)
- Scaleway (L40S/A100/H100 GPUs)
## Files to Deploy
**Essential files for HuggingFace:**
- `app.py` (with test endpoint)
- `Dockerfile`
- `requirements.txt` or `requirements-hf.txt`
- `.env` with `HF_TOKEN_LC`
**Essential files for Scaleway:**
- `app.py`
- `scaleway_deployment.py`
- `Dockerfile.scaleway`
- `.env` with Scaleway credentials and `HF_TOKEN_LC`