
Context Length Testing for LinguaCustodia v1.0 Models

Summary

I updated the context length configurations for the LinguaCustodia v1.0 models based on assumptions about their base models. These assumptions still need to be verified against the actual model configurations.

Changes Made

Current Configuration (Needs Verification):

  • Llama 3.1 8B: 128K context ✅ (assumed based on Llama 3.1 specs)
  • Llama 3.1 70B: 128K context ✅ (assumed based on Llama 3.1 specs)
  • Qwen 3 8B: 32K context ❓ (assumed, needs verification)
  • Qwen 3 32B: 32K context ❓ (assumed, needs verification)
  • Gemma 3 12B: 8K context ❓ (assumed, needs verification)

Files Modified:

  1. app_config.py: Added model_max_length to tokenizer configs
  2. app.py:
    • Updated get_vllm_config_for_model() with model-specific context length logic
    • Added /test/model-configs endpoint to test actual configurations
  3. scaleway_deployment.py: Updated environment variables for each model size
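The model-specific logic added to get_vllm_config_for_model() can be sketched roughly as below. This is a hypothetical simplification, not the actual app.py code; the family-to-context mapping mirrors the table above, and `max_model_len` is the standard vLLM engine argument:

```python
# Hypothetical sketch of model-specific context selection (illustrative names).

MODEL_CONTEXTS = {
    "llama3.1": 128000,
    "qwen3": 32768,
    "gemma3": 8192,
}

def get_vllm_config_for_model(model_id: str) -> dict:
    """Pick vLLM settings based on the model family in the repo id."""
    max_len = next(
        (ctx for family, ctx in MODEL_CONTEXTS.items() if family in model_id.lower()),
        8192,  # conservative default when the family is unknown
    )
    return {"model": model_id, "max_model_len": max_len}

print(get_vllm_config_for_model("LinguaCustodia/qwen3-8b-fin-v1.0"))
# {'model': 'LinguaCustodia/qwen3-8b-fin-v1.0', 'max_model_len': 32768}
```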

Testing Plan

Phase 1: Verify Actual Context Lengths

Option A: Using HuggingFace Space (Recommended)

  1. Deploy updated app to HuggingFace Space
  2. Call the /test/model-configs endpoint
  3. Compare actual vs expected context lengths

Option B: Using Test Scripts

  1. Run test_lingua_models.py on a cloud platform (HF or Scaleway)
  2. Review results to verify actual context lengths

Phase 2: Deploy and Test

HuggingFace Space:

# The app.py now has /test/model-configs endpoint
# Once deployed, test with:
bash test_hf_endpoint.sh

# Or manually:
curl https://jeanbaptdzd-linguacustodia-financial-api.hf.space/test/model-configs | python3 -m json.tool

Scaleway:

# Deploy with the updated configurations
python scaleway_deployment.py

# Test the endpoint
curl https://your-scaleway-endpoint.com/test/model-configs

Next Steps

  1. ✅ Added test endpoint to app.py
  2. ✅ Created test scripts
  3. ⏳ Deploy to HuggingFace Space
  4. ⏳ Test the /test/model-configs endpoint
  5. ⏳ Verify actual context lengths
  6. ⏳ Fix any incorrect configurations
  7. ⏳ Deploy to Scaleway for production testing

Expected Results

The /test/model-configs endpoint should return:

{
  "test_results": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": {
      "context_length": ACTUAL_VALUE,
      "model_type": "llama",
      "architectures": ["LlamaForCausalLM"],
      "config_available": true
    },
    ...
  },
  "expected_contexts": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": 128000,
    "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
    "LinguaCustodia/qwen3-32b-fin-v1.0": 32768,
    "LinguaCustodia/llama3.1-70b-fin-v1.0": 128000,
    "LinguaCustodia/gemma3-12b-fin-v1.0": 8192
  }
}
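Once the endpoint responds, actual and expected values can be diffed mechanically. A small sketch assuming the response shape shown above (the function name and sample payload are illustrative):

```python
def diff_contexts(response: dict) -> dict:
    """Return {model_id: (actual, expected)} for every mismatch."""
    mismatches = {}
    for model_id, expected in response["expected_contexts"].items():
        result = response["test_results"].get(model_id, {})
        actual = result.get("context_length")
        if actual != expected:
            mismatches[model_id] = (actual, expected)
    return mismatches

# Example with one matching and one mismatching entry:
sample = {
    "test_results": {
        "LinguaCustodia/qwen3-8b-fin-v1.0": {"context_length": 32768},
        "LinguaCustodia/gemma3-12b-fin-v1.0": {"context_length": 131072},
    },
    "expected_contexts": {
        "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
        "LinguaCustodia/gemma3-12b-fin-v1.0": 8192,
    },
}
print(diff_contexts(sample))
# {'LinguaCustodia/gemma3-12b-fin-v1.0': (131072, 8192)}
```

An empty result means every configuration matched; anything else feeds directly into step 6 of the next-steps list (fix incorrect configurations).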

Important Note

Cloud-Only Testing: Per project rules, local testing is not feasible (the local machine lacks the required GPU hardware). All testing must be done on:

  • HuggingFace Spaces (L40 GPU)
  • Scaleway (L40S/A100/H100 GPUs)

Files to Deploy

Essential files for HuggingFace:

  • app.py (with test endpoint)
  • Dockerfile
  • requirements.txt or requirements-hf.txt
  • .env with HF_TOKEN_LC

Essential files for Scaleway:

  • app.py
  • scaleway_deployment.py
  • Dockerfile.scaleway
  • .env with Scaleway credentials and HF_TOKEN_LC