
Context Length Testing for LinguaCustodia v1.0 Models

Summary

I updated the context length configurations for the LinguaCustodia v1.0 models based on assumptions about their base models. These assumptions still need to be verified against the actual model configurations.

Changes Made

Current Configuration (Needs Verification):

  • Llama 3.1 8B: 128K context ✅ (assumed based on Llama 3.1 specs)
  • Llama 3.1 70B: 128K context ✅ (assumed based on Llama 3.1 specs)
  • Qwen 3 8B: 32K context ❓ (assumed, needs verification)
  • Qwen 3 32B: 32K context ❓ (assumed, needs verification)
  • Gemma 3 12B: 8K context ❓ (assumed, needs verification)

Files Modified:

  1. app_config.py: Added model_max_length to tokenizer configs
  2. app.py:
    • Updated get_vllm_config_for_model() with model-specific context length logic
    • Added /test/model-configs endpoint to test actual configurations
  3. scaleway_deployment.py: Updated environment variables for each model size
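The model-specific logic added to get_vllm_config_for_model() can be sketched roughly as below. This is a hypothetical simplification, not the actual app.py code; the family-to-context mapping mirrors the table above, and `max_model_len` is the standard vLLM engine argument:

```python
# Hypothetical sketch of model-specific context selection (illustrative names).

MODEL_CONTEXTS = {
    "llama3.1": 128000,
    "qwen3": 32768,
    "gemma3": 8192,
}

def get_vllm_config_for_model(model_id: str) -> dict:
    """Pick vLLM settings based on the model family in the repo id."""
    max_len = next(
        (ctx for family, ctx in MODEL_CONTEXTS.items() if family in model_id.lower()),
        8192,  # conservative default when the family is unknown
    )
    return {"model": model_id, "max_model_len": max_len}

print(get_vllm_config_for_model("LinguaCustodia/qwen3-8b-fin-v1.0"))
# {'model': 'LinguaCustodia/qwen3-8b-fin-v1.0', 'max_model_len': 32768}
```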

Testing Plan

Phase 1: Verify Actual Context Lengths

Option A: Using HuggingFace Space (Recommended)

  1. Deploy updated app to HuggingFace Space
  2. Call the /test/model-configs endpoint
  3. Compare actual vs expected context lengths

Option B: Using Test Scripts

  1. Run test_lingua_models.py on a cloud platform (HF or Scaleway)
  2. Review results to verify actual context lengths

Phase 2: Deploy and Test

HuggingFace Space:

# The app.py now has /test/model-configs endpoint
# Once deployed, test with:
bash test_hf_endpoint.sh

# Or manually:
curl https://jeanbaptdzd-linguacustodia-financial-api.hf.space/test/model-configs | python3 -m json.tool

Scaleway:

# Deploy with the updated configurations
python scaleway_deployment.py

# Test the endpoint
curl https://your-scaleway-endpoint.com/test/model-configs

Next Steps

  1. ✅ Added test endpoint to app.py
  2. ✅ Created test scripts
  3. ⏳ Deploy to HuggingFace Space
  4. ⏳ Test the /test/model-configs endpoint
  5. ⏳ Verify actual context lengths
  6. ⏳ Fix any incorrect configurations
  7. ⏳ Deploy to Scaleway for production testing

Expected Results

The /test/model-configs endpoint should return:

{
  "test_results": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": {
      "context_length": ACTUAL_VALUE,
      "model_type": "llama",
      "architectures": ["LlamaForCausalLM"],
      "config_available": true
    },
    ...
  },
  "expected_contexts": {
    "LinguaCustodia/llama3.1-8b-fin-v1.0": 128000,
    "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
    "LinguaCustodia/qwen3-32b-fin-v1.0": 32768,
    "LinguaCustodia/llama3.1-70b-fin-v1.0": 128000,
    "LinguaCustodia/gemma3-12b-fin-v1.0": 8192
  }
}
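Once the endpoint responds, actual and expected values can be diffed mechanically. A small sketch assuming the response shape shown above (the function name and sample payload are illustrative):

```python
def diff_contexts(response: dict) -> dict:
    """Return {model_id: (actual, expected)} for every mismatch."""
    mismatches = {}
    for model_id, expected in response["expected_contexts"].items():
        result = response["test_results"].get(model_id, {})
        actual = result.get("context_length")
        if actual != expected:
            mismatches[model_id] = (actual, expected)
    return mismatches

# Example with one matching and one mismatching entry:
sample = {
    "test_results": {
        "LinguaCustodia/qwen3-8b-fin-v1.0": {"context_length": 32768},
        "LinguaCustodia/gemma3-12b-fin-v1.0": {"context_length": 131072},
    },
    "expected_contexts": {
        "LinguaCustodia/qwen3-8b-fin-v1.0": 32768,
        "LinguaCustodia/gemma3-12b-fin-v1.0": 8192,
    },
}
print(diff_contexts(sample))
# {'LinguaCustodia/gemma3-12b-fin-v1.0': (131072, 8192)}
```

An empty result means every configuration matched; anything else feeds directly into step 6 of the next-steps list (fix incorrect configurations).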

Important Note

Cloud-Only Testing: Per project rules, local testing is not feasible (the local machine lacks the required GPU hardware). All testing must be done on:

  • HuggingFace Spaces (L40 GPU)
  • Scaleway (L40S/A100/H100 GPUs)

Files to Deploy

Essential files for HuggingFace:

  • app.py (with test endpoint)
  • Dockerfile
  • requirements.txt or requirements-hf.txt
  • .env with HF_TOKEN_LC

Essential files for Scaleway:

  • app.py
  • scaleway_deployment.py
  • Dockerfile.scaleway
  • .env with Scaleway credentials and HF_TOKEN_LC