dragonllm-finance-models / response_correctness_analysis.md
jeanbaptdzd's picture
feat: Clean deployment to HuggingFace Space with model config test endpoint
8c0b652

Response Correctness Analysis - Dragon-fin Performance Test

πŸ“Š Overall Assessment

Test Date: October 6, 2025
Model: LinguaCustodia/qwen3-8b-fin-v0.3
Total Queries: 20
Success Rate: 100% (all queries responded)


βœ… CORRECT RESPONSES

Financial Definitions (Excellent)

  1. EBITDA βœ… CORRECT

    • Definition: "Earnings Before Interest, Taxes, Depreciation, and Amortization" βœ…
    • Explanation: Accurate description of operating performance metric βœ…
    • Example: $100M revenue - $50M COGS - $20M SG&A = $30M EBITDA βœ…
    • Quality: Professional, accurate, well-structured
  2. P/E Ratio βœ… CORRECT

    • Definition: "Price-to-earnings ratio" βœ…
    • Calculation: "Market price per share Γ· earnings per share" βœ…
    • Interpretation: High P/E = expensive, Low P/E = cheap (with caveats) βœ…
    • Quality: Comprehensive, includes limitations and context
  3. Derivatives βœ… CORRECT

    • Definition: "Financial instrument whose value is derived from underlying asset" βœ…
    • Types: Options, futures, swaps βœ…
    • Uses: Hedging, speculation, leverage βœ…
    • Quality: Accurate, includes practical examples
  4. Market Capitalization βœ… CORRECT

    • Definition: "Total value of outstanding shares" βœ…
    • Calculation: "Stock price Γ— shares outstanding" βœ…
    • Categories: Small-cap ($300M-$2B), Mid-cap ($2B-$10B), Large-cap (>$10B) βœ…
    • Quality: Accurate ranges, good risk analysis

Complex Financial Analysis (Very Good)

  1. Debt vs Equity Financing βœ… CORRECT

    • Debt advantages: Control retention, tax benefits, lower cost βœ…
    • Debt disadvantages: Fixed obligations, leverage risk, covenants βœ…
    • Equity advantages: No repayment, reduced risk, expertise access βœ…
    • Equity disadvantages: Dilution, loss of control, pressure βœ…
    • Quality: Balanced, comprehensive comparison
  2. Interest Rate Impact on Bonds βœ… CORRECT

    • Government bonds: Less sensitive, inverse relationship βœ…
    • Corporate bonds: More sensitive, credit risk amplification βœ…
    • Zero-coupon bonds: Highest sensitivity βœ…
    • Quality: Technically accurate, well-structured
  3. Square Root of 144 βœ… CORRECT

    • Answer: 12 βœ…
    • Explanation: 12 Γ— 12 = 144 βœ…
    • Additional info: Mentions -12 as also valid βœ…
    • Quality: Mathematically correct, educational

❌ INCORRECT RESPONSES

Critical Error

  1. "What is 2+2?" ❌ WRONG
    • Response: "-1"
    • Correct Answer: "4"
    • Severity: Critical - basic arithmetic failure
    • Impact: Raises concerns about fundamental math capabilities

Overly Complex Response

  1. "Calculate 15 * 8" ⚠️ CORRECT BUT OVERCOMPLICATED
    • Response: Detailed step-by-step explanation ending with "15 * 8 equals 120"
    • Correct Answer: 120 βœ…
    • Issue: Extremely verbose for simple multiplication
    • Quality: Correct but inefficient

πŸ“ˆ Response Quality Analysis

Strengths

  • Financial Expertise: Excellent knowledge of financial concepts
  • Comprehensive: Detailed explanations with examples
  • Professional Tone: Appropriate for financial professionals
  • Structured: Well-organized responses with clear sections
  • Context-Aware: Includes limitations and caveats

Weaknesses

  • Basic Math Issues: Failed simple arithmetic (2+2 = -1)
  • Over-Engineering: Simple questions get overly complex responses
  • Inconsistent: Complex financial analysis is excellent, basic math is poor

🎯 Category Performance

Category Accuracy Quality Notes
Finance 100% Excellent Professional-grade responses
Analysis 100% Very Good Comprehensive, accurate
Regulatory 100% Good Technically correct
Markets 100% Good Accurate market concepts
Risk 100% Good Proper risk terminology
Math 33% Poor 1/3 correct, basic arithmetic failure

πŸ” Detailed Findings

Financial Domain Excellence

The model demonstrates exceptional performance in financial domains:

  • Accurate definitions and calculations
  • Professional terminology usage
  • Comprehensive analysis with practical examples
  • Proper understanding of market dynamics

Mathematical Inconsistency

Critical concern: The model fails basic arithmetic while excelling at complex financial mathematics. This suggests:

  • Possible training data issues with simple math
  • Model may be over-optimized for financial content
  • Potential prompt sensitivity issues

Response Patterns

  • Consistent Length: 150-200 tokens for complex questions
  • Professional Structure: Well-formatted with bullet points and examples
  • Educational Approach: Often includes additional context and explanations

🚨 Recommendations

Immediate Actions

  1. Investigate Math Issue: Test more basic arithmetic problems
  2. Prompt Engineering: Try different phrasings for simple questions
  3. Model Validation: Verify if this is a systematic issue

Quality Improvements

  1. Response Length: Implement length controls for simple questions
  2. Accuracy Monitoring: Add basic math validation tests
  3. Domain Balancing: Ensure model handles both simple and complex queries well

πŸ“Š Overall Score

Financial Domain: 95/100 (Excellent)
Mathematical Domain: 40/100 (Poor)
Overall Accuracy: 85/100 (Good with concerns)

Recommendation: Model is production-ready for financial analysis but requires investigation of basic math capabilities.