Nora Petrova committed
Commit d8ff169 · 1 parent: 6833632
Add app files
- Dockerfile +19 -0
- leaderboard-app/.gitignore +41 -0
- leaderboard-app/README.md +113 -0
- leaderboard-app/app/favicon.ico +0 -0
- leaderboard-app/app/globals.css +29 -0
- leaderboard-app/app/layout.js +19 -0
- leaderboard-app/app/page.js +84 -0
- leaderboard-app/components/HeadToHeadComparison.jsx +1002 -0
- leaderboard-app/components/LLMComparisonDashboard.jsx +688 -0
- leaderboard-app/components/MetricsBreakdown.jsx +638 -0
- leaderboard-app/components/TaskDemographicAnalysis.jsx +1416 -0
- leaderboard-app/eslint.config.mjs +14 -0
- leaderboard-app/jsconfig.json +7 -0
- leaderboard-app/lib/utils.js +205 -0
- leaderboard-app/next.config.mjs +4 -0
- leaderboard-app/package-lock.json +0 -0
- leaderboard-app/package.json +24 -0
- leaderboard-app/postcss.config.mjs +5 -0
- leaderboard-app/public/llm_comparison_data.json +0 -0
- leaderboard-app/public/vercel.svg +1 -0
Dockerfile
ADDED
@@ -0,0 +1,19 @@
+FROM node:20.11.0-slim
+
+WORKDIR /app
+
+# Copy the rest of the application code
+COPY --chown=user leaderboard-app/ ./
+
+RUN npm install
+
+# Build the app
+RUN npm run build
+
+# Expose the port the app will run on
+# HF Spaces uses port 7860 by default
+EXPOSE 7860
+
+# Start the app with the correct port
+ENV PORT=7860
+CMD ["npm", "start"]
leaderboard-app/.gitignore
ADDED
@@ -0,0 +1,41 @@
+# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+
+# dependencies
+/node_modules
+/.pnp
+.pnp.*
+.yarn/*
+!.yarn/patches
+!.yarn/plugins
+!.yarn/releases
+!.yarn/versions
+
+# testing
+/coverage
+
+# next.js
+/.next/
+/out/
+
+# production
+/build
+
+# misc
+.DS_Store
+*.pem
+
+# debug
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+.pnpm-debug.log*
+
+# env files (can opt-in for committing if needed)
+.env*
+
+# vercel
+.vercel
+
+# typescript
+*.tsbuildinfo
+next-env.d.ts
leaderboard-app/README.md
ADDED
@@ -0,0 +1,113 @@
+# LLM Comparison Leaderboard
+
+An interactive dashboard for comparing the performance of state-of-the-art large language models across various tasks and metrics.
+
+## Features
+
+- Overall model rankings with comprehensive scoring
+- Task-specific performance analysis
+- Metric breakdowns across different dimensions
+- User satisfaction and experience metrics
+- Interactive visualizations using Recharts
+- Responsive design for all device sizes
+
+## Getting Started
+
+### Prerequisites
+
+- Node.js 16.8 or later
+- Python 3.8 or later (for data processing)
+- Python packages: pandas, numpy
+
+### Installation
+
+1. Clone the repository:
+
+```bash
+git clone https://github.com/yourusername/llm-comparison-leaderboard.git
+cd llm-comparison-leaderboard
+```
+
+2. Install dependencies:
+
+```bash
+npm install
+```
+
+3. Install Python dependencies (if you plan to process data):
+
+```bash
+pip install pandas numpy
+```
+
+### Using Sample Data
+
+The repository includes a sample JSON file with placeholder data in `public/llm_comparison_data.json`. You can start the development server right away to see the dashboard with this data:
+
+```bash
+npm run dev
+```
+
+Visit [http://localhost:3000](http://localhost:3000) to see the dashboard.
+
+### Processing Your Own Data
+
+If you have your own data, follow these steps:
+
+1. Place your CSV data file in the `data` directory:
+
+```bash
+mkdir -p data
+cp /path/to/your/pilot_data_n20.csv data/
+```
+
+2. Run the data processing script:
+
+```bash
+npm run process-data
+```
+
+This will:
+- Process the CSV data using the Python script
+- Generate a JSON file in the `public` directory
+- Format the data for the dashboard
+
+3. Start the development server:
+
+```bash
+npm run dev
+```
+
+## Project Structure
+
+- `app/` - Next.js App Router components
+  - `page.js` - Main page component that loads data and renders dashboard
+  - `layout.js` - Layout component with metadata and global styles
+  - `globals.css` - Global styles including Tailwind CSS
+- `components/` - React components
+  - `LLMComparisonDashboard.jsx` - The main dashboard component
+- `public/` - Static files
+  - `llm_comparison_data.json` - Processed data for the dashboard
+- `lib/` - Utility functions
+  - `utils.js` - Helper functions for data processing
+- `scripts/` - Data processing scripts
+  - `process_data.js` - Node.js script for running Python processor
+  - `process_data.py` - Python script for data processing
+
+## Building for Production
+
+To build the application for production:
+
+```bash
+npm run build
+```
+
+To start the production server:
+
+```bash
+npm run start
+```
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
leaderboard-app/app/favicon.ico
ADDED
leaderboard-app/app/globals.css
ADDED
@@ -0,0 +1,29 @@
+@import "tailwindcss";
+
+:root {
+  --background: #ffffff;
+  --foreground: #171717;
+}
+
+@theme inline {
+  --color-background: var(--background);
+  --color-foreground: var(--foreground);
+  --font-sans: var(--font-geist-sans);
+  --font-mono: var(--font-geist-mono);
+}
+
+/* Force light theme regardless of color scheme preference */
+/* Disable dark mode
+@media (prefers-color-scheme: dark) {
+  :root {
+    --background: #0a0a0a;
+    --foreground: #ededed;
+  }
+}
+*/
+
+body {
+  background: var(--background);
+  color: var(--foreground);
+  font-family: Arial, Helvetica, sans-serif;
+}
leaderboard-app/app/layout.js
ADDED
@@ -0,0 +1,19 @@
+import { Inter } from 'next/font/google';
+import './globals.css';
+
+const inter = Inter({ subsets: ['latin'] });
+
+export const metadata = {
+  title: 'LLM Comparison Leaderboard',
+  description: 'Interactive leaderboard comparing performance of state-of-the-art large language models across various tasks and metrics.',
+};
+
+export default function RootLayout({ children }) {
+  return (
+    <html lang="en">
+      <body className={`${inter.className} bg-gray-50`}>
+        {children}
+      </body>
+    </html>
+  );
+}
leaderboard-app/app/page.js
ADDED
@@ -0,0 +1,84 @@
+'use client';
+
+import { useState, useEffect } from 'react';
+import dynamic from 'next/dynamic';
+import { prepareDataForVisualization } from '../lib/utils';
+
+// Dynamically import the dashboard component with SSR disabled
+// This is important because recharts needs to be rendered on the client side
+const LLMComparisonDashboard = dynamic(
+  () => import('../components/LLMComparisonDashboard'),
+  { ssr: false }
+);
+
+export default function Home() {
+  const [data, setData] = useState(null);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState(null);
+
+  useEffect(() => {
+    async function fetchData() {
+      try {
+        setLoading(true);
+
+        // Fetch the data from the JSON file in the public directory
+        const response = await fetch('/llm_comparison_data.json');
+
+        if (!response.ok) {
+          throw new Error(`Failed to fetch data: ${response.status} ${response.statusText}`);
+        }
+
+        const jsonData = await response.json();
+
+        // Process the data for visualization
+        const processedData = prepareDataForVisualization(jsonData);
+
+        setData(processedData);
+        setLoading(false);
+      } catch (err) {
+        console.error('Error loading data:', err);
+        setError(err.message || 'Failed to load data');
+        setLoading(false);
+      }
+    }
+
+    fetchData();
+  }, []);
+
+  if (loading) {
+    return (
+      <div className="flex items-center justify-center min-h-screen">
+        <div className="text-center">
+          <div className="animate-spin rounded-full h-12 w-12 border-b-2 border-blue-500 mx-auto mb-4"></div>
+          <p className="text-lg text-gray-600">Loading LLM comparison data...</p>
+        </div>
+      </div>
+    );
+  }
+
+  if (error) {
+    return (
+      <div className="flex items-center justify-center min-h-screen">
+        <div className="text-center max-w-md p-6 bg-red-50 rounded-lg border border-red-200">
+          <svg xmlns="http://www.w3.org/2000/svg" className="h-12 w-12 text-red-500 mx-auto mb-4" fill="none" viewBox="0 0 24 24" stroke="currentColor">
+            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 8v4m0 4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
+          </svg>
+          <h2 className="text-xl font-bold text-red-700 mb-2">Error Loading Data</h2>
+          <p className="text-gray-600">{error}</p>
+          <button
+            onClick={() => window.location.reload()}
+            className="mt-4 px-4 py-2 bg-blue-500 text-white rounded hover:bg-blue-600 transition-colors"
+          >
+            Try Again
+          </button>
+        </div>
+      </div>
+    );
+  }
+
+  return (
+    <main className="min-h-screen p-4">
+      {data && <LLMComparisonDashboard data={data} />}
+    </main>
+  );
+}
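Review note on `page.js`: the fetch, status check, parse, and transform sequence inside `useEffect` could be pulled out of the component so it can be exercised without React. The following is only a sketch under that assumption. The helper name `loadDashboardData` and its parameters are hypothetical and do not appear in the commit. The fetch function is passed in so the example runs with a stub instead of a server.

```javascript
// Hypothetical helper (not part of the commit): same sequence as page.js,
// but it returns a result object instead of calling React state setters.
async function loadDashboardData(fetchImpl, url, transform = (x) => x) {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) {
      throw new Error(`Failed to fetch data: ${response.status} ${response.statusText}`);
    }
    const json = await response.json();
    return { data: transform(json), error: null };
  } catch (err) {
    return { data: null, error: err.message || "Failed to load data" };
  }
}

// Stubbed fetch implementations so the sketch runs without a network.
const okFetch = async () => ({
  ok: true,
  status: 200,
  statusText: "OK",
  json: async () => ({ models: [] }),
});
const failFetch = async () => ({
  ok: false,
  status: 404,
  statusText: "Not Found",
  json: async () => ({}),
});

loadDashboardData(okFetch, "/llm_comparison_data.json").then((r) => console.log(r));
loadDashboardData(failFetch, "/llm_comparison_data.json").then((r) => console.log(r.error));
```

In the component, the returned `data` and `error` fields would map onto the existing `setData` and `setError` calls.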
leaderboard-app/components/HeadToHeadComparison.jsx
ADDED
@@ -0,0 +1,1002 @@
+"use client";
+
+import React, { useState, useEffect, useMemo, useCallback } from "react";
+import {
+  BarChart,
+  Bar,
+  XAxis,
+  YAxis,
+  CartesianGrid,
+  Tooltip,
+  Legend,
+  ResponsiveContainer,
+  RadarChart,
+  PolarGrid,
+  PolarAngleAxis,
+  PolarRadiusAxis,
+  Radar,
+  ComposedChart,
+  Cell,
+  ReferenceLine
+} from "recharts";
+
+// Format facet names for display
+const formatFacetName = (facet) => {
+  const facetMap = {
+    "helpfulness": "Helpfulness",
+    "communication": "Communication",
+    "insightful": "Insightfulness",
+    "adaptiveness": "Adaptiveness",
+    "trustworthiness": "Trustworthiness",
+    "personality": "Personality",
+    "background_and_culture": "Cultural Awareness"
+  };
+
+  return facetMap[facet] || (facet ? facet.replace(/_/g, ' ').replace(/\b\w/g, l => l.toUpperCase()) : facet);
+};
+
+// Format aspect names for display
+const formatAspectName = (aspect) => {
+  const aspectMap = {
+    "effectiveness": "Effectiveness",
+    "comprehensiveness": "Comprehensiveness",
+    "usefulness": "Usefulness",
+    "tone_and_language_style": "Tone & Language Style",
+    "naturalness": "Naturalness",
+    "detail_and_technical_language": "Detail & Technical Language",
+    "accuracy": "Accuracy",
+    "sharpness": "Sharpness",
+    "intuitive": "Intuitiveness",
+    "flexibility": "Flexibility",
+    "clarity": "Clarity",
+    "perceptiveness": "Perceptiveness",
+    "consistency": "Consistency",
+    "confidence": "Confidence",
+    "transparency": "Transparency",
+    "personality-consistency": "Personality Consistency",
+    "personality-definition": "Personality Definition",
+    "honesty-empathy-fairness": "Honesty, Empathy & Fairness",
+    "alignment": "Alignment",
+    "cultural_relevance": "Cultural Relevance",
+    "bias_freedom": "Freedom from Bias",
+    "background_and_culture": "Background and Culture"
+  };
+
+  return aspectMap[aspect] || (aspect ? aspect.replace(/_/g, ' ').replace(/-/g, ' ').replace(/\b\w/g, l => l.toUpperCase()) : aspect);
+};
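Review note: both formatters fall back to a generic transformation when a key is missing from the lookup table. The sketch below isolates the fallback branch of `formatAspectName` under a hypothetical name, `humanizeKey`, which is not part of the commit. It illustrates why explicit entries such as "Tone & Language Style" are still worth keeping: the generic path can only capitalize words, not substitute "&" for "and".

```javascript
// Hypothetical standalone version of the fallback branch in formatAspectName:
// replace underscores and hyphens with spaces, then capitalize each word.
function humanizeKey(key) {
  if (!key) return key; // undefined, null and "" pass through unchanged
  return key
    .replace(/_/g, " ")
    .replace(/-/g, " ")
    .replace(/\b\w/g, (ch) => ch.toUpperCase());
}

console.log(humanizeKey("cultural_relevance"));      // "Cultural Relevance"
console.log(humanizeKey("tone_and_language_style")); // "Tone And Language Style"
console.log(humanizeKey("personality-consistency")); // "Personality Consistency"
```

Note that the fallback in `formatFacetName` differs slightly: it replaces underscores only, so a hyphenated facet key would keep its hyphens.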
+
+// Format and style value differences
+const formatDifference = (value, isPercent = false) => {
+  const formatted = isPercent ? `${Math.abs(value).toFixed(1)}%` : Math.abs(value).toFixed(1);
+  const prefix = value > 0 ? '+' : value < 0 ? '-' : '';
+  return `${prefix}${formatted}`;
+};
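Review note on an edge case in `formatDifference`: the sign is chosen from the unrounded value, so a small negative input such as -0.04 renders as "-0.0" (and 0.04 as "+0.0"). One possible variant, sketched here under the hypothetical name `formatDifferenceRounded` (not part of the commit), rounds first and then picks the sign.

```javascript
// Hypothetical variant: round to one decimal before choosing the sign,
// so values that round to zero are shown without a "+" or "-" prefix.
function formatDifferenceRounded(value, isPercent = false) {
  const rounded = Number(value.toFixed(1));
  const prefix = rounded > 0 ? "+" : rounded < 0 ? "-" : "";
  const body = Math.abs(rounded).toFixed(1);
  return `${prefix}${body}${isPercent ? "%" : ""}`;
}

console.log(formatDifferenceRounded(2.34));         // "+2.3"
console.log(formatDifferenceRounded(-0.04));        // "0.0"
console.log(formatDifferenceRounded(-12.36, true)); // "-12.4%"
```

For inputs that do not round to zero, the output matches the original function.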
+
+// Get color for difference values with consistent scale
+const getDiffColor = (value, scale = "normal") => {
+  // For facet scores (-100 to +100)
+  if (scale === "facet") {
+    if (value > 10) return 'text-green-600';
+    if (value < -10) return 'text-red-600';
+    return 'text-gray-600';
+  }
+
+  // For aspect scores (0 to 100)
+  if (scale === "aspect") {
+    if (value > 5) return 'text-green-600';
+    if (value < -5) return 'text-red-600';
+    return 'text-gray-600';
+  }
+
+  // Default
+  if (value > 0.3) return 'text-green-600';
+  if (value < -0.3) return 'text-red-600';
+  return 'text-gray-600';
+};
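Review note: the three branches of `getDiffColor` differ only in their threshold (10, 5, and 0.3). A table-driven form is one way to express that. The sketch below uses hypothetical names, `DIFF_THRESHOLDS` and `diffColorClass`, which are not part of the commit, and is intended to return the same classes as the original for the same inputs.

```javascript
// Hypothetical table-driven equivalent of getDiffColor.
// Thresholds are copied from the original branches.
const DIFF_THRESHOLDS = { facet: 10, aspect: 5, normal: 0.3 };

function diffColorClass(value, scale = "normal") {
  // Unknown scales fall back to the default threshold, as in the original.
  const threshold = DIFF_THRESHOLDS[scale] ?? DIFF_THRESHOLDS.normal;
  if (value > threshold) return "text-green-600";
  if (value < -threshold) return "text-red-600";
  return "text-gray-600";
}

console.log(diffColorClass(12, "facet"));  // "text-green-600"
console.log(diffColorClass(-6, "aspect")); // "text-red-600"
console.log(diffColorClass(0.2));          // "text-gray-600"
```

Adding a new scale then means adding one entry to the table rather than another `if` block.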
+
+// Custom tooltip with proper formatting
+const CustomTooltip = ({ active, payload, label }) => {
+  if (active && payload && payload.length) {
+    const formattedLabel = label.includes('_') ? formatFacetName(label.toLowerCase()) : label;
+
+    return (
+      <div className="bg-white p-3 border rounded shadow-sm">
+        <p className="font-medium">{formattedLabel}</p>
+        <div className="mt-2">
+          {payload
+            .filter(entry => !entry.dataKey.includes('_std') && !entry.dataKey.includes('difference'))
+            .map((entry, index) => {
+              const stdEntry = payload.find(p => p.dataKey === `${entry.dataKey}_std`);
+              const stdValue = stdEntry ? stdEntry.value : 0;
+
+              return (
+                <div key={index} className="flex items-center text-sm mb-1">
+                  <div
+                    className="w-3 h-3 rounded-full mr-1"
+                    style={{ backgroundColor: entry.color }}
+                  ></div>
+                  <span className="mr-2">{entry.name}:</span>
+                  <span className="font-medium">{entry.value.toFixed(1)} {stdValue ? `± ${stdValue.toFixed(1)}` : ''}</span>
+                </div>
+              );
+            })}
+
+          {/* Add difference if available */}
+          {payload.find(p => p.dataKey === 'difference') && (
+            <div className="mt-2 pt-1 border-t">
+              <div className="flex items-center text-sm">
+                <span className="mr-2">Difference:</span>
+                <span className={`font-medium ${getDiffColor(payload.find(p => p.dataKey === 'difference').value, 'facet')}`}>
+                  {formatDifference(payload.find(p => p.dataKey === 'difference').value)}
+                </span>
+              </div>
+            </div>
+          )}
+        </div>
+      </div>
+    );
+  }
+  return null;
+};
+
+// Custom tooltip for comparative bar chart
+const ComparativeBarTooltip = ({ active, payload, label }) => {
+  if (active && payload && payload.length) {
+    const model1 = payload[0]?.name;
+    const model2 = payload[1]?.name;
+    const model1Value = payload[0]?.value;
+    const model2Value = payload[1]?.value;
+    const difference = model1Value !== undefined && model2Value !== undefined ? model1Value - model2Value : null;
+
+    return (
+      <div className="bg-white p-3 border rounded shadow-sm">
+        <p className="font-medium mb-1">{label}</p>
+        {payload.map((entry, index) => (
+          <div key={index} className="flex items-center text-sm mb-1">
+            <div
+              className="w-3 h-3 rounded-full mr-1"
+              style={{ backgroundColor: entry.color }}
+            ></div>
+            <span className="mr-2">{entry.name}:</span>
+            <span className="font-medium">{entry.value.toFixed(1)}</span>
+          </div>
+        ))}
+        {difference !== null && (
+          <div className={`text-sm mt-1 pt-1 border-t ${getDiffColor(difference, 'aspect')}`}>
+            Difference: {formatDifference(difference)}
+          </div>
+        )}
+      </div>
+    );
+  }
+  return null;
+};
+
+const HeadToHeadComparison = ({ data }) => {
+  const [compareModels, setCompareModels] = useState([]);
+  const [selectedView, setSelectedView] = useState("overview");
+  const [showCommonTasksOnly, setShowCommonTasksOnly] = useState(true);
+  const [selectedTaskType, setSelectedTaskType] = useState("all");
+  const [selectedDemographic, setSelectedDemographic] = useState("all");
+
+  const {
+    models,
+    taskData,
+    taskCategories,
+    radarData,
+    facets,
+    demographicSummary,
+    demographicOptions
+  } = data || {
+    models: [],
+    taskData: [],
+    taskCategories: {},
+    radarData: [],
+    facets: {},
+    demographicSummary: {},
+    demographicOptions: {}
+  };
+
+  // Initialize compare models if empty
+  useEffect(() => {
+    if (compareModels.length === 0 && models.length > 1) {
+      setCompareModels([models[0].model, models[1].model]);
+    }
+  }, [models, compareModels]);
+
+  // Get model data by name (memoized)
+  const getModelByName = useCallback((name) => {
+    return models.find(m => m.model === name);
+  }, [models]);
+
+  // Generate data for the radar chart comparison (memoized)
+  const comparisonRadarData = useMemo(() => {
+    if (compareModels.length !== 2 || !radarData) return [];
+
+    return radarData.map(item => {
+      const category = item.category;
+      const model1Score = item[compareModels[0]] || 0;
+      const model2Score = item[compareModels[1]] || 0;
+
+      return {
+        category,
+        [compareModels[0]]: model1Score,
+        [compareModels[1]]: model2Score,
+        difference: model1Score - model2Score
+      };
+    });
+  }, [compareModels, radarData]);
+
+  // Get task comparison data (memoized)
+  const taskComparisonData = useMemo(() => {
+    if (compareModels.length !== 2 || !taskData) return [];
+
+    // Filter tasks based on selectedTaskType
+    let filteredTasks = [...taskData];
+    if (selectedTaskType !== "all") {
+      filteredTasks = taskData.filter(task =>
+        taskCategories[selectedTaskType]?.includes(task.task)
+      );
+    }
+
+    // Filter for common tasks if requested
+    if (showCommonTasksOnly) {
+      filteredTasks = filteredTasks.filter(task =>
+        task[compareModels[0]] !== undefined &&
+        task[compareModels[1]] !== undefined
+      );
+    }
+
+    return filteredTasks.map(task => {
+      const model1Score = task[compareModels[0]] || 0;
+      const model2Score = task[compareModels[1]] || 0;
+
+      return {
+        task: task.task,
+        category: task.category,
+        [compareModels[0]]: model1Score,
+        [compareModels[1]]: model2Score,
+        difference: model1Score - model2Score
+      };
+    }).sort((a, b) => Math.abs(b.difference) - Math.abs(a.difference));
+  }, [compareModels, taskData, selectedTaskType, showCommonTasksOnly, taskCategories]);
+
+  // Get facet comparison data (memoized)
+  const facetComparisonData = useMemo(() => {
+    if (compareModels.length !== 2 || !radarData) return [];
+
+    return radarData
+      .filter(item => item.category !== "Repeat Usage") // Skip repeat usage
+      .map(item => {
+        const model1Score = item[compareModels[0]] || 0;
+        const model2Score = item[compareModels[1]] || 0;
+
+        return {
+          facet: item.category,
+          [compareModels[0]]: model1Score,
+          [compareModels[1]]: model2Score,
+          difference: model1Score - model2Score
+        };
+      })
+      .sort((a, b) => Math.abs(b.difference) - Math.abs(a.difference));
+  }, [compareModels, radarData]);
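Review note: the task and facet comparisons share one pattern. They read each model's score from a row, compute the difference, and sort by absolute difference. The sketch below expresses that pattern as a plain function. The helper name `buildComparison` and the sample rows are hypothetical and not taken from the commit or its data. Unlike the `|| 0` fallback in the component, the sketch keeps a missing score as `null`, so that when the common-tasks filter is off an absent score is not counted as a zero.

```javascript
// Hypothetical helper illustrating the compare-and-rank pattern.
// rows: array of objects keyed by model name; labelKey: name of the label field.
function buildComparison(rows, labelKey, modelA, modelB) {
  return rows
    .map((row) => {
      const a = row[modelA] ?? null;
      const b = row[modelB] ?? null;
      const difference = a !== null && b !== null ? a - b : null;
      return { [labelKey]: row[labelKey], [modelA]: a, [modelB]: b, difference };
    })
    .sort((x, y) => {
      // Rows without a comparable pair go last; otherwise largest gap first.
      if (x.difference === null) return y.difference === null ? 0 : 1;
      if (y.difference === null) return -1;
      return Math.abs(y.difference) - Math.abs(x.difference);
    });
}

// Made-up sample rows for illustration only.
const sampleRows = [
  { task: "Summarising", "Model A": 70, "Model B": 65 },
  { task: "Planning", "Model A": 40, "Model B": 60 },
  { task: "Brainstorming", "Model A": 55 },
];

const ranked = buildComparison(sampleRows, "task", "Model A", "Model B");
console.log(ranked.map((r) => `${r.task}: ${r.difference}`));
```

Whether a missing score should be treated as zero or excluded is a design decision for the dashboard. The sketch only shows one way to make that choice explicit.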
| 283 |
+
|
| 284 |
+
// Get aspect comparison data for all facets (memoized)
|
| 285 |
+
const aspectComparisonData = useMemo(() => {
|
| 286 |
+
if (compareModels.length !== 2) return [];
|
| 287 |
+
|
| 288 |
+
const model1 = getModelByName(compareModels[0]);
|
| 289 |
+
const model2 = getModelByName(compareModels[1]);
|
| 290 |
+
|
| 291 |
+
if (!model1 || !model2 || !facets) return [];
|
| 292 |
+
|
| 293 |
+
const aspectData = [];
|
| 294 |
+
|
| 295 |
+
// For each facet, get aspect comparison
|
| 296 |
+
Object.entries(facets).forEach(([facet, aspects]) => {
|
| 297 |
+
if (facet === "repeat_usage") return; // Skip repeat usage
|
| 298 |
+
|
| 299 |
+
// For each aspect in this facet
|
| 300 |
+
aspects.forEach(aspect => {
|
| 301 |
+
const model1Score = model1.breakdown_scores?.[aspect] || 0;
|
| 302 |
+
const model2Score = model2.breakdown_scores?.[aspect] || 0;
|
| 303 |
+
|
| 304 |
+
aspectData.push({
|
| 305 |
+
facet: formatFacetName(facet),
|
| 306 |
+
aspect: formatAspectName(aspect),
|
| 307 |
+
[model1.model]: model1Score,
|
| 308 |
+
[model2.model]: model2Score,
|
| 309 |
+
difference: model1Score - model2Score
|
| 310 |
+
});
|
| 311 |
+
});
|
| 312 |
+
});
|
| 313 |
+
|
| 314 |
+
return aspectData.sort((a, b) => Math.abs(b.difference) - Math.abs(a.difference));
|
| 315 |
+
}, [compareModels, facets, getModelByName]);
|
| 316 |
+
|
| 317 |
+
// Calculate key findings & summary stats (memoized)
|
| 318 |
+
const summaryStats = useMemo(() => {
|
| 319 |
+
if (compareModels.length !== 2) return null;
|
| 320 |
+
|
| 321 |
+
const model1 = getModelByName(compareModels[0]);
|
| 322 |
+
const model2 = getModelByName(compareModels[1]);
|
| 323 |
+
|
| 324 |
+
if (!model1 || !model2) return null;
|
| 325 |
+
|
| 326 |
+
// Count tasks where each model wins
|
| 327 |
+
const model1Wins = taskComparisonData.filter(t => t[compareModels[0]] > t[compareModels[1]]).length;
|
| 328 |
+
const model2Wins = taskComparisonData.filter(t => t[compareModels[1]] > t[compareModels[0]]).length;
|
| 329 |
+
const ties = taskComparisonData.filter(t => t[compareModels[0]] === t[compareModels[1]]).length;
|
| 330 |
+
|
| 331 |
+
// Calculate average difference across all tasks
|
| 332 |
+
const avgDifference = taskComparisonData.length > 0
|
| 333 |
+
? taskComparisonData.reduce((sum, task) => sum + (task[compareModels[0]] - task[compareModels[1]]), 0) / taskComparisonData.length
|
| 334 |
+
: 0;
|
| 335 |
+
|
| 336 |
+
// Find biggest win for each model
|
| 337 |
+
const model1BiggestWin = taskComparisonData.filter(t => t.difference > 0).sort((a, b) => b.difference - a.difference)[0];
|
| 338 |
+
const model2BiggestWin = taskComparisonData.filter(t => t.difference < 0).sort((a, b) => a.difference - b.difference)[0];
|
| 339 |
+
|
| 340 |
+
// Facet where each model most outperforms the other
|
| 341 |
+
const model1BestFacet = [...facetComparisonData].sort((a, b) => b.difference - a.difference)[0];
|
| 342 |
+
const model2BestFacet = [...facetComparisonData].sort((a, b) => a.difference - b.difference)[0];
|
| 343 |
+
|
| 344 |
+
// Aspect where each model most outperforms the other
|
| 345 |
+
const model1BestAspect = [...aspectComparisonData].sort((a, b) => b.difference - a.difference)[0];
|
| 346 |
+
const model2BestAspect = [...aspectComparisonData].sort((a, b) => a.difference - b.difference)[0];
|
| 347 |
+
|
| 348 |
+
return {
|
| 349 |
+
model1,
|
| 350 |
+
model2,
|
| 351 |
+
model1Wins,
|
| 352 |
+
model2Wins,
|
| 353 |
+
ties,
|
| 354 |
+
avgDifference,
|
| 355 |
+
model1BiggestWin,
|
| 356 |
+
model2BiggestWin,
|
| 357 |
+
model1BestFacet,
|
| 358 |
+
model2BestFacet,
|
| 359 |
+
model1BestAspect,
|
| 360 |
+
model2BestAspect
|
| 361 |
+
};
|
| 362 |
+
}, [compareModels, getModelByName, taskComparisonData, facetComparisonData, aspectComparisonData]);
|
| 363 |
+
|
| 364 |
+
// Create comparative stats for high level metrics
|
| 365 |
+
const highLevelComparison = useMemo(() => {
|
| 366 |
+
if (compareModels.length !== 2) return [];
|
| 367 |
+
|
| 368 |
+
const model1 = getModelByName(compareModels[0]);
|
| 369 |
+
const model2 = getModelByName(compareModels[1]);
|
| 370 |
+
|
| 371 |
+
if (!model1 || !model2) return [];
|
| 372 |
+
|
| 373 |
+
// Define the metrics to compare
|
| 374 |
+
const metrics = [
|
| 375 |
+
{ name: 'Overall Score', key: 'overall_score', model1: model1.overall_score, model2: model2.overall_score, scale: "aspect" },
|
| 376 |
+
{ name: 'Would Use Again', key: 'repeat_usage_pct', model1: model1.repeat_usage_pct, model2: model2.repeat_usage_pct, isPercent: true }
|
| 377 |
+
];
|
| 378 |
+
|
| 379 |
+
// Add facet comparisons
|
| 380 |
+
if (model1.facet_scores && model2.facet_scores) {
|
| 381 |
+
Object.keys(model1.facet_scores)
|
| 382 |
+
.filter(key => !key.includes('_std') && key !== 'repeat_usage') // Skip std and repeat_usage
|
| 383 |
+
.forEach(facet => {
|
| 384 |
+
metrics.push({
|
| 385 |
+
name: formatFacetName(facet),
|
| 386 |
+
key: `facet_${facet}`,
|
| 387 |
+
model1: model1.facet_scores[facet],
|
| 388 |
+
model2: model2.facet_scores[facet],
|
| 389 |
+
scale: "facet"
|
| 390 |
+
});
|
| 391 |
+
});
|
| 392 |
+
}
|
| 393 |
+
|
| 394 |
+
return metrics.map(metric => ({
|
| 395 |
+
name: metric.name,
|
| 396 |
+
key: metric.key,
|
| 397 |
+
[model1.model]: metric.model1,
|
| 398 |
+
[model2.model]: metric.model2,
|
| 399 |
+
difference: metric.model1 - metric.model2,
|
| 400 |
+
percentDifference: metric.model2 ? ((metric.model1 - metric.model2) / Math.abs(metric.model2)) * 100 : null, // null when the baseline is 0 or missing
|
| 401 |
+
isPercent: metric.isPercent,
|
| 402 |
+
scale: metric.scale
|
| 403 |
+
}));
|
| 404 |
+
}, [compareModels, getModelByName]);
|
| 405 |
+
|
| 406 |
+
return (
|
| 407 |
+
<div>
|
| 408 |
+
<h2 className="text-2xl font-bold mb-2">Head-to-Head Model Comparison</h2>
|
| 409 |
+
<p className="text-gray-600 mb-4">
|
| 410 |
+
Directly compare two models across all performance metrics to identify strengths and
|
| 411 |
+
weaknesses of each model relative to one another.
|
| 412 |
+
</p>
|
| 413 |
+
|
| 414 |
+
{/* Sticky Model Selection Panel */}
|
| 415 |
+
<div className="sticky top-0 z-10 bg-white border rounded-lg p-4 mb-6 shadow-sm">
|
| 416 |
+
<div className="flex flex-wrap items-center justify-between">
|
| 417 |
+
<div className="flex items-center space-x-4">
|
| 418 |
+
<div>
|
| 419 |
+
<label className="block text-sm font-medium text-gray-700 mb-1">First Model</label>
|
| 420 |
+
<select
|
| 421 |
+
className="border rounded p-1.5 bg-white shadow-sm focus:outline-none focus:ring-1 focus:ring-blue-500"
|
| 422 |
+
value={compareModels[0] || ''}
|
| 423 |
+
onChange={(e) => setCompareModels([e.target.value, compareModels[1] || ''])}
|
| 424 |
+
>
|
| 425 |
+
{models.map(model => (
|
| 426 |
+
<option
|
| 427 |
+
key={`model1-${model.model}`}
|
| 428 |
+
value={model.model}
|
| 429 |
+
disabled={model.model === compareModels[1]}
|
| 430 |
+
>
|
| 431 |
+
{model.model}
|
| 432 |
+
</option>
|
| 433 |
+
))}
|
| 434 |
+
</select>
|
| 435 |
+
</div>
|
| 436 |
+
|
| 437 |
+
<div className="text-lg font-bold text-gray-500">vs</div>
|
| 438 |
+
|
| 439 |
+
<div>
|
| 440 |
+
<label className="block text-sm font-medium text-gray-700 mb-1">Second Model</label>
|
| 441 |
+
<select
|
| 442 |
+
className="border rounded p-1.5 bg-white shadow-sm focus:outline-none focus:ring-1 focus:ring-blue-500"
|
| 443 |
+
value={compareModels[1] || ''}
|
| 444 |
+
onChange={(e) => setCompareModels([compareModels[0] || '', e.target.value])}
|
| 445 |
+
>
|
| 446 |
+
{models.map(model => (
|
| 447 |
+
<option
|
| 448 |
+
key={`model2-${model.model}`}
|
| 449 |
+
value={model.model}
|
| 450 |
+
disabled={model.model === compareModels[0]}
|
| 451 |
+
>
|
| 452 |
+
{model.model}
|
| 453 |
+
</option>
|
| 454 |
+
))}
|
| 455 |
+
</select>
|
| 456 |
+
</div>
|
| 457 |
+
</div>
|
| 458 |
+
|
| 459 |
+
<div className="mt-2 sm:mt-0">
|
| 460 |
+
<label className="text-sm text-gray-500 mr-2">Show only tasks with data for both models:</label>
|
| 461 |
+
<button
|
| 462 |
+
className={`px-3 py-1 text-xs font-medium rounded ${
|
| 463 |
+
showCommonTasksOnly
|
| 464 |
+
? "bg-blue-100 text-blue-800 border border-blue-300"
|
| 465 |
+
: "bg-gray-100 text-gray-800 border border-gray-300"
|
| 466 |
+
}`}
|
| 467 |
+
onClick={() => setShowCommonTasksOnly(!showCommonTasksOnly)}
|
| 468 |
+
>
|
| 469 |
+
{showCommonTasksOnly ? 'Common Tasks Only' : 'All Tasks'}
|
| 470 |
+
</button>
|
| 471 |
+
</div>
|
| 472 |
+
</div>
|
| 473 |
+
</div>
|
| 474 |
+
|
| 475 |
+
{/* Tab Navigation */}
|
| 476 |
+
<div className="mb-4 border-b">
|
| 477 |
+
<div className="flex flex-wrap">
|
| 478 |
+
{["overview", "tasks", "facets", "aspects", "demographics"].map((tab) => (
|
| 479 |
+
<button
|
| 480 |
+
key={tab}
|
| 481 |
+
className={`px-6 py-3 font-medium text-sm ${
|
| 482 |
+
selectedView === tab
|
| 483 |
+
? "bg-white text-blue-700 border-b-2 border-blue-500"
|
| 484 |
+
: "text-gray-600 hover:text-gray-800 hover:bg-gray-50"
|
| 485 |
+
}`}
|
| 486 |
+
onClick={() => setSelectedView(tab)}
|
| 487 |
+
>
|
| 488 |
+
{tab.charAt(0).toUpperCase() + tab.slice(1)}
|
| 489 |
+
</button>
|
| 490 |
+
))}
|
| 491 |
+
</div>
|
| 492 |
+
</div>
|
| 493 |
+
|
| 494 |
+
{/* Key Findings Section (Always Visible) */}
|
| 495 |
+
{summaryStats && (
|
| 496 |
+
<div className="border rounded-lg overflow-hidden mb-6 bg-blue-50">
|
| 497 |
+
<div className="px-4 py-2 bg-blue-100 border-b">
|
| 498 |
+
<h3 className="font-semibold">Key Insights</h3>
|
| 499 |
+
</div>
|
| 500 |
+
<div className="p-4">
|
| 501 |
+
<div className="grid grid-cols-1 md:grid-cols-3 gap-4">
|
| 502 |
+
{/* Overall Comparison */}
|
| 503 |
+
<div className="bg-white rounded-lg shadow-sm p-3">
|
| 504 |
+
<h4 className="text-sm font-medium text-gray-700 mb-2">Overall Comparison</h4>
|
| 505 |
+
<div className="flex items-center mb-2">
|
| 506 |
+
<div className="w-3 h-3 rounded-full mr-1" style={{ backgroundColor: summaryStats.model1.color }}></div>
|
| 507 |
+
<span className="font-medium mr-2">{summaryStats.model1.model}:</span>
|
| 508 |
+
<span>{summaryStats.model1.overall_score.toFixed(1)}</span>
|
| 509 |
+
</div>
|
| 510 |
+
<div className="flex items-center mb-2">
|
| 511 |
+
<div className="w-3 h-3 rounded-full mr-1" style={{ backgroundColor: summaryStats.model2.color }}></div>
|
| 512 |
+
<span className="font-medium mr-2">{summaryStats.model2.model}:</span>
|
| 513 |
+
<span>{summaryStats.model2.overall_score.toFixed(1)}</span>
|
| 514 |
+
</div>
|
| 515 |
+
<div className="mt-2 text-sm">
|
| 516 |
+
<span className="font-medium">Average Difference: </span>
|
| 517 |
+
<span className={
|
| 518 |
+
Math.abs(summaryStats.avgDifference) < 1 ? "text-gray-600" :
|
| 519 |
+
summaryStats.avgDifference > 0 ? "text-green-600 font-medium" : "text-red-600 font-medium"
|
| 520 |
+
}>
|
| 521 |
+
{summaryStats.avgDifference > 0 ? '+' : ''}{summaryStats.avgDifference.toFixed(1)}
|
| 522 |
+
</span>
|
| 523 |
+
</div>
|
| 524 |
+
</div>
|
| 525 |
+
|
| 526 |
+
{/* Task Wins */}
|
| 527 |
+
<div className="bg-white rounded-lg shadow-sm p-3">
|
| 528 |
+
<h4 className="text-sm font-medium text-gray-700 mb-2">Task Win Distribution</h4>
|
| 529 |
+
<div className="flex items-center justify-between mb-1">
|
| 530 |
+
<div className="flex items-center">
|
| 531 |
+
<div className="w-3 h-3 rounded-full mr-1" style={{ backgroundColor: summaryStats.model1.color }}></div>
|
| 532 |
+
<span>{summaryStats.model1.model}</span>
|
| 533 |
+
</div>
|
| 534 |
+
<span className="font-medium">{summaryStats.model1Wins} tasks</span>
|
| 535 |
+
</div>
|
| 536 |
+
<div className="flex items-center justify-between mb-1">
|
| 537 |
+
<div className="flex items-center">
|
| 538 |
+
<div className="w-3 h-3 rounded-full mr-1" style={{ backgroundColor: summaryStats.model2.color }}></div>
|
| 539 |
+
<span>{summaryStats.model2.model}</span>
|
| 540 |
+
</div>
|
| 541 |
+
<span className="font-medium">{summaryStats.model2Wins} tasks</span>
|
| 542 |
+
</div>
|
| 543 |
+
{summaryStats.ties > 0 && (
|
| 544 |
+
<div className="flex items-center justify-between">
|
| 545 |
+
<span className="text-gray-600">Ties</span>
|
| 546 |
+
<span className="font-medium">{summaryStats.ties} tasks</span>
|
| 547 |
+
</div>
|
| 548 |
+
)}
|
| 549 |
+
</div>
|
| 550 |
+
|
| 551 |
+
{/* Key Advantages */}
|
| 552 |
+
<div className="bg-white rounded-lg shadow-sm p-3">
|
| 553 |
+
<h4 className="text-sm font-medium text-gray-700 mb-2">Biggest Advantages</h4>
|
| 554 |
+
{summaryStats.model1BiggestWin && (
|
| 555 |
+
<div className="mb-2">
|
| 556 |
+
<div className="flex items-center">
|
| 557 |
+
<div className="w-3 h-3 rounded-full mr-1" style={{ backgroundColor: summaryStats.model1.color }}></div>
|
| 558 |
+
<span className="font-medium text-sm">{summaryStats.model1.model}:</span>
|
| 559 |
+
</div>
|
| 560 |
+
<div className="text-sm ml-4 mt-0.5">
|
| 561 |
+
{summaryStats.model1BiggestWin.task.length > 30
|
| 562 |
+
? summaryStats.model1BiggestWin.task.slice(0, 30) + '...'
|
| 563 |
+
: summaryStats.model1BiggestWin.task}
|
| 564 |
+
<span className="text-green-600 font-medium ml-1">
|
| 565 |
+
(+{summaryStats.model1BiggestWin.difference.toFixed(1)})
|
| 566 |
+
</span>
|
| 567 |
+
</div>
|
| 568 |
+
</div>
|
| 569 |
+
)}
|
| 570 |
+
{summaryStats.model2BiggestWin && (
|
| 571 |
+
<div>
|
| 572 |
+
<div className="flex items-center">
|
| 573 |
+
<div className="w-3 h-3 rounded-full mr-1" style={{ backgroundColor: summaryStats.model2.color }}></div>
|
| 574 |
+
<span className="font-medium text-sm">{summaryStats.model2.model}:</span>
|
| 575 |
+
</div>
|
| 576 |
+
<div className="text-sm ml-4 mt-0.5">
|
| 577 |
+
{summaryStats.model2BiggestWin.task.length > 30
|
| 578 |
+
? summaryStats.model2BiggestWin.task.slice(0, 30) + '...'
|
| 579 |
+
: summaryStats.model2BiggestWin.task}
|
| 580 |
+
<span className="text-green-600 font-medium ml-1">
|
| 581 |
+
(+{Math.abs(summaryStats.model2BiggestWin.difference).toFixed(1)})
|
| 582 |
+
</span>
|
| 583 |
+
</div>
|
| 584 |
+
</div>
|
| 585 |
+
)}
|
| 586 |
+
</div>
|
| 587 |
+
</div>
|
| 588 |
+
</div>
|
| 589 |
+
</div>
|
| 590 |
+
)}
|
| 591 |
+
|
| 592 |
+
{/* OVERVIEW TAB */}
|
| 593 |
+
{selectedView === "overview" && summaryStats && (
|
| 594 |
+
<div>
|
| 595 |
+
{/* Side-by-side charts */}
|
| 596 |
+
<div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
|
| 597 |
+
{/* Radar Chart */}
|
| 598 |
+
<div className="border rounded-lg overflow-hidden">
|
| 599 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 600 |
+
<h3 className="font-semibold">Facet Comparison</h3>
|
| 601 |
+
</div>
|
| 602 |
+
<div className="p-4">
|
| 603 |
+
<div className="h-80">
|
| 604 |
+
<ResponsiveContainer width="100%" height="100%">
|
| 605 |
+
<RadarChart
|
| 606 |
+
outerRadius={130}
|
| 607 |
+
data={comparisonRadarData}
|
| 608 |
+
margin={{ top: 30, right: 30, bottom: 30, left: 30 }}
|
| 609 |
+
>
|
| 610 |
+
<PolarGrid gridType="polygon" />
|
| 611 |
+
<PolarAngleAxis
|
| 612 |
+
dataKey="category"
|
| 613 |
+
tick={{ fill: '#4b5563', fontSize: 14 }}
|
| 614 |
+
tickLine={false}
|
| 615 |
+
tickFormatter={(value) => {
|
| 616 |
+
if (value.includes('_') || value === "Insightful") {
|
| 617 |
+
return formatFacetName(value.toLowerCase());
|
| 618 |
+
}
|
| 619 |
+
return value;
|
| 620 |
+
}}
|
| 621 |
+
/>
|
| 622 |
+
<PolarRadiusAxis
|
| 623 |
+
angle={90}
|
| 624 |
+
domain={[-100, 100]}
|
| 625 |
+
axisLine={false}
|
| 626 |
+
tickCount={5}
|
| 627 |
+
/>
|
| 628 |
+
{compareModels.map(modelName => {
|
| 629 |
+
const model = getModelByName(modelName);
|
| 630 |
+
return (
|
| 631 |
+
<Radar
|
| 632 |
+
key={modelName}
|
| 633 |
+
name={modelName}
|
| 634 |
+
dataKey={modelName}
|
| 635 |
+
stroke={model?.color || '#999'}
|
| 636 |
+
fill={model?.color || '#999'}
|
| 637 |
+
fillOpacity={0.2}
|
| 638 |
+
strokeWidth={2}
|
| 639 |
+
/>
|
| 640 |
+
);
|
| 641 |
+
})}
|
| 642 |
+
<Tooltip content={<CustomTooltip />} />
|
| 643 |
+
<Legend />
|
| 644 |
+
</RadarChart>
|
| 645 |
+
</ResponsiveContainer>
|
| 646 |
+
</div>
|
| 647 |
+
</div>
|
| 648 |
+
</div>
|
| 649 |
+
|
| 650 |
+
{/* Gap Analysis */}
|
| 651 |
+
<div className="border rounded-lg overflow-hidden">
|
| 652 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 653 |
+
<h3 className="font-semibold">Facet Gap Analysis</h3>
|
| 654 |
+
</div>
|
| 655 |
+
<div className="p-4">
|
| 656 |
+
<div className="h-80">
|
| 657 |
+
<ResponsiveContainer width="100%" height="100%">
|
| 658 |
+
<ComposedChart
|
| 659 |
+
layout="vertical"
|
| 660 |
+
data={facetComparisonData}
|
| 661 |
+
margin={{ top: 20, right: 60, left: 100, bottom: 20 }}
|
| 662 |
+
>
|
| 663 |
+
<CartesianGrid strokeDasharray="3 3" />
|
| 664 |
+
<XAxis
|
| 665 |
+
type="number"
|
| 666 |
+
domain={[-50, 50]}
|
| 667 |
+
tickFormatter={(value) => value > 0 ? `+${value.toFixed(0)}` : value.toFixed(0)}
|
| 668 |
+
/>
|
| 669 |
+
<YAxis
|
| 670 |
+
dataKey="facet"
|
| 671 |
+
type="category"
|
| 672 |
+
width={100}
|
| 673 |
+
/>
|
| 674 |
+
<Tooltip
|
| 675 |
+
formatter={(value) => [value.toFixed(1), 'Difference']}
|
| 676 |
+
/>
|
| 677 |
+
<Legend />
|
| 678 |
+
<Bar
|
| 679 |
+
dataKey="difference"
|
| 680 |
+
name={`${compareModels[0]} vs ${compareModels[1]}`}
|
| 681 |
+
barSize={20}
|
| 682 |
+
>
|
| 683 |
+
{facetComparisonData.map((entry, index) => (
|
| 684 |
+
<Cell
|
| 685 |
+
key={`cell-${index}`}
|
| 686 |
+
fill={entry.difference > 0 ? getModelByName(compareModels[0])?.color : getModelByName(compareModels[1])?.color}
|
| 687 |
+
/>
|
| 688 |
+
))}
|
| 689 |
+
</Bar>
|
| 690 |
+
<ReferenceLine x={0} stroke="#666" strokeWidth={2} />
|
| 691 |
+
</ComposedChart>
|
| 692 |
+
</ResponsiveContainer>
|
| 693 |
+
</div>
|
| 694 |
+
<div className="text-xs text-gray-500 text-center mt-2">
|
| 695 |
+
Bars extending right indicate {compareModels[0]} scores higher; bars extending left indicate {compareModels[1]} scores higher.
|
| 696 |
+
</div>
|
| 697 |
+
</div>
|
| 698 |
+
</div>
|
| 699 |
+
</div>
|
| 700 |
+
|
| 701 |
+
{/* Key Metrics Table */}
|
| 702 |
+
<div className="border rounded-lg overflow-hidden mb-6">
|
| 703 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 704 |
+
<h3 className="font-semibold">Key Metrics Comparison</h3>
|
| 705 |
+
</div>
|
| 706 |
+
<div className="p-4">
|
| 707 |
+
<div className="overflow-x-auto">
|
| 708 |
+
<table className="min-w-full divide-y divide-gray-200">
|
| 709 |
+
<thead className="bg-gray-50">
|
| 710 |
+
<tr>
|
| 711 |
+
<th className="px-4 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Metric</th>
|
| 712 |
+
{compareModels.map(modelName => {
|
| 713 |
+
const model = getModelByName(modelName);
|
| 714 |
+
return (
|
| 715 |
+
<th key={modelName} className="px-4 py-2 text-left text-xs font-medium uppercase tracking-wider" style={{ color: model?.color }}>
|
| 716 |
+
{modelName}
|
| 717 |
+
</th>
|
| 718 |
+
);
|
| 719 |
+
})}
|
| 720 |
+
<th className="px-4 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Difference</th>
|
| 721 |
+
</tr>
|
| 722 |
+
</thead>
|
| 723 |
+
<tbody className="bg-white divide-y divide-gray-200">
|
| 724 |
+
{highLevelComparison.map((metric) => (
|
| 725 |
+
<tr key={metric.key} className="hover:bg-gray-50">
|
| 726 |
+
<td className="px-4 py-3 whitespace-nowrap text-sm font-medium text-gray-900">
|
| 727 |
+
{metric.name}
|
| 728 |
+
</td>
|
| 729 |
+
{compareModels.map(modelName => {
|
| 730 |
+
const value = metric[modelName];
|
| 731 |
+
const isPercent = metric.isPercent;
|
| 732 |
+
|
| 733 |
+
return (
|
| 734 |
+
<td key={`${metric.key}-${modelName}`} className="px-4 py-3 whitespace-nowrap text-sm text-gray-700">
|
| 735 |
+
<span className={`font-medium ${metric.difference !== 0 && modelName === compareModels[0] && metric.difference > 0 ? 'text-green-600' : ''} ${metric.difference !== 0 && modelName === compareModels[1] && metric.difference < 0 ? 'text-green-600' : ''}`}>
|
| 736 |
+
{isPercent ? `${value.toFixed(1)}%` : value.toFixed(1)}
|
| 737 |
+
</span>
|
| 738 |
+
</td>
|
| 739 |
+
);
|
| 740 |
+
})}
|
| 741 |
+
<td className="px-4 py-3 whitespace-nowrap text-sm">
|
| 742 |
+
<span className={`font-medium ${getDiffColor(metric.difference, metric.scale)}`}>
|
| 743 |
+
{formatDifference(metric.difference, metric.isPercent)}
|
| 744 |
+
</span>
|
| 745 |
+
</td>
|
| 746 |
+
</tr>
|
| 747 |
+
))}
|
| 748 |
+
</tbody>
|
| 749 |
+
</table>
|
| 750 |
+
</div>
|
| 751 |
+
<div className="text-xs text-gray-500 mt-3">
|
| 752 |
+
Differences are calculated as {compareModels[0]} minus {compareModels[1]}. Positive values indicate {compareModels[0]} is higher.
|
| 753 |
+
</div>
|
| 754 |
+
</div>
|
| 755 |
+
</div>
|
| 756 |
+
|
| 757 |
+
{/* Interactive Recommendation */}
|
| 758 |
+
<div className="border rounded-lg overflow-hidden mb-6 bg-blue-50">
|
| 759 |
+
<div className="px-4 py-2 bg-blue-100 border-b">
|
| 760 |
+
<h3 className="font-semibold">When to Use Each Model</h3>
|
| 761 |
+
</div>
|
| 762 |
+
<div className="p-4 text-sm text-gray-800">
|
| 763 |
+
<div className="grid grid-cols-1 sm:grid-cols-2 gap-6">
|
| 764 |
+
<div className="bg-white rounded-lg p-4 shadow-sm">
|
| 765 |
+
<h4 className="font-medium mb-2" style={{ color: summaryStats.model1.color }}>
|
| 766 |
+
When to use {summaryStats.model1.model}:
|
| 767 |
+
</h4>
|
| 768 |
+
<ul className="list-disc pl-5 space-y-1 text-sm">
|
| 769 |
+
<li>For {summaryStats.model1BestFacet?.facet.toLowerCase() || 'overall'} focused tasks</li>
|
| 770 |
+
{summaryStats.model1BiggestWin && (
|
| 771 |
+
<li>When working on tasks like "{summaryStats.model1BiggestWin.task}"</li>
|
| 772 |
+
)}
|
| 773 |
+
{summaryStats.model1BestAspect && (
|
| 774 |
+
<li>When {summaryStats.model1BestAspect.aspect.toLowerCase()} is important</li>
|
| 775 |
+
)}
|
| 776 |
+
</ul>
|
| 777 |
+
</div>
|
| 778 |
+
<div className="bg-white rounded-lg p-4 shadow-sm">
|
| 779 |
+
<h4 className="font-medium mb-2" style={{ color: summaryStats.model2.color }}>
|
| 780 |
+
When to use {summaryStats.model2.model}:
|
| 781 |
+
</h4>
|
| 782 |
+
<ul className="list-disc pl-5 space-y-1 text-sm">
|
| 783 |
+
<li>For {summaryStats.model2BestFacet?.facet.toLowerCase() || 'overall'} focused tasks</li>
|
| 784 |
+
{summaryStats.model2BiggestWin && (
|
| 785 |
+
<li>When working on tasks like "{summaryStats.model2BiggestWin.task}"</li>
|
| 786 |
+
)}
|
| 787 |
+
{summaryStats.model2BestAspect && (
|
| 788 |
+
<li>When {summaryStats.model2BestAspect.aspect.toLowerCase()} is important</li>
|
| 789 |
+
)}
|
| 790 |
+
</ul>
|
| 791 |
+
</div>
|
| 792 |
+
</div>
|
| 793 |
+
</div>
|
| 794 |
+
</div>
|
| 795 |
+
</div>
|
| 796 |
+
)}
|
| 797 |
+
|
| 798 |
+
{/* TASKS TAB */}
|
| 799 |
+
{selectedView === "tasks" && (
|
| 800 |
+
<div>
|
| 801 |
+
{/* Task Type Filter */}
|
| 802 |
+
<div className="mb-4 overflow-x-auto pb-2">
|
| 803 |
+
<div className="flex space-x-2">
|
| 804 |
+
<button
|
| 805 |
+
className={`px-3 py-1 text-sm font-medium rounded-full whitespace-nowrap ${
|
| 806 |
+
selectedTaskType === "all"
|
| 807 |
+
? "bg-blue-100 text-blue-800"
|
| 808 |
+
: "bg-gray-100 text-gray-800"
|
| 809 |
+
}`}
|
| 810 |
+
onClick={() => setSelectedTaskType("all")}
|
| 811 |
+
>
|
| 812 |
+
All Tasks
|
| 813 |
+
</button>
|
| 814 |
+
{Object.keys(taskCategories || {}).map(category => (
|
| 815 |
+
<button
|
| 816 |
+
key={category}
|
| 817 |
+
className={`px-3 py-1 text-sm font-medium rounded-full whitespace-nowrap ${
|
| 818 |
+
selectedTaskType === category
|
| 819 |
+
? "bg-blue-100 text-blue-800"
|
| 820 |
+
: "bg-gray-100 text-gray-800"
|
| 821 |
+
}`}
|
| 822 |
+
onClick={() => setSelectedTaskType(category)}
|
| 823 |
+
>
|
| 824 |
+
{category.charAt(0).toUpperCase() + category.slice(1)}
|
| 825 |
+
</button>
|
| 826 |
+
))}
|
| 827 |
+
</div>
|
| 828 |
+
</div>
|
| 829 |
+
|
| 830 |
+
{/* Task Comparison Section */}
|
| 831 |
+
<div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
|
| 832 |
+
{/* Bar Chart */}
|
| 833 |
+
<div className="border rounded-lg overflow-hidden">
|
| 834 |
+
<div className="px-4 py-2 bg-gray-50 border-b flex justify-between items-center">
|
| 835 |
+
<h3 className="font-semibold">Performance Comparison</h3>
|
| 836 |
+
</div>
|
| 837 |
+
<div className="p-4">
|
| 838 |
+
<div className="h-[450px]">
|
| 839 |
+
<ResponsiveContainer width="100%" height="100%">
|
| 840 |
+
<BarChart
|
| 841 |
+
data={taskComparisonData.slice(0, 10)} // Top 10 for clarity
|
| 842 |
+
layout="vertical"
|
| 843 |
+
margin={{ top: 5, right: 30, left: 150, bottom: 5 }}
|
| 844 |
+
>
|
| 845 |
+
<CartesianGrid strokeDasharray="3 3" />
|
| 846 |
+
<XAxis type="number" domain={[0, 100]} />
|
| 847 |
+
<YAxis
|
| 848 |
+
dataKey="task"
|
| 849 |
+
type="category"
|
| 850 |
+
width={150}
|
| 851 |
+
tick={{ fontSize: 12 }}
|
| 852 |
+
/>
|
| 853 |
+
<Tooltip content={<ComparativeBarTooltip />} />
|
| 854 |
+
<Legend />
|
| 855 |
+
{compareModels.map(modelName => {
|
| 856 |
+
const model = getModelByName(modelName);
|
| 857 |
+
return (
|
| 858 |
+
<Bar
|
| 859 |
+
key={modelName}
|
| 860 |
+
dataKey={modelName}
|
| 861 |
+
name={modelName}
|
| 862 |
+
fill={model?.color || '#999'}
|
| 863 |
+
maxBarSize={20}
|
| 864 |
+
/>
|
| 865 |
+
);
|
| 866 |
+
})}
|
| 867 |
+
</BarChart>
|
| 868 |
+
</ResponsiveContainer>
|
| 869 |
+
</div>
|
| 870 |
+
<div className="text-xs text-gray-500 text-center mt-2">
|
| 871 |
+
Showing top 10 tasks with the largest performance differences
|
| 872 |
+
</div>
|
| 873 |
+
</div>
|
| 874 |
+
</div>
|
| 875 |
+
|
| 876 |
+
{/* Gap Analysis */}
|
| 877 |
+
<div className="border rounded-lg overflow-hidden">
|
| 878 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 879 |
+
<h3 className="font-semibold">Task Performance Gap</h3>
|
| 880 |
+
</div>
|
| 881 |
+
<div className="p-4">
|
| 882 |
+
<div className="h-[450px]">
|
| 883 |
+
<ResponsiveContainer width="100%" height="100%">
|
| 884 |
+
<ComposedChart
|
| 885 |
+
layout="vertical"
|
| 886 |
+
data={taskComparisonData.slice(0, 10)}
|
| 887 |
+
margin={{ top: 20, right: 30, left: 150, bottom: 20 }}
|
| 888 |
+
>
|
| 889 |
+
<CartesianGrid strokeDasharray="3 3" />
|
| 890 |
+
<XAxis
|
| 891 |
+
type="number"
|
| 892 |
+
domain={[-30, 30]}
|
| 893 |
+
tickFormatter={(value) => value > 0 ? `+${value.toFixed(0)}` : value.toFixed(0)}
|
| 894 |
+
/>
|
| 895 |
+
<YAxis
|
| 896 |
+
dataKey="task"
|
| 897 |
+
type="category"
|
| 898 |
+
width={150}
|
| 899 |
+
tick={{ fontSize: 11 }}
|
| 900 |
+
/>
|
| 901 |
+
<Tooltip
|
| 902 |
+
formatter={(value) => [value.toFixed(1), 'Difference']}
|
| 903 |
+
/>
|
| 904 |
+
<Legend />
|
| 905 |
+
<Bar
|
| 906 |
+
dataKey="difference"
|
| 907 |
+
name={`${compareModels[0]} vs ${compareModels[1]}`}
|
| 908 |
+
barSize={20}
|
| 909 |
+
>
|
| 910 |
+
{taskComparisonData.slice(0, 10).map((entry, index) => (
|
| 911 |
+
<Cell
|
| 912 |
+
key={`cell-${index}`}
|
| 913 |
+
fill={entry.difference > 0 ? getModelByName(compareModels[0])?.color : getModelByName(compareModels[1])?.color}
|
| 914 |
+
/>
|
| 915 |
+
))}
|
| 916 |
+
</Bar>
|
| 917 |
+
<ReferenceLine x={0} stroke="#666" strokeWidth={2} />
|
| 918 |
+
</ComposedChart>
|
| 919 |
+
</ResponsiveContainer>
|
| 920 |
+
</div>
|
| 921 |
+
<div className="text-xs text-gray-500 text-center mt-2">
|
| 922 |
+
Bars to the right indicate {compareModels[0]} is better, to the left indicate {compareModels[1]} is better.
|
| 923 |
+
</div>
|
| 924 |
+
</div>
|
| 925 |
+
</div>
|
| 926 |
+
</div>
|
| 927 |
+
|
| 928 |
+
{/* Task Comparison Table */}
|
| 929 |
+
<div className="border rounded-lg overflow-hidden mb-6">
|
| 930 |
+
<div className="px-4 py-2 bg-gray-50 border-b flex justify-between items-center">
|
| 931 |
+
<h3 className="font-semibold">Task Comparison Details</h3>
|
| 932 |
+
<button
|
| 933 |
+
onClick={() => setShowCommonTasksOnly(!showCommonTasksOnly)}
|
| 934 |
+
className={`px-2 py-1 rounded text-xs ${showCommonTasksOnly ? 'bg-blue-100 text-blue-800' : 'bg-gray-100 text-gray-600'}`}
|
| 935 |
+
>
|
| 936 |
+
{showCommonTasksOnly ? 'Common Tasks Only' : 'All Tasks'}
|
| 937 |
+
</button>
|
| 938 |
+
</div>
|
| 939 |
+
<div className="p-4">
|
| 940 |
+
<div className="overflow-x-auto">
|
| 941 |
+
<table className="min-w-full divide-y divide-gray-200">
|
| 942 |
+
<thead className="bg-gray-50">
|
| 943 |
+
<tr>
|
| 944 |
+
<th className="px-4 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Task</th>
|
| 945 |
+
<th className="px-4 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Category</th>
|
| 946 |
+
<th className="px-4 py-2 text-right text-xs font-medium text-gray-500 uppercase tracking-wider">{compareModels[0]}</th>
|
| 947 |
+
<th className="px-4 py-2 text-right text-xs font-medium text-gray-500 uppercase tracking-wider">{compareModels[1]}</th>
|
| 948 |
+
<th className="px-4 py-2 text-center text-xs font-medium text-gray-500 uppercase tracking-wider">Difference</th>
|
| 949 |
+
<th className="px-4 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Better Model</th>
|
| 950 |
+
</tr>
|
| 951 |
+
</thead>
|
| 952 |
+
<tbody className="bg-white divide-y divide-gray-200">
|
| 953 |
+
{taskComparisonData.slice(0, 15).map((task, idx) => (
|
| 954 |
+
<tr key={task.task} className={idx % 2 === 0 ? 'bg-white' : 'bg-gray-50'}>
|
| 955 |
+
<td className="px-4 py-2 text-sm whitespace-normal">{task.task}</td>
|
| 956 |
+
<td className="px-4 py-2 text-sm">{task.category}</td>
|
| 957 |
+
<td className="px-4 py-2 text-sm text-right">{task[compareModels[0]].toFixed(1)}</td>
|
| 958 |
+
<td className="px-4 py-2 text-sm text-right">{task[compareModels[1]].toFixed(1)}</td>
|
| 959 |
+
<td className="px-4 py-2 text-sm text-center">
|
| 960 |
+
<span className={`font-medium ${getDiffColor(task.difference, "aspect")}`}>
|
| 961 |
+
{task.difference > 0 ? '+' : ''}{task.difference.toFixed(1)}
|
| 962 |
+
</span>
|
| 963 |
+
</td>
|
| 964 |
+
<td className="px-4 py-2 text-sm">
|
| 965 |
+
{task.difference !== 0 && (
|
| 966 |
+
<div className="flex items-center">
|
| 967 |
+
<div
|
| 968 |
+
className="w-3 h-3 rounded-full mr-1"
|
| 969 |
+
style={{ backgroundColor: task.difference > 0
|
| 970 |
+
? getModelByName(compareModels[0])?.color
|
| 971 |
+
: getModelByName(compareModels[1])?.color
|
| 972 |
+
}}
|
| 973 |
+
></div>
|
| 974 |
+
<span>{task.difference > 0 ? compareModels[0] : compareModels[1]}</span>
|
| 975 |
+
</div>
|
| 976 |
+
)}
|
| 977 |
+
{task.difference === 0 && (
|
| 978 |
+
<span className="text-gray-500">Tie</span>
|
| 979 |
+
)}
|
| 980 |
+
</td>
|
| 981 |
+
</tr>
|
| 982 |
+
))}
|
| 983 |
+
</tbody>
|
| 984 |
+
</table>
|
| 985 |
+
</div>
|
| 986 |
+
{taskComparisonData.length > 15 && (
|
| 987 |
+
<div className="text-center mt-3 text-sm text-gray-500">
|
| 988 |
+
Showing 15 of {taskComparisonData.length} tasks. Tasks are sorted by largest difference.
|
| 989 |
+
</div>
|
| 990 |
+
)}
|
| 991 |
+
</div>
|
| 992 |
+
</div>
|
| 993 |
+
</div>
|
| 994 |
+
)}
|
| 995 |
+
|
| 996 |
+
{/* Include implementations for other tabs (facets, aspects, demographics) */}
|
| 997 |
+
|
| 998 |
+
</div>
|
| 999 |
+
);
|
| 1000 |
+
};
|
| 1001 |
+
|
| 1002 |
+
export default HeadToHeadComparison;
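The `summaryStats` block in the component above counts task wins and ties, averages the per-task difference, and picks each model's biggest win. A minimal standalone sketch of that logic follows. The helper name `summarizeTasks` and the row shape (`task` plus one numeric score per model name) are assumptions for illustration and are not part of this commit. Unlike a plain sort-and-take-first, this sketch reports `null` for a model that wins no task.

```javascript
// Hypothetical helper (not in the commit): one pass over per-task rows.
// Each row is assumed to look like { task: 'name', [modelA]: number, [modelB]: number }.
function summarizeTasks(rows, modelA, modelB) {
  let winsA = 0;
  let winsB = 0;
  let ties = 0;
  let total = 0;
  let biggestA = null; // largest positive gap (modelA ahead)
  let biggestB = null; // most negative gap (modelB ahead)

  for (const row of rows) {
    const difference = row[modelA] - row[modelB];
    total += difference;

    if (difference > 0) {
      winsA += 1;
      if (biggestA === null || difference > biggestA.difference) {
        biggestA = { task: row.task, difference };
      }
    } else if (difference < 0) {
      winsB += 1;
      if (biggestB === null || difference < biggestB.difference) {
        biggestB = { task: row.task, difference };
      }
    } else {
      ties += 1;
    }
  }

  return {
    winsA,
    winsB,
    ties,
    // Average of (modelA - modelB); 0 for an empty list, as in the component.
    avgDifference: rows.length > 0 ? total / rows.length : 0,
    biggestA,
    biggestB,
  };
}
```

A single loop avoids copying and sorting the task list twice, and the explicit sign checks keep a loss from being reported as a "biggest win".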
|
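The facet, aspect, and key-metric comparisons in the component above all reduce to the same two steps: compute a signed difference (first model minus second) and rank rows by the size of the gap. The sketch below shows those steps in isolation. The names `compareMetric` and `rankByGap` are hypothetical, and returning `null` for the percent difference when the baseline is zero is an assumption about the desired behaviour, not something the commit specifies.

```javascript
// Hypothetical helpers (not in the commit).

// Build one comparison row; a and b are the two models' scores for a metric.
function compareMetric(name, a, b) {
  const difference = a - b;
  // Relative gap against the second model; undefined for a zero baseline,
  // so report null instead of Infinity or NaN.
  const percentDifference = b === 0 ? null : (difference / Math.abs(b)) * 100;
  return { name, difference, percentDifference };
}

// Return a new array ordered by absolute gap, largest first.
// The input is copied so the caller's array is not mutated.
function rankByGap(rows) {
  return [...rows].sort(
    (x, y) => Math.abs(y.difference) - Math.abs(x.difference)
  );
}
```

Sorting by absolute value puts the largest gaps first regardless of which model is ahead; the sign of `difference` still tells the chart which side of the zero line to draw the bar on.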
leaderboard-app/components/LLMComparisonDashboard.jsx
ADDED
|
@@ -0,0 +1,688 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"use client";
|
| 2 |
+
|
| 3 |
+
import React, { useState, useMemo } from "react";
|
| 4 |
+
import { getScoreBadgeColor } from "../lib/utils";
|
| 5 |
+
import TaskDemographicAnalysis from "./TaskDemographicAnalysis";
|
| 6 |
+
import MetricsBreakdown from "./MetricsBreakdown";
|
| 7 |
+
import HeadToHeadComparison from "./HeadToHeadComparison";
|
| 8 |
+
|
| 9 |
+
// Reusable component for displaying scores with standard deviation
|
| 10 |
+
const ScoreWithStdDev = ({ score, stdDev, colorClass }) => {
|
| 11 |
+
return (
|
| 12 |
+
<span
|
| 13 |
+
className={`px-2 py-1 inline-flex text-xs font-semibold rounded-full ${colorClass}`}
|
| 14 |
+
>
|
| 15 |
+
{score.toFixed(2)} ± {stdDev.toFixed(2)}
|
| 16 |
+
</span>
|
| 17 |
+
);
|
| 18 |
+
};
|
| 19 |
+
|
| 20 |
+
const formatFacetName = (facet) => {
|
| 21 |
+
if (!facet) return "Unknown"; // Handle null or undefined facet
|
| 22 |
+
|
| 23 |
+
const facetMap = {
|
| 24 |
+
helpfulness: "Helpfulness",
|
| 25 |
+
communication: "Communication",
|
| 26 |
+
insightful: "Insightfulness",
|
| 27 |
+
adaptiveness: "Adaptiveness",
|
| 28 |
+
trustworthiness: "Trustworthiness",
|
| 29 |
+
personality: "Personality",
|
| 30 |
+
background_and_culture: "Cultural Awareness",
|
| 31 |
+
};
|
| 32 |
+
|
| 33 |
+
return (
|
| 34 |
+
facetMap[facet] ||
|
| 35 |
+
facet.replace(/_/g, " ").replace(/\b\w/g, (l) => l.toUpperCase())
|
| 36 |
+
);
|
| 37 |
+
};
|
| 38 |
+
|
| 39 |
+
const LLMComparisonDashboard = ({ data }) => {
|
| 40 |
+
const [activeTab, setActiveTab] = useState("overview");
|
| 41 |
+
const [sortConfig, setSortConfig] = useState({
|
| 42 |
+
key: "overall_score",
|
| 43 |
+
direction: "descending",
|
| 44 |
+
});
|
| 45 |
+
|
| 46 |
+
const {
|
| 47 |
+
models,
|
| 48 |
+
radarData,
|
| 49 |
+
bestModelPerCategory,
|
| 50 |
+
taskCategories,
|
| 51 |
+
keyAspectsByTask
|
| 52 |
+
} = data || {
|
| 53 |
+
models: [],
|
| 54 |
+
radarData: [],
|
| 55 |
+
taskData: [],
|
| 56 |
+
bestModelPerCategory: {},
|
| 57 |
+
bestModelPerFacet: {},
|
| 58 |
+
taskCategories: {},
|
| 59 |
+
facets: {},
|
| 60 |
+
demographicSummary: {},
|
| 61 |
+
fairnessMetrics: {},
|
| 62 |
+
demographicOptions: {},
|
| 63 |
+
keyAspectsByTask: {}
|
| 64 |
+
};
|
| 65 |
+
|
| 66 |
+
// Request sort function
|
| 67 |
+
const requestSort = (key) => {
|
| 68 |
+
let direction = "descending";
|
| 69 |
+
if (sortConfig.key === key && sortConfig.direction === "descending") {
|
| 70 |
+
direction = "ascending";
|
| 71 |
+
}
|
| 72 |
+
setSortConfig({ key, direction });
|
| 73 |
+
};
|
| 74 |
+
|
| 75 |
+
// Get sorted models
|
| 76 |
+
const sortedModels = useMemo(() => {
|
| 77 |
+
let sortableItems = [...models];
|
| 78 |
+
if (sortConfig.key !== null) {
|
| 79 |
+
sortableItems.sort((a, b) => {
|
| 80 |
+
let aValue, bValue;
|
| 81 |
+
|
| 82 |
+
// Handle nested properties for facet scores
|
| 83 |
+
if (sortConfig.key.includes(".")) {
|
| 84 |
+
const [group, metric] = sortConfig.key.split(".");
|
| 85 |
+
if (group === "facet_scores") {
|
| 86 |
+
aValue = a.facet_scores[metric];
|
| 87 |
+
bValue = b.facet_scores[metric];
|
| 88 |
+
} else {
|
| 89 |
+
aValue = a[group]?.[metric];
|
| 90 |
+
bValue = b[group]?.[metric];
|
| 91 |
+
}
|
| 92 |
+
} else if (sortConfig.key === "model") {
|
| 93 |
+
aValue = a.model;
|
| 94 |
+
bValue = b.model;
|
| 95 |
+
} else {
|
| 96 |
+
// For other properties directly on the model object
|
| 97 |
+
aValue = a[sortConfig.key];
|
| 98 |
+
bValue = b[sortConfig.key];
|
| 99 |
+
}
|
| 100 |
+
|
| 101 |
+
if (aValue < bValue) {
|
| 102 |
+
return sortConfig.direction === "ascending" ? -1 : 1;
|
| 103 |
+
}
|
| 104 |
+
if (aValue > bValue) {
|
| 105 |
+
return sortConfig.direction === "ascending" ? 1 : -1;
|
| 106 |
+
}
|
| 107 |
+
return 0;
|
| 108 |
+
});
|
| 109 |
+
}
|
| 110 |
+
return sortableItems;
|
| 111 |
+
}, [models, sortConfig]);
|
| 112 |
+
|
| 113 |
+
// Custom tooltip for the radar chart
|
| 114 |
+
const CustomTooltip = ({ active, payload }) => {
|
| 115 |
+
if (active && payload && payload.length) {
|
| 116 |
+
return (
|
| 117 |
+
<div className="p-2 bg-white border border-gray-200 rounded shadow-sm">
|
| 118 |
+
{payload.map((entry, index) => {
|
| 119 |
+
// Skip standard deviation entries
|
| 120 |
+
if (entry.name.includes("_std")) return null;
|
| 121 |
+
|
| 122 |
+
const baseModelName = entry.name;
|
| 123 |
+
const stdEntry = payload.find(
|
| 124 |
+
(p) => p.name === `${baseModelName}_std`
|
| 125 |
+
);
|
| 126 |
+
const stdValue = stdEntry ? stdEntry.value : 0;
|
| 127 |
+
|
| 128 |
+
return (
|
| 129 |
+
<div key={index} className="flex items-center">
|
| 130 |
+
<div
|
| 131 |
+
className="w-3 h-3 mr-1"
|
| 132 |
+
style={{ backgroundColor: entry.color }}
|
| 133 |
+
></div>
|
| 134 |
+
<span className="text-xs">
|
| 135 |
+
{entry.name}: {entry.value.toFixed(2)} ± {stdValue.toFixed(2)}
|
| 136 |
+
</span>
|
| 137 |
+
</div>
|
| 138 |
+
);
|
| 139 |
+
})}
|
| 140 |
+
</div>
|
| 141 |
+
);
|
| 142 |
+
}
|
| 143 |
+
return null;
|
| 144 |
+
};
|
| 145 |
+
|
| 146 |
+
return (
|
| 147 |
+
<div className="max-w-7xl mx-auto p-4 bg-white">
|
| 148 |
+
<h1 className="text-3xl font-bold text-center mb-2">
|
| 149 |
+
LLM Performance: The Human Perspective
|
| 150 |
+
</h1>
|
| 151 |
+
<p className="text-center mb-6 text-gray-600 max-w-4xl mx-auto">
|
| 152 |
+
Evaluations of LLMs performing everyday tasks. Metrics focus on both
|
| 153 |
+
technical quality and user experience factors.
|
| 154 |
+
</p>
|
| 155 |
+
|
| 156 |
+
{/* Main navigation tabs - Updated structure */}
|
| 157 |
+
<div className="flex flex-wrap mb-6 border-b">
|
| 158 |
+
<button
|
| 159 |
+
className={`px-4 py-2 font-medium ${
|
| 160 |
+
activeTab === "overview"
|
| 161 |
+
? "text-blue-600 border-b-2 border-blue-600"
|
| 162 |
+
: "text-gray-500"
|
| 163 |
+
}`}
|
| 164 |
+
onClick={() => setActiveTab("overview")}
|
| 165 |
+
>
|
| 166 |
+
Overview
|
| 167 |
+
</button>
|
| 168 |
+
<button
|
| 169 |
+
className={`px-4 py-2 font-medium ${
|
| 170 |
+
activeTab === "task-demographics"
|
| 171 |
+
? "text-blue-600 border-b-2 border-blue-600"
|
| 172 |
+
: "text-gray-500"
|
| 173 |
+
}`}
|
| 174 |
+
onClick={() => setActiveTab("task-demographics")}
|
| 175 |
+
>
|
| 176 |
+
Task & Demographic Analysis
|
| 177 |
+
</button>
|
| 178 |
+
<button
|
| 179 |
+
className={`px-4 py-2 font-medium ${
|
| 180 |
+
activeTab === "facets"
|
| 181 |
+
? "text-blue-600 border-b-2 border-blue-600"
|
| 182 |
+
: "text-gray-500"
|
| 183 |
+
}`}
|
| 184 |
+
onClick={() => setActiveTab("facets")}
|
| 185 |
+
>
|
| 186 |
+
Metrics Breakdown
|
| 187 |
+
</button>
|
| 188 |
+
{/* <button
|
| 189 |
+
className={`px-4 py-2 font-medium ${
|
| 190 |
+
activeTab === "headtohead"
|
| 191 |
+
? "text-blue-600 border-b-2 border-blue-600"
|
| 192 |
+
: "text-gray-500"
|
| 193 |
+
}`}
|
| 194 |
+
onClick={() => setActiveTab("headtohead")}
|
| 195 |
+
>
|
| 196 |
+
Head-to-Head Comparison
|
| 197 |
+
</button> */}
|
| 198 |
+
</div>
|
| 199 |
+
|
| 200 |
+
{/* Overview Tab */}
|
| 201 |
+
{activeTab === "overview" && (
|
| 202 |
+
<div>
|
| 203 |
+
{/* Overall Rankings Card - Simplified */}
|
| 204 |
+
<div className="mb-6 border rounded-lg overflow-hidden">
|
| 205 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 206 |
+
<h2 className="text-xl font-semibold">Overall Model Rankings</h2>
|
| 207 |
+
</div>
|
| 208 |
+
<div className="p-4">
|
| 209 |
+
<div className="overflow-x-auto">
|
| 210 |
+
<table className="w-full table-fixed divide-y divide-gray-200">
|
| 211 |
+
<thead>
|
| 212 |
+
<tr className="bg-gray-50">
|
| 213 |
+
<th className="px-4 py-2 text-left text-sm font-medium text-gray-500 w-10">
|
| 214 |
+
Rank
|
| 215 |
+
</th>
|
| 216 |
+
<th
|
| 217 |
+
className="px-4 py-2 text-left text-sm font-medium text-gray-500 w-52 cursor-pointer group"
|
| 218 |
+
onClick={() => requestSort("model")}
|
| 219 |
+
>
|
| 220 |
+
<div className="flex items-center">
|
| 221 |
+
Model
|
| 222 |
+
{sortConfig.key === "model" ? (
|
| 223 |
+
<span className="ml-1">
|
| 224 |
+
{sortConfig.direction === "ascending" ? "↑" : "↓"}
|
| 225 |
+
</span>
|
| 226 |
+
) : (
|
| 227 |
+
<span className="ml-1 text-gray-300 group-hover:text-gray-500">
|
| 228 |
+
⇅
|
| 229 |
+
</span>
|
| 230 |
+
)}
|
| 231 |
+
</div>
|
| 232 |
+
</th>
|
| 233 |
+
<th
|
| 234 |
+
className="px-4 py-2 text-left text-sm font-medium text-gray-500 w-50 cursor-pointer group"
|
| 235 |
+
onClick={() => requestSort("overall_score")}
|
| 236 |
+
>
|
| 237 |
+
<div className="flex items-center">
|
| 238 |
+
Overall Score
|
| 239 |
+
{sortConfig.key === "overall_score" ? (
|
| 240 |
+
<span className="ml-1">
|
| 241 |
+
{sortConfig.direction === "ascending" ? "↑" : "↓"}
|
| 242 |
+
</span>
|
| 243 |
+
) : (
|
| 244 |
+
<span className="ml-1 text-gray-300 group-hover:text-gray-500">
|
| 245 |
+
⇅
|
| 246 |
+
</span>
|
| 247 |
+
)}
|
| 248 |
+
</div>
|
| 249 |
+
</th>
|
| 250 |
+
<th
|
| 251 |
+
className="px-4 py-2 text-left text-sm font-medium text-gray-500 w-42 cursor-pointer group"
|
| 252 |
+
onClick={() => requestSort("repeat_usage_pct")}
|
| 253 |
+
>
|
| 254 |
+
<div className="flex items-center">
|
| 255 |
+
Would Use Again
|
| 256 |
+
{sortConfig.key === "repeat_usage_pct" ? (
|
| 257 |
+
<span className="ml-1">
|
| 258 |
+
{sortConfig.direction === "ascending" ? "↑" : "↓"}
|
| 259 |
+
</span>
|
| 260 |
+
) : (
|
| 261 |
+
<span className="ml-1 text-gray-300 group-hover:text-gray-500">
|
| 262 |
+
⇅
|
| 263 |
+
</span>
|
| 264 |
+
)}
|
| 265 |
+
</div>
|
| 266 |
+
</th>
|
| 267 |
+
<th className="px-4 py-2 text-left text-sm font-medium text-gray-500 w-54">
|
| 268 |
+
Top Strengths
|
| 269 |
+
</th>
|
| 270 |
+
</tr>
|
| 271 |
+
</thead>
|
| 272 |
+
<tbody className="divide-y divide-gray-200">
|
| 273 |
+
{sortedModels.map((model, index) => (
|
| 274 |
+
<tr
|
| 275 |
+
key={model.model}
|
| 276 |
+
className={index % 2 === 0 ? "bg-white" : "bg-gray-50"}
|
| 277 |
+
>
|
| 278 |
+
<td className="px-4 py-3 text-sm font-medium text-gray-900 w-10">
|
| 279 |
+
{index + 1}
|
| 280 |
+
</td>
|
| 281 |
+
<td className="px-4 py-3 w-52">
|
| 282 |
+
<div className="flex items-center">
|
| 283 |
+
<div
|
| 284 |
+
className="w-3 h-3 rounded-full mr-2"
|
| 285 |
+
style={{ backgroundColor: model.color }}
|
| 286 |
+
></div>
|
| 287 |
+
<span className="text-sm font-medium text-gray-900">
|
| 288 |
+
{model.model}
|
| 289 |
+
</span>
|
| 290 |
+
</div>
|
| 291 |
+
</td>
|
| 292 |
+
<td className="px-4 py-3 min-w-[200px] w-64">
|
| 293 |
+
<ScoreWithStdDev
|
| 294 |
+
score={model.overall_score}
|
| 295 |
+
stdDev={model.overall_std}
|
| 296 |
+
colorClass={getScoreBadgeColor(
|
| 297 |
+
model.overall_score,
|
| 298 |
+
0,
|
| 299 |
+
100
|
| 300 |
+
)}
|
| 301 |
+
/>
|
| 302 |
+
</td>
|
| 303 |
+
<td className="px-4 py-3 whitespace-nowrap w-32">
|
| 304 |
+
<span
|
| 305 |
+
className={`px-2 py-1 inline-flex text-xs font-semibold rounded-full ${
|
| 306 |
+
model.repeat_usage_pct > 80
|
| 307 |
+
? "bg-green-100 text-green-800"
|
| 308 |
+
: model.repeat_usage_pct > 60
|
| 309 |
+
? "bg-blue-100 text-blue-800"
|
| 310 |
+
: "bg-yellow-100 text-yellow-800"
|
| 311 |
+
}`}
|
| 312 |
+
>
|
| 313 |
+
{model.repeat_usage_pct.toFixed(1)}%
|
| 314 |
+
{/* ±{" "} {model.repeat_usage_pct_std.toFixed(1)} */}
|
| 315 |
+
</span>
|
| 316 |
+
</td>
|
| 317 |
+
<td className="px-4 py-3 text-sm text-gray-500 w-52">
|
| 318 |
+
{model.top_strengths && model.top_strengths.length > 0
|
| 319 |
+
? model.top_strengths
|
| 320 |
+
.slice(0, 3)
|
| 321 |
+
.map((strength) => formatFacetName(strength))
|
| 322 |
+
.join(", ")
|
| 323 |
+
: "N/A"}
|
| 324 |
+
</td>
|
| 325 |
+
</tr>
|
| 326 |
+
))}
|
| 327 |
+
</tbody>
|
| 328 |
+
</table>
|
| 329 |
+
</div>
|
| 330 |
+
</div>
|
| 331 |
+
</div>
|
| 332 |
+
|
| 333 |
+
{/* Enhanced Top Performers Cards */}
|
| 334 |
+
{Object.keys(bestModelPerCategory).length > 0 && (
|
| 335 |
+
<div>
|
| 336 |
+
<h3 className="font-semibold text-xl mb-4">
|
| 337 |
+
Best Models by Task Category
|
| 338 |
+
</h3>
|
| 339 |
+
<div className="grid grid-cols-1 md:grid-cols-3 gap-6 mb-6">
|
| 340 |
+
{/* Creative Tasks Card - Enhanced */}
|
| 341 |
+
<div className="border rounded-lg overflow-hidden">
|
| 342 |
+
<div className="px-4 py-2 bg-gray-50 border-b flex items-center">
|
| 343 |
+
<h3 className="font-semibold">Best for Creative Tasks</h3>
|
| 344 |
+
<div
|
| 345 |
+
className="ml-2 w-2 h-2 rounded-full"
|
| 346 |
+
style={{
|
| 347 |
+
backgroundColor:
|
| 348 |
+
bestModelPerCategory.creative?.color || "#e5e7eb",
|
| 349 |
+
}}
|
| 350 |
+
></div>
|
| 351 |
+
</div>
|
| 352 |
+
<div className="p-4">
|
| 353 |
+
<div className="flex items-center mb-4">
|
| 354 |
+
<div
|
| 355 |
+
className="p-2 rounded-full"
|
| 356 |
+
style={{
|
| 357 |
+
backgroundColor:
|
| 358 |
+
bestModelPerCategory.creative?.color
|
| 359 |
+
? `${bestModelPerCategory.creative.color}20` : "#e5e7eb",
|
| 360 |
+
}}
|
| 361 |
+
>
|
| 362 |
+
<svg
|
| 363 |
+
xmlns="http://www.w3.org/2000/svg"
|
| 364 |
+
className="h-8 w-8"
|
| 365 |
+
style={{
|
| 366 |
+
color:
|
| 367 |
+
bestModelPerCategory.creative?.color || "#6b7280",
|
| 368 |
+
}}
|
| 369 |
+
fill="none"
|
| 370 |
+
viewBox="0 0 24 24"
|
| 371 |
+
stroke="currentColor"
|
| 372 |
+
>
|
| 373 |
+
<path
|
| 374 |
+
strokeLinecap="round"
|
| 375 |
+
strokeLinejoin="round"
|
| 376 |
+
strokeWidth={2}
|
| 377 |
+
d="M9.663 17h4.673M12 3v1m6.364 1.636l-.707.707M21 12h-1M4 12H3m3.343-5.657l-.707-.707m2.828 9.9a5 5 0 117.072 0l-.548.547A3.374 3.374 0 0014 18.469V19a2 2 0 11-4 0v-.531c0-.895-.356-1.754-.988-2.386l-.548-.547z"
|
| 378 |
+
/>
|
| 379 |
+
</svg>
|
| 380 |
+
</div>
|
| 381 |
+
<div className="ml-4">
|
| 382 |
+
<h4 className="text-lg font-semibold">
|
| 383 |
+
{bestModelPerCategory.creative?.model || "N/A"}
|
| 384 |
+
</h4>
|
| 385 |
+
<p className="text-sm text-gray-600">
|
| 386 |
+
Score:{" "}
|
| 387 |
+
{bestModelPerCategory.creative?.score.toFixed(2) ||
|
| 388 |
+
"N/A"}
|
| 389 |
+
{bestModelPerCategory.creative?.std != null &&
|
| 390 |
+
` ± ${bestModelPerCategory.creative.std.toFixed(
|
| 391 |
+
2
|
| 392 |
+
)}`}
|
| 393 |
+
</p>
|
| 394 |
+
</div>
|
| 395 |
+
</div>
|
| 396 |
+
|
| 397 |
+
{/* Key aspects/facets visualization */}
|
| 398 |
+
<div className="mb-4">
|
| 399 |
+
<h5 className="text-sm font-medium mb-2">
|
| 400 |
+
Key Aspects for Creative Tasks
|
| 401 |
+
</h5>
|
| 402 |
+
<div className="space-y-2">
|
| 403 |
+
{(keyAspectsByTask?.by_category?.creative || []).map(
|
| 404 |
+
(aspectInfo) => {
|
| 405 |
+
const aspect = aspectInfo.raw_aspect;
|
| 406 |
+
const score = aspectInfo.score;
|
| 407 |
+
|
| 408 |
+
return (
|
| 409 |
+
<div key={aspect} className="text-sm">
|
| 410 |
+
<div className="flex justify-between mb-1">
|
| 411 |
+
<span>{aspectInfo.aspect}</span>
|
| 412 |
+
<span className="font-medium">
|
| 413 |
+
{score.toFixed(1)}
|
| 414 |
+
</span>
|
| 415 |
+
</div>
|
| 416 |
+
<div className="w-full bg-gray-200 rounded-full h-2">
|
| 417 |
+
<div
|
| 418 |
+
className="h-2 rounded-full"
|
| 419 |
+
style={{
|
| 420 |
+
width: `${score}%`,
|
| 421 |
+
backgroundColor:
|
| 422 |
+
bestModelPerCategory.creative?.color ||
|
| 423 |
+
"#6b7280",
|
| 424 |
+
}}
|
| 425 |
+
></div>
|
| 426 |
+
</div>
|
| 427 |
+
</div>
|
| 428 |
+
);
|
| 429 |
+
}
|
| 430 |
+
)}
|
| 431 |
+
</div>
|
| 432 |
+
</div>
|
| 433 |
+
|
| 434 |
+
<p className="text-sm text-gray-700">
|
| 435 |
+
Excels at creative tasks like generating ideas and
|
| 436 |
+
creating travel itineraries.
|
| 437 |
+
</p>
|
| 438 |
+
<div className="mt-3 text-xs text-gray-500">
|
| 439 |
+
<div>Tasks in this category:</div>
|
| 440 |
+
<ul className="list-disc ml-4 mt-1">
|
| 441 |
+
{taskCategories.creative?.map((task) => (
|
| 442 |
+
<li key={task}>{task}</li>
|
| 443 |
+
)) || <li>No data available</li>}
|
| 444 |
+
</ul>
|
| 445 |
+
</div>
|
| 446 |
+
</div>
|
| 447 |
+
</div>
|
| 448 |
+
|
| 449 |
+
{/* Practical Tasks Card - Enhanced */}
|
| 450 |
+
<div className="border rounded-lg overflow-hidden">
|
| 451 |
+
<div className="px-4 py-2 bg-gray-50 border-b flex items-center">
|
| 452 |
+
<h3 className="font-semibold">Best for Practical Tasks</h3>
|
| 453 |
+
<div
|
| 454 |
+
className="ml-2 w-2 h-2 rounded-full"
|
| 455 |
+
style={{
|
| 456 |
+
backgroundColor:
|
| 457 |
+
bestModelPerCategory.practical?.color || "#e5e7eb",
|
| 458 |
+
}}
|
| 459 |
+
></div>
|
| 460 |
+
</div>
|
| 461 |
+
<div className="p-4">
|
| 462 |
+
<div className="flex items-center mb-4">
|
| 463 |
+
<div
|
| 464 |
+
className="p-2 rounded-full"
|
| 465 |
+
style={{
|
| 466 |
+
backgroundColor:
|
| 467 |
+
bestModelPerCategory.practical?.color
|
| 468 |
+
? `${bestModelPerCategory.practical.color}20` : "#e5e7eb",
|
| 469 |
+
}}
|
| 470 |
+
>
|
| 471 |
+
<svg
|
| 472 |
+
xmlns="http://www.w3.org/2000/svg"
|
| 473 |
+
className="h-8 w-8"
|
| 474 |
+
style={{
|
| 475 |
+
color:
|
| 476 |
+
bestModelPerCategory.practical?.color ||
|
| 477 |
+
"#6b7280",
|
| 478 |
+
}}
|
| 479 |
+
fill="none"
|
| 480 |
+
viewBox="0 0 24 24"
|
| 481 |
+
stroke="currentColor"
|
| 482 |
+
>
|
| 483 |
+
<path
|
| 484 |
+
strokeLinecap="round"
|
| 485 |
+
strokeLinejoin="round"
|
| 486 |
+
strokeWidth={2}
|
| 487 |
+
d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"
|
| 488 |
+
/>
|
| 489 |
+
</svg>
|
| 490 |
+
</div>
|
| 491 |
+
<div className="ml-4">
|
| 492 |
+
<h4 className="text-lg font-semibold">
|
| 493 |
+
{bestModelPerCategory.practical?.model || "N/A"}
|
| 494 |
+
</h4>
|
| 495 |
+
<p className="text-sm text-gray-600">
|
| 496 |
+
Score:{" "}
|
| 497 |
+
{bestModelPerCategory.practical?.score.toFixed(2) ||
|
| 498 |
+
"N/A"}
|
| 499 |
+
{bestModelPerCategory.practical?.std != null &&
|
| 500 |
+
` ± ${bestModelPerCategory.practical.std.toFixed(
|
| 501 |
+
2
|
| 502 |
+
)}`}
|
| 503 |
+
</p>
|
| 504 |
+
</div>
|
| 505 |
+
</div>
|
| 506 |
+
|
| 507 |
+
{/* Key facets visualization */}
|
| 508 |
+
<div className="mb-4">
|
| 509 |
+
<h5 className="text-sm font-medium mb-2">
|
| 510 |
+
Key Aspects for Practical Tasks
|
| 511 |
+
</h5>
|
| 512 |
+
<div className="space-y-2">
|
| 513 |
+
{(keyAspectsByTask?.by_category?.practical || []).map(
|
| 514 |
+
(aspectInfo) => {
|
| 515 |
+
const aspect = aspectInfo.raw_aspect;
|
| 516 |
+
const score = aspectInfo.score;
|
| 517 |
+
|
| 518 |
+
return (
|
| 519 |
+
<div key={aspect} className="text-sm">
|
| 520 |
+
<div className="flex justify-between mb-1">
|
| 521 |
+
<span>{aspectInfo.aspect}</span>
|
| 522 |
+
<span className="font-medium">
|
| 523 |
+
{score.toFixed(1)}
|
| 524 |
+
</span>
|
| 525 |
+
</div>
|
| 526 |
+
<div className="w-full bg-gray-200 rounded-full h-2">
|
| 527 |
+
<div
|
| 528 |
+
className="h-2 rounded-full"
|
| 529 |
+
style={{
|
| 530 |
+
width: `${score}%`,
|
| 531 |
+
backgroundColor:
|
| 532 |
+
bestModelPerCategory.practical?.color ||
|
| 533 |
+
"#6b7280",
|
| 534 |
+
}}
|
| 535 |
+
></div>
|
| 536 |
+
</div>
|
| 537 |
+
</div>
|
| 538 |
+
);
|
| 539 |
+
}
|
| 540 |
+
)}
|
| 541 |
+
</div>
|
| 542 |
+
</div>
|
| 543 |
+
|
| 544 |
+
<p className="text-sm text-gray-700">
|
| 545 |
+
Best performance on practical tasks like creating a meal plan or following up on a job application.
|
| 546 |
+
</p>
|
| 547 |
+
<div className="mt-3 text-xs text-gray-500">
|
| 548 |
+
<div>Tasks in this category:</div>
|
| 549 |
+
<ul className="list-disc ml-4 mt-1">
|
| 550 |
+
{taskCategories.practical?.map((task) => (
|
| 551 |
+
<li key={task}>{task}</li>
|
| 552 |
+
)) || <li>No data available</li>}
|
| 553 |
+
</ul>
|
| 554 |
+
</div>
|
| 555 |
+
</div>
|
| 556 |
+
</div>
|
| 557 |
+
|
| 558 |
+
{/* Analytical Tasks Card - Enhanced */}
|
| 559 |
+
<div className="border rounded-lg overflow-hidden">
|
| 560 |
+
<div className="px-4 py-2 bg-gray-50 border-b flex items-center">
|
| 561 |
+
<h3 className="font-semibold">Best for Analytical Tasks</h3>
|
| 562 |
+
<div
|
| 563 |
+
className="ml-2 w-2 h-2 rounded-full"
|
| 564 |
+
style={{
|
| 565 |
+
backgroundColor:
|
| 566 |
+
bestModelPerCategory.analytical?.color || "#e5e7eb",
|
| 567 |
+
}}
|
| 568 |
+
></div>
|
| 569 |
+
</div>
|
| 570 |
+
<div className="p-4">
|
| 571 |
+
<div className="flex items-center mb-4">
|
| 572 |
+
<div
|
| 573 |
+
className="p-2 rounded-full"
|
| 574 |
+
style={{
|
| 575 |
+
backgroundColor:
|
| 576 |
+
bestModelPerCategory.analytical?.color
|
| 577 |
+
? `${bestModelPerCategory.analytical.color}20` : "#e5e7eb",
|
| 578 |
+
}}
|
| 579 |
+
>
|
| 580 |
+
<svg
|
| 581 |
+
xmlns="http://www.w3.org/2000/svg"
|
| 582 |
+
className="h-8 w-8"
|
| 583 |
+
style={{
|
| 584 |
+
color:
|
| 585 |
+
bestModelPerCategory.analytical?.color ||
|
| 586 |
+
"#6b7280",
|
| 587 |
+
}}
|
| 588 |
+
fill="none"
|
| 589 |
+
viewBox="0 0 24 24"
|
| 590 |
+
stroke="currentColor"
|
| 591 |
+
>
|
| 592 |
+
<path
|
| 593 |
+
strokeLinecap="round"
|
| 594 |
+
strokeLinejoin="round"
|
| 595 |
+
strokeWidth={2}
|
| 596 |
+
d="M12 6.253v13m0-13C10.832 5.477 9.246 5 7.5 5S4.168 5.477 3 6.253v13C4.168 18.477 5.754 18 7.5 18s3.332.477 4.5 1.253m0-13C13.168 5.477 14.754 5 16.5 5c1.747 0 3.332.477 4.5 1.253v13C19.832 18.477 18.247 18 16.5 18c-1.746 0-3.332.477-4.5 1.253"
|
| 597 |
+
/>
|
| 598 |
+
</svg>
|
| 599 |
+
</div>
|
| 600 |
+
<div className="ml-4">
|
| 601 |
+
<h4 className="text-lg font-semibold">
|
| 602 |
+
{bestModelPerCategory.analytical?.model || "N/A"}
|
| 603 |
+
</h4>
|
| 604 |
+
<p className="text-sm text-gray-600">
|
| 605 |
+
Score:{" "}
|
| 606 |
+
{bestModelPerCategory.analytical?.score.toFixed(2) ||
|
| 607 |
+
"N/A"}
|
| 608 |
+
{bestModelPerCategory.analytical?.std != null &&
|
| 609 |
+
` ± ${bestModelPerCategory.analytical.std.toFixed(
|
| 610 |
+
2
|
| 611 |
+
)}`}
|
| 612 |
+
</p>
|
| 613 |
+
</div>
|
| 614 |
+
</div>
|
| 615 |
+
|
| 616 |
+
{/* Key facets/aspects visualization */}
|
| 617 |
+
<div className="mb-4">
|
| 618 |
+
<h5 className="text-sm font-medium mb-2">
|
| 619 |
+
Key Aspects for Analytical Tasks
|
| 620 |
+
</h5>
|
| 621 |
+
<div className="space-y-2">
|
| 622 |
+
{(keyAspectsByTask?.by_category?.analytical || []).map(
|
| 623 |
+
(aspectInfo) => {
|
| 624 |
+
const aspect = aspectInfo.raw_aspect;
|
| 625 |
+
const score = aspectInfo.score;
|
| 626 |
+
|
| 627 |
+
return (
|
| 628 |
+
<div key={aspect} className="text-sm">
|
| 629 |
+
<div className="flex justify-between mb-1">
|
| 630 |
+
<span>{aspectInfo.aspect}</span>
|
| 631 |
+
<span className="font-medium">
|
| 632 |
+
{score.toFixed(1)}
|
| 633 |
+
</span>
|
| 634 |
+
</div>
|
| 635 |
+
<div className="w-full bg-gray-200 rounded-full h-2">
|
| 636 |
+
<div
|
| 637 |
+
className="h-2 rounded-full"
|
| 638 |
+
style={{
|
| 639 |
+
width: `${score}%`,
|
| 640 |
+
backgroundColor:
|
| 641 |
+
bestModelPerCategory.analytical
|
| 642 |
+
?.color || "#6b7280",
|
| 643 |
+
}}
|
| 644 |
+
></div>
|
| 645 |
+
</div>
|
| 646 |
+
</div>
|
| 647 |
+
);
|
| 648 |
+
}
|
| 649 |
+
)}
|
| 650 |
+
</div>
|
| 651 |
+
</div>
|
| 652 |
+
|
| 653 |
+
<p className="text-sm text-gray-700">
|
| 654 |
+
Exceptional at analytical tasks like breaking down complex topics or helping you decide between options.
|
| 655 |
+
</p>
|
| 656 |
+
<div className="mt-3 text-xs text-gray-500">
|
| 657 |
+
<div>Tasks in this category:</div>
|
| 658 |
+
<ul className="list-disc ml-4 mt-1">
|
| 659 |
+
{taskCategories.analytical?.map((task) => (
|
| 660 |
+
<li key={task}>{task}</li>
|
| 661 |
+
)) || <li>No data available</li>}
|
| 662 |
+
</ul>
|
| 663 |
+
</div>
|
| 664 |
+
</div>
|
| 665 |
+
</div>
|
| 666 |
+
</div>
|
| 667 |
+
</div>
|
| 668 |
+
)}
|
| 669 |
+
|
| 670 |
+
|
| 671 |
+
</div>
|
| 672 |
+
)}
|
| 673 |
+
|
| 674 |
+
{/* Task & Demographic Analysis Tab */}
|
| 675 |
+
{activeTab === "task-demographics" && data && (
|
| 676 |
+
<TaskDemographicAnalysis data={data} />
|
| 677 |
+
)}
|
| 678 |
+
|
| 679 |
+
{/* Facet & Aspect Breakdown Tab */}
|
| 680 |
+
{activeTab === "facets" && data && <MetricsBreakdown data={data} />}
|
| 681 |
+
|
| 682 |
+
{/* Head-to-Head Comparison Tab */}
|
| 683 |
+
{/* {activeTab === "headtohead" && <HeadToHeadComparison data={data} />} */}
|
| 684 |
+
</div>
|
| 685 |
+
);
|
| 686 |
+
};
|
| 687 |
+
|
| 688 |
+
export default LLMComparisonDashboard;
|
leaderboard-app/components/MetricsBreakdown.jsx
ADDED
|
@@ -0,0 +1,638 @@
| 1 |
+
"use client";
|
| 2 |
+
|
| 3 |
+
import React, { useState, useEffect } from "react";
|
| 4 |
+
import {
|
| 5 |
+
BarChart,
|
| 6 |
+
Bar,
|
| 7 |
+
XAxis,
|
| 8 |
+
YAxis,
|
| 9 |
+
CartesianGrid,
|
| 10 |
+
Tooltip,
|
| 11 |
+
Legend,
|
| 12 |
+
ResponsiveContainer,
|
| 13 |
+
RadarChart,
|
| 14 |
+
PolarGrid,
|
| 15 |
+
PolarAngleAxis,
|
| 16 |
+
PolarRadiusAxis,
|
| 17 |
+
Radar
|
| 18 |
+
} from "recharts";
|
| 19 |
+
|
| 20 |
+
// Utility functions for formatting facet and aspect names
|
| 21 |
+
const formatFacetName = (facet) => {
|
| 22 |
+
const facetMap = {
|
| 23 |
+
"helpfulness": "Helpfulness",
|
| 24 |
+
"communication": "Communication",
|
| 25 |
+
"insightful": "Insightfulness",
|
| 26 |
+
"adaptiveness": "Adaptiveness",
|
| 27 |
+
"trustworthiness": "Trustworthiness",
|
| 28 |
+
"personality": "Personality",
|
| 29 |
+
"background_and_culture": "Cultural Awareness"
|
| 30 |
+
};
|
| 31 |
+
|
| 32 |
+
return facetMap[facet] || (facet ? facet.replace(/_/g, ' ').replace(/\b\w/g, l => l.toUpperCase()) : facet);
|
| 33 |
+
};
|
| 34 |
+
|
| 35 |
+
const formatAspectName = (aspect) => {
|
| 36 |
+
const aspectMap = {
|
| 37 |
+
"effectiveness": "Effectiveness",
|
| 38 |
+
"comprehensiveness": "Comprehensiveness",
|
| 39 |
+
"usefulness": "Usefulness",
|
| 40 |
+
"tone_and_language_style": "Tone & Language",
|
| 41 |
+
"naturalness": "Naturalness",
|
| 42 |
+
"detail_and_technical_language": "Detail & Technical",
|
| 43 |
+
"accuracy": "Accuracy",
|
| 44 |
+
"sharpness": "Sharpness",
|
| 45 |
+
"intuitive": "Intuitiveness",
|
| 46 |
+
"flexibility": "Flexibility",
|
| 47 |
+
"clarity": "Clarity",
|
| 48 |
+
"perceptiveness": "Perceptiveness",
|
| 49 |
+
"consistency": "Consistency",
|
| 50 |
+
"confidence": "Confidence",
|
| 51 |
+
"transparency": "Transparency",
|
| 52 |
+
"personality-consistency": "Personality Consistency",
|
| 53 |
+
"personality-definition": "Personality Definition",
|
| 54 |
+
"honesty-empathy-fairness": "Honesty & Empathy",
|
| 55 |
+
"alignment": "Alignment",
|
| 56 |
+
"cultural_relevance": "Cultural Relevance",
|
| 57 |
+
"bias_freedom": "Freedom from Bias",
|
| 58 |
+
"background_and_culture": "Cultural Background"
|
| 59 |
+
};
|
| 60 |
+
|
| 61 |
+
return aspectMap[aspect] || (aspect ? aspect.replace(/_/g, ' ').replace(/-/g, ' ').replace(/\b\w/g, l => l.toUpperCase()) : aspect);
|
| 62 |
+
};
|
| 63 |
+
|
| 64 |
+
// Format categories for the radar chart
|
| 65 |
+
const formatCategoryName = (category) => {
|
| 66 |
+
if (category.includes('_') || category === "Insightful") {
|
| 67 |
+
return formatFacetName(category.toLowerCase());
|
| 68 |
+
}
|
| 69 |
+
return category;
|
| 70 |
+
};
|
| 71 |
+
|
| 72 |
+
// Get color based on score value
|
| 73 |
+
const getScoreColor = (score) => {
|
| 74 |
+
if (score >= 90) return "text-green-600 font-semibold";
|
| 75 |
+
if (score >= 80) return "text-green-500";
|
| 76 |
+
if (score >= 70) return "text-green-400";
|
| 77 |
+
if (score >= 60) return "text-sky-500";
|
| 78 |
+
if (score >= 50) return "text-sky-400";
|
| 79 |
+
if (score >= 40) return "text-yellow-500";
|
| 80 |
+
if (score >= 30) return "text-yellow-400";
|
| 81 |
+
return "text-red-500";
|
| 82 |
+
};
|
| 83 |
+
|
| 84 |
+
// Get background color based on score (for badges)
|
| 85 |
+
const getScoreBgColor = (score) => {
|
| 86 |
+
if (score >= 90) return "bg-green-100 text-green-800";
|
| 87 |
+
if (score >= 80) return "bg-green-50 text-green-700";
|
| 88 |
+
if (score >= 70) return "bg-sky-100 text-sky-800";
|
| 89 |
+
if (score >= 60) return "bg-sky-50 text-sky-700";
|
| 90 |
+
if (score >= 50) return "bg-yellow-100 text-yellow-800";
|
| 91 |
+
if (score < 50) return "bg-red-100 text-red-800";
|
| 92 |
+
return "bg-gray-100 text-gray-800";
|
| 93 |
+
};
|
| 94 |
+
|
| 95 |
+
// Custom tooltip with proper formatting
|
| 96 |
+
const CustomTooltip = ({ active, payload, label }) => {
|
| 97 |
+
if (active && payload && payload.length) {
|
| 98 |
+
// Format the label based on whether it's a facet or aspect
|
| 99 |
+
const formattedLabel = formatCategoryName(label);
|
| 100 |
+
|
| 101 |
+
return (
|
| 102 |
+
<div className="bg-white p-3 border rounded shadow-sm">
|
| 103 |
+
<p className="font-medium">{formattedLabel}</p>
|
| 104 |
+
<div className="mt-2">
|
| 105 |
+
{payload
|
| 106 |
+
.filter(entry => !entry.dataKey.includes('_std'))
|
| 107 |
+
.map((entry, index) => {
|
| 108 |
+
const stdEntry = payload.find(p => p.dataKey === `${entry.dataKey}_std`);
|
| 109 |
+
const stdValue = stdEntry ? stdEntry.value : 0;
|
| 110 |
+
|
| 111 |
+
return (
|
| 112 |
+
<div key={index} className="flex items-center text-sm mb-1">
|
| 113 |
+
<div
|
| 114 |
+
className="w-3 h-3 rounded-full mr-1"
|
| 115 |
+
style={{ backgroundColor: entry.color }}
|
| 116 |
+
></div>
|
| 117 |
+
<span className="mr-2">{entry.name}:</span>
|
| 118 |
+
<span className="font-medium">{(entry.value ?? 0).toFixed(1)} ± {(stdValue ?? 0).toFixed(1)}</span>
|
| 119 |
+
</div>
|
| 120 |
+
);
|
| 121 |
+
})}
|
| 122 |
+
</div>
|
| 123 |
+
</div>
|
| 124 |
+
);
|
| 125 |
+
}
|
| 126 |
+
return null;
|
| 127 |
+
};
|
| 128 |
+
|
| 129 |
+
const MetricsBreakdown = ({ data }) => {
|
| 130 |
+
const [viewMode, setViewMode] = useState("facets"); // "facets" or "aspects"
|
| 131 |
+
const [selectedModels, setSelectedModels] = useState([]);
|
| 132 |
+
const [selectedFacet, setSelectedFacet] = useState(null);
|
| 133 |
+
|
| 134 |
+
const {
|
| 135 |
+
models,
|
| 136 |
+
facets,
|
| 137 |
+
radarData,
|
| 138 |
+
bestModelPerFacet
|
| 139 |
+
} = data;
|
| 140 |
+
|
| 141 |
+
// Initialize selected facet and models
|
| 142 |
+
useEffect(() => {
|
| 143 |
+
if (!selectedFacet && facets && Object.keys(facets).length > 0) {
|
| 144 |
+
// Skip repeat_usage and select the first actual facet
|
| 145 |
+
const availableFacets = Object.keys(facets).filter(f => f !== "repeat_usage");
|
| 146 |
+
if (availableFacets.length > 0) {
|
| 147 |
+
setSelectedFacet(availableFacets[0]);
|
| 148 |
+
}
|
| 149 |
+
}
|
| 150 |
+
|
| 151 |
+
if (selectedModels.length === 0 && models?.length > 0) {
|
| 152 |
+
// Select all models by default
|
| 153 |
+
setSelectedModels(models.map(m => m.model));
|
| 154 |
+
}
|
| 155 |
+
}, [facets, selectedFacet, models, selectedModels]);
|
| 156 |
+
|
| 157 |
+
// Get model by name
|
| 158 |
+
const getModelByName = (name) => {
|
| 159 |
+
return models.find(m => m.model === name);
|
| 160 |
+
};
|
| 161 |
+
|
| 162 |
+
// Generate aspect radar data for selected facet
|
| 163 |
+
const getAspectRadarData = () => {
|
| 164 |
+
if (!selectedFacet || !facets) return [];
|
| 165 |
+
|
| 166 |
+
const selectedAspects = facets[selectedFacet] || [];
|
| 167 |
+
if (selectedAspects.length === 0) return [];
|
| 168 |
+
|
| 169 |
+
// Create radar data format with aspect as categories
|
| 170 |
+
return selectedAspects.map(aspect => {
|
| 171 |
+
const entry = {
|
| 172 |
+
category: formatAspectName(aspect),
|
| 173 |
+
aspect
|
| 174 |
+
};
|
| 175 |
+
|
| 176 |
+
// Add data for selected models
|
| 177 |
+
models
|
| 178 |
+
.filter(m => selectedModels.includes(m.model))
|
| 179 |
+
.forEach(model => {
|
| 180 |
+
if (model.breakdown_scores && model.breakdown_scores[aspect] !== undefined) {
|
| 181 |
+
entry[model.model] = model.breakdown_scores[aspect];
|
| 182 |
+
}
|
| 183 |
+
});
|
| 184 |
+
|
| 185 |
+
return entry;
|
| 186 |
+
});
|
| 187 |
+
};
|
| 188 |
+
|
| 189 |
+
// Get selected facet aspects
|
| 190 |
+
const getSelectedFacetAspects = () => {
|
| 191 |
+
if (!selectedFacet || !facets) return [];
|
| 192 |
+
return facets[selectedFacet] || [];
|
| 193 |
+
};
|
| 194 |
+
|
| 195 |
+
// Get facet data for the radar chart
|
| 196 |
+
const getFacetRadarData = () => {
|
| 197 |
+
if (!radarData) return [];
|
| 198 |
+
|
| 199 |
+
// This ensures the data contains only the selected models
|
| 200 |
+
return radarData.map(item => {
|
| 201 |
+
// Create a new object with only the properties we want
|
| 202 |
+
const newItem = { category: item.category };
|
| 203 |
+
|
| 204 |
+
// Copy only the selected models' data
|
| 205 |
+
models
|
| 206 |
+
.filter(m => selectedModels.includes(m.model))
|
| 207 |
+
.forEach(model => {
|
| 208 |
+
newItem[model.model] = item[model.model];
|
| 209 |
+
});
|
| 210 |
+
|
| 211 |
+
return newItem;
|
| 212 |
+
});
|
| 213 |
+
};
|
| 214 |
+
|
| 215 |
+
// Calculate top performers based on selected models only
|
| 216 |
+
const getTopPerformersByFacet = () => {
|
| 217 |
+
if (!facets || !models) return {};
|
| 218 |
+
|
| 219 |
+
const topPerformers = {};
|
| 220 |
+
|
| 221 |
+
// For each facet, find the best model among selected models
|
| 222 |
+
Object.keys(facets)
|
| 223 |
+
.filter(facet => facet !== "repeat_usage")
|
| 224 |
+
.forEach(facet => {
|
| 225 |
+
let bestModel = null;
|
| 226 |
+
let bestScore = -Infinity;
|
| 227 |
+
|
| 228 |
+
// Check each selected model
|
| 229 |
+
models
|
| 230 |
+
.filter(m => selectedModels.includes(m.model))
|
| 231 |
+
.forEach(model => {
|
| 232 |
+
const score = model.facet_scores?.[facet];
|
| 233 |
+
if (score !== undefined && score > bestScore) {
|
| 234 |
+
bestScore = score;
|
| 235 |
+
bestModel = {
|
| 236 |
+
model: model.model,
|
| 237 |
+
score: score,
|
| 238 |
+
modelObj: model
|
| 239 |
+
};
|
| 240 |
+
}
|
| 241 |
+
});
|
| 242 |
+
|
| 243 |
+
if (bestModel) {
|
| 244 |
+
topPerformers[facet] = bestModel;
|
| 245 |
+
}
|
| 246 |
+
});
|
| 247 |
+
|
| 248 |
+
return topPerformers;
|
| 249 |
+
};
|
| 250 |
+
|
| 251 |
+
// Calculate top performers for each aspect of the selected facet
|
| 252 |
+
const getTopPerformersByAspect = () => {
|
| 253 |
+
if (!selectedFacet || !facets || !models) return [];
|
| 254 |
+
|
| 255 |
+
const selectedAspects = facets[selectedFacet] || [];
|
| 256 |
+
const topPerformers = [];
|
| 257 |
+
|
| 258 |
+
// For each aspect, find the best model among selected models
|
| 259 |
+
selectedAspects.forEach(aspect => {
|
| 260 |
+
let bestModel = null;
|
| 261 |
+
let bestScore = -Infinity;
|
| 262 |
+
|
| 263 |
+
// Check each selected model
|
| 264 |
+
models
|
| 265 |
+
.filter(m => selectedModels.includes(m.model))
|
| 266 |
+
.forEach(model => {
|
| 267 |
+
const score = model.breakdown_scores?.[aspect];
|
| 268 |
+
if (score !== undefined && score > bestScore) {
|
| 269 |
+
bestScore = score;
|
| 270 |
+
bestModel = {
|
| 271 |
+
model: model.model,
|
| 272 |
+
score: score,
|
| 273 |
+
modelObj: model
|
| 274 |
+
};
|
| 275 |
+
}
|
| 276 |
+
});
|
| 277 |
+
|
| 278 |
+
if (bestModel) {
|
| 279 |
+
topPerformers.push({
|
| 280 |
+
aspect,
|
| 281 |
+
aspectName: formatAspectName(aspect),
|
| 282 |
+
...bestModel
|
| 283 |
+
});
|
| 284 |
+
}
|
| 285 |
+
});
|
| 286 |
+
|
| 287 |
+
return topPerformers;
|
| 288 |
+
};
|
| 289 |
+
|
| 290 |
+
// Prepare data
|
| 291 |
+
const selectedAspects = getSelectedFacetAspects();
|
| 292 |
+
const facetRadarData = getFacetRadarData();
|
| 293 |
+
const aspectRadarData = getAspectRadarData();
|
| 294 |
+
const topPerformers = getTopPerformersByFacet();
|
| 295 |
+
const topAspectPerformers = getTopPerformersByAspect();
|
| 296 |
+
|
| 297 |
+
return (
|
| 298 |
+
<>
|
| 299 |
+
{/* Top-level controls */}
|
| 300 |
+
<div className="mb-4 flex justify-between items-center flex-wrap">
|
| 301 |
+
<div className="flex items-center space-x-4">
|
| 302 |
+
{/* View toggle */}
|
| 303 |
+
<div className="flex space-x-1 p-1 bg-gray-100 rounded-lg">
|
| 304 |
+
<button
|
| 305 |
+
className={`px-4 py-1.5 text-sm font-medium rounded-md ${
|
| 306 |
+
viewMode === "facets" ? "bg-white shadow text-sky-700" : "text-gray-700"
|
| 307 |
+
}`}
|
| 308 |
+
onClick={() => setViewMode("facets")}
|
| 309 |
+
>
|
| 310 |
+
Facets
|
| 311 |
+
</button>
|
| 312 |
+
<button
|
| 313 |
+
className={`px-4 py-1.5 text-sm font-medium rounded-md ${
|
| 314 |
+
viewMode === "aspects" ? "bg-white shadow text-sky-700" : "text-gray-700"
|
| 315 |
+
}`}
|
| 316 |
+
onClick={() => setViewMode("aspects")}
|
| 317 |
+
>
|
| 318 |
+
Aspects
|
| 319 |
+
</button>
|
| 320 |
+
</div>
|
| 321 |
+
|
| 322 |
+
{/* Facet selector (shown when in aspects view) */}
|
| 323 |
+
{viewMode === "aspects" && (
|
| 324 |
+
<div className="flex items-center">
|
| 325 |
+
<span className="text-sm font-medium mr-1">Select Facet:</span>
|
| 326 |
+
<select
|
| 327 |
+
className="text-sm border rounded px-2 py-1.5 bg-white"
|
| 328 |
+
value={selectedFacet || ''}
|
| 329 |
+
onChange={(e) => setSelectedFacet(e.target.value)}
|
| 330 |
+
>
|
| 331 |
+
{Object.keys(facets || {})
|
| 332 |
+
.filter(f => f !== "repeat_usage")
|
| 333 |
+
.map(facet => (
|
| 334 |
+
<option key={facet} value={facet}>
|
| 335 |
+
{formatFacetName(facet)}
|
| 336 |
+
</option>
|
| 337 |
+
))}
|
| 338 |
+
</select>
|
| 339 |
+
</div>
|
| 340 |
+
)}
|
| 341 |
+
</div>
|
| 342 |
+
|
| 343 |
+
{/* Model selector */}
|
| 344 |
+
<div className="mt-2 sm:mt-0">
|
| 345 |
+
<span className="text-sm text-gray-500 mr-2">Select Models:</span>
|
| 346 |
+
<div className="inline-flex flex-wrap gap-1">
|
| 347 |
+
{models?.map(model => (
|
| 348 |
+
<button
|
| 349 |
+
key={model.model}
|
| 350 |
+
className={`px-2 py-0.5 text-sm rounded ${
|
| 351 |
+
selectedModels.includes(model.model)
|
| 352 |
+
? "bg-sky-100 border text-sky-800 border-sky-300"
|
| 353 |
+
: "bg-gray-100 text-gray-600"
|
| 354 |
+
}`}
|
| 355 |
+
onClick={() => {
|
| 356 |
+
if (selectedModels.includes(model.model)) {
|
| 357 |
+
if (selectedModels.length > 1) {
|
| 358 |
+
setSelectedModels(selectedModels.filter(m => m !== model.model));
|
| 359 |
+
}
|
| 360 |
+
} else {
|
| 361 |
+
setSelectedModels([...selectedModels, model.model]);
|
| 362 |
+
}
|
| 363 |
+
}}
|
| 364 |
+
>
|
| 365 |
+
{model.model}
|
| 366 |
+
</button>
|
| 367 |
+
))}
|
| 368 |
+
</div>
|
| 369 |
+
</div>
|
| 370 |
+
</div>
|
| 371 |
+
|
| 372 |
+
{/* Performance Summary Table */}
|
| 373 |
+
<div className="border rounded-lg overflow-hidden mb-4">
|
| 374 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 375 |
+
<h3 className="font-semibold">Performance Summary</h3>
|
| 376 |
+
</div>
|
| 377 |
+
<div className="p-4 overflow-x-auto">
|
| 378 |
+
<table className="min-w-full divide-y divide-gray-200">
|
| 379 |
+
<thead>
|
| 380 |
+
<tr>
|
| 381 |
+
<th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Model</th>
|
| 382 |
+
{viewMode === "facets" ? (
|
| 383 |
+
// Show facets in facet view
|
| 384 |
+
Object.keys(facets || {})
|
| 385 |
+
.filter(f => f !== "repeat_usage")
|
| 386 |
+
.map(facet => (
|
| 387 |
+
<th key={facet} className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
|
| 388 |
+
{formatFacetName(facet)}
|
| 389 |
+
</th>
|
| 390 |
+
))
|
| 391 |
+
) : (
|
| 392 |
+
// Show aspects in aspect view
|
| 393 |
+
selectedAspects.map(aspect => (
|
| 394 |
+
<th key={aspect} className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">
|
| 395 |
+
{formatAspectName(aspect)}
|
| 396 |
+
</th>
|
| 397 |
+
))
|
| 398 |
+
)}
|
| 399 |
+
</tr>
|
| 400 |
+
</thead>
|
| 401 |
+
<tbody className="bg-white divide-y divide-gray-200">
|
| 402 |
+
{models
|
| 403 |
+
?.filter(m => selectedModels.includes(m.model))
|
| 404 |
+
.map((model, idx) => (
|
| 405 |
+
<tr key={model.model} className={idx % 2 === 0 ? "bg-white" : "bg-gray-50"}>
|
| 406 |
+
<td className="px-3 py-2">
|
| 407 |
+
<div className="flex items-center">
|
| 408 |
+
<div
|
| 409 |
+
className="w-3 h-3 rounded-full mr-2"
|
| 410 |
+
style={{ backgroundColor: model.color }}
|
| 411 |
+
></div>
|
| 412 |
+
<span className="text-sm font-medium">{model.model}</span>
|
| 413 |
+
</div>
|
| 414 |
+
</td>
|
| 415 |
+
{viewMode === "facets" ? (
|
| 416 |
+
// Show facet scores in facet view
|
| 417 |
+
Object.keys(facets || {})
|
| 418 |
+
.filter(f => f !== "repeat_usage")
|
| 419 |
+
.map(facet => {
|
| 420 |
+
const score = model.facet_scores?.[facet] || 0;
|
| 421 |
+
return (
|
| 422 |
+
<td key={facet} className="px-3 py-2">
|
| 423 |
+
<div className={`text-sm ${getScoreColor(score)}`}>
|
| 424 |
+
{score.toFixed(1)}
|
| 425 |
+
</div>
|
| 426 |
+
</td>
|
| 427 |
+
);
|
| 428 |
+
})
|
| 429 |
+
) : (
|
| 430 |
+
// Show aspect scores in aspect view
|
| 431 |
+
selectedAspects.map(aspect => {
|
| 432 |
+
const score = model.breakdown_scores?.[aspect] || 0;
|
| 433 |
+
return (
|
| 434 |
+
<td key={aspect} className="px-3 py-2">
|
| 435 |
+
<div className={`text-sm ${getScoreColor(score)}`}>
|
| 436 |
+
{score.toFixed(1)}
|
| 437 |
+
</div>
|
| 438 |
+
</td>
|
| 439 |
+
);
|
| 440 |
+
})
|
| 441 |
+
)}
|
| 442 |
+
</tr>
|
| 443 |
+
))}
|
| 444 |
+
</tbody>
|
| 445 |
+
</table>
|
| 446 |
+
</div>
|
| 447 |
+
</div>
|
| 448 |
+
|
| 449 |
+
{/* Conditional content based on view mode */}
|
| 450 |
+
{viewMode === "facets" ? (
|
| 451 |
+
// FACETS VIEW
|
| 452 |
+
<>
|
| 453 |
+
{/* Radar Chart */}
|
| 454 |
+
<div className="border rounded-lg overflow-hidden mb-4">
|
| 455 |
+
<div className="px-4 py-2 bg-gray-50 border-b flex justify-between items-center">
|
| 456 |
+
<h3 className="font-semibold">Model Performance Across Facets</h3>
|
| 457 |
+
<div className="text-xs text-gray-500">Radar chart showing model strengths</div>
|
| 458 |
+
</div>
|
| 459 |
+
<div className="p-4">
|
| 460 |
+
<div className="h-96">
|
| 461 |
+
<ResponsiveContainer width="100%" height="100%">
|
| 462 |
+
<RadarChart
|
| 463 |
+
outerRadius={160}
|
| 464 |
+
data={facetRadarData}
|
| 465 |
+
>
|
| 466 |
+
<PolarGrid gridType="polygon" />
|
| 467 |
+
<PolarAngleAxis
|
| 468 |
+
dataKey="category"
|
| 469 |
+
tick={{ fill: "#4b5563", fontSize: 14 }}
|
| 470 |
+
tickLine={false}
|
| 471 |
+
tickFormatter={formatCategoryName}
|
| 472 |
+
/>
|
| 473 |
+
<PolarRadiusAxis
|
| 474 |
+
angle={90}
|
| 475 |
+
domain={[-100, 100]}
|
| 476 |
+
axisLine={false}
|
| 477 |
+
tick={{ fontSize: 12 }}
|
| 478 |
+
tickCount={5}
|
| 479 |
+
/>
|
| 480 |
+
{models
|
| 481 |
+
?.filter(m => selectedModels.includes(m.model))
|
| 482 |
+
.map((model) => (
|
| 483 |
+
<Radar
|
| 484 |
+
key={model.model}
|
| 485 |
+
name={model.model}
|
| 486 |
+
dataKey={model.model}
|
| 487 |
+
stroke={model.color}
|
| 488 |
+
fill={model.color}
|
| 489 |
+
fillOpacity={0.2}
|
| 490 |
+
strokeWidth={2}
|
| 491 |
+
/>
|
| 492 |
+
))}
|
| 493 |
+
<Tooltip content={<CustomTooltip />} />
|
| 494 |
+
<Legend />
|
| 495 |
+
</RadarChart>
|
| 496 |
+
</ResponsiveContainer>
|
| 497 |
+
</div>
|
| 498 |
+
</div>
|
| 499 |
+
</div>
|
| 500 |
+
|
| 501 |
+
{/* Top Performers Table */}
|
| 502 |
+
<div className="border rounded-lg overflow-hidden">
|
| 503 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 504 |
+
<h3 className="font-semibold">Top Performers by Facet</h3>
|
| 505 |
+
</div>
|
| 506 |
+
<div className="p-4">
|
| 507 |
+
<table className="min-w-full divide-y divide-gray-200">
|
| 508 |
+
<thead>
|
| 509 |
+
<tr>
|
| 510 |
+
<th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Facet</th>
|
| 511 |
+
<th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Best Model</th>
|
| 512 |
+
<th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Score</th>
|
| 513 |
+
</tr>
|
| 514 |
+
</thead>
|
| 515 |
+
<tbody className="bg-white divide-y divide-gray-200">
|
| 516 |
+
{Object.entries(topPerformers)
|
| 517 |
+
.map(([facet, bestModel], idx) => (
|
| 518 |
+
<tr key={facet} className={idx % 2 === 0 ? "bg-white" : "bg-gray-50"}>
|
| 519 |
+
<td className="px-3 py-2 font-medium">{formatFacetName(facet)}</td>
|
| 520 |
+
<td className="px-3 py-2">
|
| 521 |
+
<div className="flex items-center">
|
| 522 |
+
<div
|
| 523 |
+
className="w-3 h-3 rounded-full mr-2"
|
| 524 |
+
style={{ backgroundColor: bestModel.modelObj?.color }}
|
| 525 |
+
></div>
|
| 526 |
+
<span>{bestModel.model}</span>
|
| 527 |
+
</div>
|
| 528 |
+
</td>
|
| 529 |
+
<td className="px-3 py-2">
|
| 530 |
+
<span className={`px-2 py-0.5 rounded-full text-sm font-medium ${getScoreBgColor(bestModel.score)}`}>
|
| 531 |
+
{bestModel.score.toFixed(1)}
|
| 532 |
+
</span>
|
| 533 |
+
</td>
|
| 534 |
+
</tr>
|
| 535 |
+
))}
|
| 536 |
+
</tbody>
|
| 537 |
+
</table>
|
| 538 |
+
</div>
|
| 539 |
+
</div>
|
| 540 |
+
</>
|
| 541 |
+
) : (
|
| 542 |
+
// ASPECTS VIEW
|
| 543 |
+
<>
|
| 544 |
+
{/* Aspect Radar Chart */}
|
| 545 |
+
<div className="border rounded-lg overflow-hidden mb-4">
|
| 546 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 547 |
+
<h3 className="font-semibold">Aspect Breakdown for {formatFacetName(selectedFacet || '')}</h3>
|
| 548 |
+
</div>
|
| 549 |
+
<div className="p-4">
|
| 550 |
+
<div className="h-96">
|
| 551 |
+
<ResponsiveContainer width="100%" height="100%">
|
| 552 |
+
<RadarChart
|
| 553 |
+
outerRadius={160}
|
| 554 |
+
data={aspectRadarData}
|
| 555 |
+
>
|
| 556 |
+
<PolarGrid gridType="polygon" />
|
| 557 |
+
<PolarAngleAxis
|
| 558 |
+
dataKey="category"
|
| 559 |
+
tick={{ fill: "#4b5563", fontSize: 12 }}
|
| 560 |
+
tickLine={false}
|
| 561 |
+
/>
|
| 562 |
+
<PolarRadiusAxis
|
| 563 |
+
angle={90}
|
| 564 |
+
domain={[0, 100]}
|
| 565 |
+
axisLine={false}
|
| 566 |
+
tick={{ fontSize: 12 }}
|
| 567 |
+
tickCount={5}
|
| 568 |
+
/>
|
| 569 |
+
{models
|
| 570 |
+
?.filter(m => selectedModels.includes(m.model))
|
| 571 |
+
.map((model) => (
|
| 572 |
+
<Radar
|
| 573 |
+
key={model.model}
|
| 574 |
+
name={model.model}
|
| 575 |
+
dataKey={model.model}
|
| 576 |
+
stroke={model.color}
|
| 577 |
+
fill={model.color}
|
| 578 |
+
fillOpacity={0.2}
|
| 579 |
+
strokeWidth={2}
|
| 580 |
+
/>
|
| 581 |
+
))}
|
| 582 |
+
<Tooltip content={<CustomTooltip />} />
|
| 583 |
+
<Legend />
|
| 584 |
+
</RadarChart>
|
| 585 |
+
</ResponsiveContainer>
|
| 586 |
+
</div>
|
| 587 |
+
|
| 588 |
+
<div className="mt-2 text-xs text-gray-500 text-center">
|
| 589 |
+
Aspect scores for {formatFacetName(selectedFacet)} (0-100 scale)
|
| 590 |
+
</div>
|
| 591 |
+
</div>
|
| 592 |
+
</div>
|
| 593 |
+
|
| 594 |
+
{/* Top Performers by Aspect Table */}
|
| 595 |
+
<div className="border rounded-lg overflow-hidden">
|
| 596 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 597 |
+
<h3 className="font-semibold">Top Performers by Aspect in {formatFacetName(selectedFacet || '')}</h3>
|
| 598 |
+
</div>
|
| 599 |
+
<div className="p-4">
|
| 600 |
+
<table className="min-w-full divide-y divide-gray-200">
|
| 601 |
+
<thead>
|
| 602 |
+
<tr>
|
| 603 |
+
<th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Aspect</th>
|
| 604 |
+
<th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Best Model</th>
|
| 605 |
+
<th className="px-3 py-2 text-left text-xs font-medium text-gray-500 uppercase tracking-wider">Score</th>
|
| 606 |
+
</tr>
|
| 607 |
+
</thead>
|
| 608 |
+
<tbody className="bg-white divide-y divide-gray-200">
|
| 609 |
+
{topAspectPerformers.map((performer, idx) => (
|
| 610 |
+
<tr key={performer.aspect} className={idx % 2 === 0 ? "bg-white" : "bg-gray-50"}>
|
| 611 |
+
<td className="px-3 py-2 font-medium">{performer.aspectName}</td>
|
| 612 |
+
<td className="px-3 py-2">
|
| 613 |
+
<div className="flex items-center">
|
| 614 |
+
<div
|
| 615 |
+
className="w-3 h-3 rounded-full mr-2"
|
| 616 |
+
style={{ backgroundColor: performer.modelObj?.color }}
|
| 617 |
+
></div>
|
| 618 |
+
<span>{performer.model}</span>
|
| 619 |
+
</div>
|
| 620 |
+
</td>
|
| 621 |
+
<td className="px-3 py-2">
|
| 622 |
+
<span className={`px-2 py-0.5 rounded-full text-sm font-medium ${getScoreBgColor(performer.score)}`}>
|
| 623 |
+
{performer.score.toFixed(1)}
|
| 624 |
+
</span>
|
| 625 |
+
</td>
|
| 626 |
+
</tr>
|
| 627 |
+
))}
|
| 628 |
+
</tbody>
|
| 629 |
+
</table>
|
| 630 |
+
</div>
|
| 631 |
+
</div>
|
| 632 |
+
</>
|
| 633 |
+
)}
|
| 634 |
+
</>
|
| 635 |
+
);
|
| 636 |
+
};
|
| 637 |
+
|
| 638 |
+
export default MetricsBreakdown;
|
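The "top performers by facet" scan in MetricsBreakdown.jsx above can be read in isolation as a plain function. What follows is a minimal sketch, not the component's actual code: the function name `topPerformersByFacet`, its argument order, and the sample data are all hypothetical and chosen only for illustration. It assumes the same `{ model, facet_scores }` shape the component reads.

```javascript
// Hypothetical standalone version of the per-facet "top performer" scan.
// `models` is assumed to mirror the shape read by MetricsBreakdown.jsx:
// [{ model: string, facet_scores: { [facet]: number } }, ...]
function topPerformersByFacet(models, facetNames, selectedModels) {
  // Only models the user has toggled on take part in the comparison.
  const selected = models.filter((m) => selectedModels.includes(m.model));
  const result = {};

  for (const facet of facetNames) {
    let best = null;
    for (const m of selected) {
      const score = m.facet_scores?.[facet];
      // Skip models that have no score recorded for this facet.
      if (score === undefined) continue;
      if (best === null || score > best.score) {
        best = { model: m.model, score };
      }
    }
    // Facets with no scored model are simply left out of the result.
    if (best) result[facet] = best;
  }
  return result;
}

// Made-up sample data, for illustration only.
const sampleModels = [
  { model: "model-a", facet_scores: { helpfulness: 71.2, communication: 64.0 } },
  { model: "model-b", facet_scores: { helpfulness: 68.5, communication: 70.3 } },
  { model: "model-c", facet_scores: { helpfulness: 90.0 } },
];

const winners = topPerformersByFacet(
  sampleModels,
  ["helpfulness", "communication", "personality"],
  ["model-a", "model-b"]
);
console.log(winners);
```

Because the scan runs over the selected models only, deselecting a model can change the winner for a facet, and a facet that none of the selected models has a score for does not appear in the table at all.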
leaderboard-app/components/TaskDemographicAnalysis.jsx
ADDED
|
@@ -0,0 +1,1416 @@
| 1 |
+
"use client";
|
| 2 |
+
|
| 3 |
+
import React, { useState, useEffect, useMemo } from "react";
|
| 4 |
+
import {
|
| 5 |
+
BarChart,
|
| 6 |
+
Bar,
|
| 7 |
+
XAxis,
|
| 8 |
+
YAxis,
|
| 9 |
+
CartesianGrid,
|
| 10 |
+
Tooltip,
|
| 11 |
+
Legend,
|
| 12 |
+
ResponsiveContainer,
|
| 13 |
+
ReferenceLine,
|
| 14 |
+
Cell,
|
| 15 |
+
} from "recharts";
|
| 16 |
+
import { getScoreBadgeColor } from "../lib/utils";
|
| 17 |
+
|
| 18 |
+
// Helper component for info tooltips
|
| 19 |
+
const InfoTooltip = ({ text }) => {
|
| 20 |
+
const [isVisible, setIsVisible] = useState(false);
|
| 21 |
+
|
| 22 |
+
return (
|
| 23 |
+
<div className="relative inline-block ml-1">
|
| 24 |
+
<button
|
| 25 |
+
className="text-gray-400 hover:text-gray-600 focus:outline-none"
|
| 26 |
+
onMouseEnter={() => setIsVisible(true)}
|
| 27 |
+
onMouseLeave={() => setIsVisible(false)}
|
| 28 |
+
onClick={() => setIsVisible(!isVisible)}
|
| 29 |
+
>
|
| 30 |
+
<svg
|
| 31 |
+
xmlns="http://www.w3.org/2000/svg"
|
| 32 |
+
className="h-4 w-4"
|
| 33 |
+
viewBox="0 0 20 20"
|
| 34 |
+
fill="currentColor"
|
| 35 |
+
>
|
| 36 |
+
<path
|
| 37 |
+
fillRule="evenodd"
|
| 38 |
+
d="M18 10a8 8 0 11-16 0 8 8 0 0116 0zm-7-4a1 1 0 11-2 0 1 1 0 012 0zM9 9a1 1 0 000 2v3a1 1 0 001 1h1a1 1 0 100-2v-3a1 1 0 00-1-1H9z"
|
| 39 |
+
clipRule="evenodd"
|
| 40 |
+
/>
|
| 41 |
+
</svg>
|
| 42 |
+
</button>
|
| 43 |
+
{isVisible && (
|
| 44 |
+
<div className="absolute z-10 w-64 p-2 bg-white border rounded shadow-lg text-xs text-gray-700 -translate-x-1/2 left-1/2 mt-1">
|
| 45 |
+
{text}
|
| 46 |
+
</div>
|
| 47 |
+
)}
|
| 48 |
+
</div>
|
| 49 |
+
);
|
| 50 |
+
};
|
| 51 |
+
|
| 52 |
+
// Format facet names for display
|
| 53 |
+
const formatFacetName = (facet) => {
|
| 54 |
+
const facetMap = {
|
| 55 |
+
helpfulness: "Helpfulness",
|
| 56 |
+
communication: "Communication",
|
| 57 |
+
insightful: "Insightfulness",
|
| 58 |
+
adaptiveness: "Adaptiveness",
|
| 59 |
+
trustworthiness: "Trustworthiness",
|
| 60 |
+
personality: "Personality",
|
| 61 |
+
background_and_culture: "Cultural Awareness",
|
| 62 |
+
};
|
| 63 |
+
|
| 64 |
+
return (
|
| 65 |
+
facetMap[facet] ||
|
| 66 |
+
(facet
|
| 67 |
+
? facet.replace(/_/g, " ").replace(/\b\w/g, (l) => l.toUpperCase())
|
| 68 |
+
: facet)
|
| 69 |
+
);
|
| 70 |
+
};
|
| 71 |
+
|
| 72 |
+
// Filter tag component for displaying active filters
|
| 73 |
+
const FilterTag = ({ label, onRemove }) => (
|
| 74 |
+
<div className="inline-flex items-center px-2 py-1 mr-2 mb-2 text-xs font-medium rounded-full bg-blue-100 text-blue-800">
|
| 75 |
+
{label}
|
| 76 |
+
{onRemove && (
|
| 77 |
+
<button
|
| 78 |
+
onClick={onRemove}
|
| 79 |
+
className="ml-1 text-blue-600 hover:text-blue-800 focus:outline-none"
|
| 80 |
+
>
|
| 81 |
+
<svg
|
| 82 |
+
xmlns="http://www.w3.org/2000/svg"
|
| 83 |
+
className="h-3 w-3"
|
| 84 |
+
viewBox="0 0 20 20"
|
| 85 |
+
fill="currentColor"
|
| 86 |
+
>
|
| 87 |
+
<path
|
| 88 |
+
fillRule="evenodd"
|
| 89 |
+
d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.707 7.293a1 1 0 00-1.414 1.414L8.586 10l-1.293 1.293a1 1 0 101.414 1.414L10 11.414l1.293 1.293a1 1 0 001.414-1.414L11.414 10l1.293-1.293a1 1 0 00-1.414-1.414L10 8.586 8.707 7.293z"
|
| 90 |
+
clipRule="evenodd"
|
| 91 |
+
/>
|
| 92 |
+
</svg>
|
| 93 |
+
</button>
|
| 94 |
+
)}
|
| 95 |
+
</div>
|
| 96 |
+
);
|
| 97 |
+
|
| 98 |
+
/* Clean, minimal insight component inspired by the equity ranking design */
|
| 99 |
+
const CleanInsightItem = ({ insight, index, models }) => {
|
| 100 |
+
// Extract model names and metrics from the insight text
|
| 101 |
+
const enhanceText = (text) => {
|
| 102 |
+
// First, find and highlight any numeric values with bold
|
| 103 |
+
const numericPattern = /(\d+\.?\d*)/g;
|
| 104 |
+
let enhancedText = text.replace(numericPattern, "<strong>$1</strong>");
|
| 105 |
+
|
| 106 |
+
// Then highlight model names
|
| 107 |
+
models.forEach((model) => {
|
| 108 |
+
const modelName = model.model;
|
| 109 |
+
if (text.includes(modelName)) {
|
| 110 |
+
enhancedText = enhancedText.replace(
|
| 111 |
+
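new RegExp(modelName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "g"), // escape regex metacharacters so the name matches literally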
|
| 112 |
+
`<span class="font-medium" style="color: ${model.color}">${modelName}</span>`
|
| 113 |
+
);
|
| 114 |
+
}
|
| 115 |
+
});
|
| 116 |
+
|
| 117 |
+
return enhancedText;
|
| 118 |
+
};
|
| 119 |
+
|
| 120 |
+
// Determine the type of insight for styling
|
| 121 |
+
const getInsightType = (text) => {
|
| 122 |
+
if (
|
| 123 |
+
text.includes("performs best") ||
|
| 124 |
+
text.includes("excellent equity") ||
|
| 125 |
+
text.includes("achieves the highest")
|
| 126 |
+
) {
|
| 127 |
+
return "positive";
|
| 128 |
+
} else if (
|
| 129 |
+
text.includes("potential equity concerns") ||
|
| 130 |
+
text.includes("worst") ||
|
| 131 |
+
text.includes("gap between")
|
| 132 |
+
) {
|
| 133 |
+
return "negative";
|
| 134 |
+
} else if (text.includes("point gap")) {
|
| 135 |
+
return "comparison";
|
| 136 |
+
} else {
|
| 137 |
+
return "info";
|
| 138 |
+
}
|
| 139 |
+
};
|
| 140 |
+
|
| 141 |
+
// Get color based on insight type
|
| 142 |
+
const getTypeColor = (type) => {
|
| 143 |
+
switch (type) {
|
| 144 |
+
case "positive":
|
| 145 |
+
return "text-green-700 bg-green-50";
|
| 146 |
+
case "negative":
|
| 147 |
+
return "text-red-700 bg-red-50";
|
| 148 |
+
case "comparison":
|
| 149 |
+
return "text-blue-700 bg-blue-50";
|
| 150 |
+
default:
|
| 151 |
+
return "text-gray-700 bg-gray-50";
|
| 152 |
+
}
|
| 153 |
+
};
|
| 154 |
+
|
| 155 |
+
const insightType = getInsightType(insight);
|
| 156 |
+
const typeColor = getTypeColor(insightType);
|
| 157 |
+
|
| 158 |
+
return (
|
| 159 |
+
<div className="flex items-start py-3 px-4 border-b last:border-b-0">
|
| 160 |
+
<div className="flex-shrink-0 mr-3">
|
| 161 |
+
<div
|
| 162 |
+
className={`w-7 h-7 rounded-full flex items-center justify-center ${typeColor}`}
|
| 163 |
+
>
|
| 164 |
+
<span className="text-xs font-semibold">{index + 1}</span>
|
| 165 |
+
</div>
|
| 166 |
+
</div>
|
| 167 |
+
<div className="flex-grow">
|
| 168 |
+
<p
|
| 169 |
+
className="text-sm text-gray-800"
|
| 170 |
+
dangerouslySetInnerHTML={{ __html: enhanceText(insight) }}
|
| 171 |
+
/>
|
| 172 |
+
</div>
|
| 173 |
+
</div>
|
| 174 |
+
);
|
| 175 |
+
};
|
| 176 |
+
|
| 177 |
+
const TaskDemographicAnalysis = ({ data }) => {
|
| 178 |
+
// Analysis controls state
|
| 179 |
+
const [selectedTask, setSelectedTask] = useState("all");
|
| 180 |
+
const [selectedDemographic, setSelectedDemographic] = useState("all");
|
| 181 |
+
const [selectedModel, setSelectedModel] = useState(null);
|
| 182 |
+
const [selectedMetric, setSelectedMetric] = useState("overall_score");
|
| 183 |
+
const [viewMode, setViewMode] = useState("absolute"); // 'absolute' or 'relative'
|
| 184 |
+
const [showAllModels, setShowAllModels] = useState(true);
|
| 185 |
+
const [groupBy, setGroupBy] = useState("task"); // 'task', 'demographic', or 'combined'
|
| 186 |
+
const [keyInsightsVisible, setKeyInsightsVisible] = useState(true);
|
| 187 |
+
|
| 188 |
+
// Extracting data
|
| 189 |
+
const {
|
| 190 |
+
models,
|
| 191 |
+
taskData,
|
| 192 |
+
taskCategories,
|
| 193 |
+
demographicSummary,
|
| 194 |
+
demographicOptions,
|
| 195 |
+
fairnessMetrics,
|
| 196 |
+
facets,
|
| 197 |
+
} = data;
|
| 198 |
+
|
| 199 |
+
// Initialize selectedModel if not set
|
| 200 |
+
useEffect(() => {
|
| 201 |
+
if (!selectedModel && models.length > 0) {
|
| 202 |
+
setSelectedModel(models[0].model);
|
| 203 |
+
}
|
| 204 |
+
}, [models, selectedModel]);
|
| 205 |
+
|
| 206 |
+
// Handle group by changes - reset and disable other filters as needed
|
| 207 |
+
useEffect(() => {
|
| 208 |
+
if (groupBy === "task" && selectedDemographic !== "all") {
|
| 209 |
+
// When grouping by task, reset demographic to 'all'
|
| 210 |
+
setSelectedDemographic("all");
|
| 211 |
+
} else if (groupBy === "demographic" && selectedTask !== "all") {
|
| 212 |
+
// When grouping by demographic, reset task to 'all'
|
| 213 |
+
setSelectedTask("all");
|
| 214 |
+
}
|
| 215 |
+
}, [groupBy, selectedDemographic, selectedTask]);
|
| 216 |
+
|
| 217 |
+
// Function to get all tasks (flat list)
|
| 218 |
+
const getAllTasks = () => {
|
| 219 |
+
const allTasks = [];
|
| 220 |
+
if (taskData) {
|
| 221 |
+
taskData.forEach((task) => {
|
| 222 |
+
if (!allTasks.includes(task.task)) {
|
| 223 |
+
allTasks.push(task.task);
|
| 224 |
+
}
|
| 225 |
+
});
|
| 226 |
+
}
|
| 227 |
+
return allTasks.sort();
|
| 228 |
+
};
|
| 229 |
+
|
| 230 |
+
// Get task options including "All Tasks" and categories
|
| 231 |
+
const taskOptions = useMemo(() => {
|
| 232 |
+
// Start with "All Tasks" option
|
| 233 |
+
const allTasksOption = { value: "all", label: "All Tasks" };
|
| 234 |
+
|
| 235 |
+
// Group tasks by category
|
| 236 |
+
const categorizedTasks = {};
|
| 237 |
+
const uncategorizedTasks = [];
|
| 238 |
+
|
| 239 |
+
// Get all tasks and their categories
|
| 240 |
+
getAllTasks().forEach((task) => {
|
| 241 |
+
const taskInfo = taskData.find((t) => t.task === task);
|
| 242 |
+
if (taskInfo && taskInfo.category) {
|
| 243 |
+
if (!categorizedTasks[taskInfo.category]) {
|
| 244 |
+
categorizedTasks[taskInfo.category] = [];
|
| 245 |
+
}
|
| 246 |
+
categorizedTasks[taskInfo.category].push({
|
| 247 |
+
value: task,
|
| 248 |
+
label: task,
|
| 249 |
+
});
|
| 250 |
+
} else {
|
| 251 |
+
uncategorizedTasks.push({
|
| 252 |
+
value: task,
|
| 253 |
+
label: task,
|
| 254 |
+
});
|
| 255 |
+
}
|
| 256 |
+
});
|
| 257 |
+
|
| 258 |
+
// Format for select rendering
|
| 259 |
+
return {
|
| 260 |
+
allTasksOption,
|
| 261 |
+
categories: Object.keys(taskCategories || {}).map((category) => ({
|
| 262 |
+
label: `${category.charAt(0).toUpperCase() + category.slice(1)} Tasks`,
|
| 263 |
+
value: category,
|
| 264 |
+
isCategory: true,
|
| 265 |
+
})),
|
| 266 |
+
categorizedTasks,
|
| 267 |
+
uncategorizedTasks,
|
| 268 |
+
};
|
| 269 |
+
}, [taskData, taskCategories]);
|
| 270 |
+
|
| 271 |
+
// Helper function to get task label
|
| 272 |
+
const getTaskLabel = (taskValue) => {
|
| 273 |
+
// Check if it's "all tasks"
|
| 274 |
+
if (taskValue === "all") {
|
| 275 |
+
return "All Tasks";
|
| 276 |
+
}
|
| 277 |
+
|
| 278 |
+
// Check if it's a category
|
| 279 |
+
const category = taskOptions.categories.find((c) => c.value === taskValue);
|
| 280 |
+
if (category) {
|
| 281 |
+
return category.label;
|
| 282 |
+
}
|
| 283 |
+
|
| 284 |
+
// Look in categorized tasks
|
| 285 |
+
for (const [category, tasks] of Object.entries(
|
| 286 |
+
taskOptions.categorizedTasks
|
| 287 |
+
)) {
|
| 288 |
+
const task = tasks.find((t) => t.value === taskValue);
|
| 289 |
+
if (task) {
|
| 290 |
+
return task.label;
|
| 291 |
+
}
|
| 292 |
+
}
|
| 293 |
+
|
| 294 |
+
// Check uncategorized tasks
|
| 295 |
+
const uncategorizedTask = taskOptions.uncategorizedTasks.find(
|
| 296 |
+
(t) => t.value === taskValue
|
| 297 |
+
);
|
| 298 |
+
if (uncategorizedTask) {
|
| 299 |
+
return uncategorizedTask.label;
|
| 300 |
+
}
|
| 301 |
+
|
| 302 |
+
// Fallback to the value itself
|
| 303 |
+
return taskValue;
|
| 304 |
+
};
|
| 305 |
+
|
| 306 |
+
// Get filtered performance data based on selected filters
|
| 307 |
+
const getFilteredPerformanceData = () => {
|
| 308 |
+
if (!taskData) return [];
|
| 309 |
+
|
| 310 |
+
let filteredData = [...taskData];
|
| 311 |
+
|
| 312 |
+
// Filter by task or task category
|
| 313 |
+
if (selectedTask !== "all") {
|
| 314 |
+
// Check if it's a category
|
| 315 |
+
const isCategory = Object.keys(taskCategories || {}).includes(
|
| 316 |
+
selectedTask
|
| 317 |
+
);
|
| 318 |
+
|
| 319 |
+
if (isCategory) {
|
| 320 |
+
// Filter by category
|
| 321 |
+
filteredData = filteredData.filter(
|
| 322 |
+
(item) => item.category === selectedTask
|
| 323 |
+
);
|
| 324 |
+
} else {
|
| 325 |
+
// Filter by specific task
|
| 326 |
+
filteredData = filteredData.filter(
|
| 327 |
+
(item) => item.task === selectedTask
|
| 328 |
+
);
|
| 329 |
+
}
|
| 330 |
+
}
|
| 331 |
+
|
| 332 |
+
// For relative view, we need to transform the data
|
| 333 |
+
if (viewMode === "relative") {
|
| 334 |
+
// Transform data for relative view (regardless of grouping type)
|
| 335 |
+
return filteredData.map((item) => {
|
| 336 |
+
// Create a copy of the item
|
| 337 |
+
const newItem = { ...item };
|
| 338 |
+
|
| 339 |
+
// Get all valid model scores for this item
|
| 340 |
+
const modelScores = [];
|
| 341 |
+
models.forEach((model) => {
|
| 342 |
+
if (typeof newItem[model.model] === "number") {
|
| 343 |
+
modelScores.push(newItem[model.model]);
|
| 344 |
+
}
|
| 345 |
+
});
|
| 346 |
+
|
| 347 |
+
// Calculate average if we have scores
|
| 348 |
+
if (modelScores.length > 0) {
|
| 349 |
+
const avgScore =
|
| 350 |
+
modelScores.reduce((sum, score) => sum + score, 0) /
|
| 351 |
+
modelScores.length;
|
| 352 |
+
|
| 353 |
+
// Convert all scores to relative to average
|
| 354 |
+
models.forEach((model) => {
|
| 355 |
+
if (typeof newItem[model.model] === "number") {
|
| 356 |
+
newItem[model.model] = newItem[model.model] - avgScore;
|
| 357 |
+
}
|
| 358 |
+
});
|
| 359 |
+
}
|
| 360 |
+
|
| 361 |
+
return newItem;
|
| 362 |
+
});
|
| 363 |
+
}
|
| 364 |
+
|
| 365 |
+
// For absolute view or if we can't do relative, return filtered data as is
|
| 366 |
+
return filteredData;
|
| 367 |
+
};
|
| 368 |
+
|
| 369 |
+
// Calculate model equity based on current filters
|
| 370 |
+
const calculateModelEquity = () => {
|
| 371 |
+
if (!demographicSummary || !demographicOptions) {
|
| 372 |
+
return models.map((model) => ({
|
| 373 |
+
model: model.model,
|
| 374 |
+
avgGap: 0,
|
| 375 |
+
color: model.color,
|
| 376 |
+
}));
|
| 377 |
+
}
|
| 378 |
+
|
| 379 |
+
// Get task-specific category if needed
|
| 380 |
+
let taskCategory = null;
|
| 381 |
+
let specificTask = null;
|
| 382 |
+
|
| 383 |
+
if (selectedTask !== "all") {
|
| 384 |
+
// Check whether the selection is a task category or a specific task
|
| 385 |
+
const isCategory =
|
| 386 |
+
taskCategories && Object.keys(taskCategories).includes(selectedTask);
|
| 387 |
+
|
| 388 |
+
if (isCategory) {
|
| 389 |
+
taskCategory = selectedTask;
|
| 390 |
+
} else {
|
| 391 |
+
specificTask = selectedTask;
|
| 392 |
+
// Find the category for this task
|
| 393 |
+
const taskInfo = taskData.find((t) => t.task === selectedTask);
|
| 394 |
+
if (taskInfo && taskInfo.category) {
|
| 395 |
+
taskCategory = taskInfo.category;
|
| 396 |
+
}
|
| 397 |
+
}
|
| 398 |
+
}
|
| 399 |
+
|
| 400 |
+
// Get task-specific performance data for reference
|
| 401 |
+
const taskPerformanceData = getFilteredPerformanceData();
|
| 402 |
+
|
| 403 |
+
// Build a lookup of model performance by task - with improved error handling
|
| 404 |
+
const taskPerformanceLookup = {};
|
| 405 |
+
let hasTaskSpecificData = false;
|
| 406 |
+
|
| 407 |
+
if (specificTask) {
|
| 408 |
+
// For a specific task, create lookup
|
| 409 |
+
taskPerformanceData.forEach((item) => {
|
| 410 |
+
if (item.task === specificTask) {
|
| 411 |
+
models.forEach((model) => {
|
| 412 |
+
const modelName = model.model;
|
| 413 |
+
const score = item[modelName];
|
| 414 |
+
|
| 415 |
+
if (typeof score === "number" && !isNaN(score)) {
|
| 416 |
+
if (!taskPerformanceLookup[modelName]) {
|
| 417 |
+
taskPerformanceLookup[modelName] = {};
|
| 418 |
+
}
|
| 419 |
+
taskPerformanceLookup[modelName][specificTask] = score;
|
| 420 |
+
hasTaskSpecificData = true;
|
| 421 |
+
}
|
| 422 |
+
});
|
| 423 |
+
}
|
| 424 |
+
});
|
| 425 |
+
} else if (taskCategory) {
|
| 426 |
+
// For a task category, gather all tasks in that category
|
| 427 |
+
taskPerformanceData.forEach((item) => {
|
| 428 |
+
if (item.category === taskCategory) {
|
| 429 |
+
models.forEach((model) => {
|
| 430 |
+
const modelName = model.model;
|
| 431 |
+
const score = item[modelName];
|
| 432 |
+
|
| 433 |
+
if (typeof score === "number" && !isNaN(score)) {
|
| 434 |
+
if (!taskPerformanceLookup[modelName]) {
|
| 435 |
+
taskPerformanceLookup[modelName] = {};
|
| 436 |
+
}
|
| 437 |
+
taskPerformanceLookup[modelName][item.task] = score;
|
| 438 |
+
hasTaskSpecificData = true;
|
| 439 |
+
}
|
| 440 |
+
});
|
| 441 |
+
}
|
| 442 |
+
});
|
| 443 |
+
}
|
| 444 |
+
|
| 445 |
+
return models
|
| 446 |
+
.map((model) => {
|
| 447 |
+
const modelName = model.model;
|
| 448 |
+
const gaps = [];
|
| 449 |
+
|
| 450 |
+
// For each demographic dimension
|
| 451 |
+
Object.keys(demographicOptions).forEach((demo) => {
|
| 452 |
+
// Skip if we're filtering to a specific demographic and this isn't it
|
| 453 |
+
if (selectedDemographic !== "all" && demo !== selectedDemographic) {
|
| 454 |
+
return;
|
| 455 |
+
}
|
| 456 |
+
|
| 457 |
+
const demoValues = demographicOptions[demo];
|
| 458 |
+
if (!demoValues || demoValues.length < 2) return; // Need at least 2 groups to measure a gap
|
| 459 |
+
|
| 460 |
+
// Get scores for each demographic value within this dimension
|
| 461 |
+
const demoScores = [];
|
| 462 |
+
|
| 463 |
+
demoValues.forEach((value) => {
|
| 464 |
+
// First check if we have demographic data for this model and value
|
| 465 |
+
const modelDemoData =
|
| 466 |
+
demographicSummary[demo]?.[value]?.models?.[modelName];
|
| 467 |
+
if (!modelDemoData) return;
|
| 468 |
+
|
| 469 |
+
let score = null;
|
| 470 |
+
|
| 471 |
+
if (selectedMetric === "overall_score") {
|
| 472 |
+
// Improved logic for task-specific scores
|
| 473 |
+
if (
|
| 474 |
+
specificTask &&
|
| 475 |
+
taskPerformanceLookup[modelName] &&
|
| 476 |
+
typeof taskPerformanceLookup[modelName][specificTask] ===
|
| 477 |
+
"number"
|
| 478 |
+
) {
|
| 479 |
+
// Use the specific task score for all demographic groups
|
| 480 |
+
// This assumes the task score is the same regardless of demographic, so every group gets the same score and the gap for this dimension is 0
|
| 481 |
+
score = taskPerformanceLookup[modelName][specificTask];
|
| 482 |
+
} else if (
|
| 483 |
+
taskCategory &&
|
| 484 |
+
Object.keys(taskPerformanceLookup[modelName] || {}).length > 0
|
| 485 |
+
) {
|
| 486 |
+
// For a category, average the task scores
|
| 487 |
+
const taskScores = Object.values(
|
| 488 |
+
taskPerformanceLookup[modelName]
|
| 489 |
+
);
|
| 490 |
+
if (taskScores.length > 0) {
|
| 491 |
+
score =
|
| 492 |
+
taskScores.reduce((sum, s) => sum + s, 0) /
|
| 493 |
+
taskScores.length;
|
| 494 |
+
} else {
|
| 495 |
+
// Fallback to overall if we don't have category scores
|
| 496 |
+
score = modelDemoData.overall_score;
|
| 497 |
+
}
|
| 498 |
+
} else {
|
| 499 |
+
// Default to overall score
|
| 500 |
+
score = modelDemoData.overall_score;
|
| 501 |
+
}
|
| 502 |
+
} else if (selectedMetric === "repeat_usage_pct") {
|
| 503 |
+
score = modelDemoData.repeat_usage_pct;
|
| 504 |
+
} else if (selectedMetric.startsWith("facet_")) {
|
| 505 |
+
const facet = selectedMetric.replace("facet_", "");
|
| 506 |
+
if (
|
| 507 |
+
modelDemoData.facet_scores &&
|
| 508 |
+
facet in modelDemoData.facet_scores
|
| 509 |
+
) {
|
| 510 |
+
score = modelDemoData.facet_scores[facet];
|
| 511 |
+
}
|
| 512 |
+
}
|
| 513 |
+
|
| 514 |
+
// Only add valid scores
|
| 515 |
+
if (score !== null && typeof score === "number" && !isNaN(score)) {
|
| 516 |
+
demoScores.push({
|
| 517 |
+
value,
|
| 518 |
+
score,
|
| 519 |
+
});
|
| 520 |
+
}
|
| 521 |
+
});
|
| 522 |
+
|
| 523 |
+
// Calculate gap for this demographic dimension with better error handling
|
| 524 |
+
if (demoScores.length >= 2) {
|
| 525 |
+
const sortedScores = [...demoScores].sort(
|
| 526 |
+
(a, b) => a.score - b.score
|
| 527 |
+
);
|
| 528 |
+
const lowest = sortedScores[0];
|
| 529 |
+
const highest = sortedScores[sortedScores.length - 1];
|
| 530 |
+
const gap = highest.score - lowest.score;
|
| 531 |
+
|
| 532 |
+
// Only include valid gaps
|
| 533 |
+
if (!isNaN(gap)) {
|
| 534 |
+
gaps.push({
|
| 535 |
+
demo,
|
| 536 |
+
gap,
|
| 537 |
+
lowestGroup: lowest.value,
|
| 538 |
+
lowestScore: lowest.score,
|
| 539 |
+
highestGroup: highest.value,
|
| 540 |
+
highestScore: highest.score,
|
| 541 |
+
});
|
| 542 |
+
}
|
| 543 |
+
}
|
| 544 |
+
});
|
| 545 |
+
|
| 546 |
+
// Calculate average gap with better error handling
|
| 547 |
+
const avgGap =
|
| 548 |
+
gaps.length > 0
|
| 549 |
+
? gaps.reduce((sum, g) => sum + g.gap, 0) / gaps.length
|
| 550 |
+
: 0;
|
| 551 |
+
|
| 552 |
+
// For a specific demographic, get the exact gap
|
| 553 |
+
const specificGap =
|
| 554 |
+
selectedDemographic !== "all"
|
| 555 |
+
? gaps.find((g) => g.demo === selectedDemographic)?.gap || 0
|
| 556 |
+
: avgGap;
|
| 557 |
+
|
| 558 |
+
return {
|
| 559 |
+
model: modelName,
|
| 560 |
+
avgGap: selectedDemographic === "all" ? avgGap : specificGap,
|
| 561 |
+
color: model.color,
|
| 562 |
+
gaps,
|
| 563 |
+
};
|
| 564 |
+
})
|
| 565 |
+
.sort((a, b) => a.avgGap - b.avgGap); // Sort by avg gap (lower is better)
|
| 566 |
+
};
|
| 567 |
+
|
| 568 |
+
|
| 569 |
+
// 1. Enhanced generateKeyInsights function that returns structured data objects
|
| 570 |
+
const generateKeyInsights = () => {
|
| 571 |
+
const structuredInsights = [];
|
| 572 |
+
|
| 573 |
+
// Only generate meaningful insights when we have sufficient data
|
| 574 |
+
if (!taskData || !demographicSummary) {
|
| 575 |
+
return [{ type: "info", message: "Not enough data to generate insights." }];
|
| 576 |
+
}
|
| 577 |
+
|
| 578 |
+
// Get the filtered data
|
| 579 |
+
const filteredData = getFilteredPerformanceData();
|
| 580 |
+
const equityData = calculateModelEquity();
|
| 581 |
+
|
| 582 |
+
// If we have data for performance comparison
|
| 583 |
+
if (filteredData.length > 0) {
|
| 584 |
+
// Find best performing model for the current filter set
|
| 585 |
+
const bestModel = { model: null, score: -Infinity };
|
| 586 |
+
const worstModel = { model: null, score: Infinity };
|
| 587 |
+
|
| 588 |
+
// Extract scores based on groupBy and selected data
|
| 589 |
+
if (groupBy === "task") {
|
| 590 |
+
// Find best performance across all tasks
|
| 591 |
+
filteredData.forEach((task) => {
|
| 592 |
+
models.forEach((model) => {
|
| 593 |
+
const score = task[model.model];
|
| 594 |
+
if (typeof score === "number" && score > bestModel.score) {
|
| 595 |
+
bestModel.model = model.model;
|
| 596 |
+
bestModel.score = score;
|
| 597 |
+
bestModel.task = task.task || task.label;
|
| 598 |
+
bestModel.modelObj = model;
|
| 599 |
+
}
|
| 600 |
+
if (typeof score === "number" && score < worstModel.score) {
|
| 601 |
+
worstModel.model = model.model;
|
| 602 |
+
worstModel.score = score;
|
| 603 |
+
worstModel.task = task.task || task.label;
|
| 604 |
+
worstModel.modelObj = model;
|
| 605 |
+
}
|
| 606 |
+
});
|
| 607 |
+
});
|
| 608 |
+
|
| 609 |
+
// Create contextual insights based on current filters
|
| 610 |
+
if (bestModel.model) {
|
| 611 |
+
let taskContext = bestModel.task;
|
| 612 |
+
let insightTitle = "";
|
| 613 |
+
|
| 614 |
+
if (selectedTask === "all") {
|
| 615 |
+
insightTitle = `Best for ${taskContext}`;
|
| 616 |
+
} else if (Object.keys(taskCategories || {}).includes(selectedTask)) {
|
| 617 |
+
insightTitle = `Best for ${selectedTask} Tasks`;
|
| 618 |
+
} else {
|
| 619 |
+
insightTitle = `Best for ${selectedTask}`;
|
| 620 |
+
taskContext = selectedTask;
|
| 621 |
+
}
|
| 622 |
+
|
| 623 |
+
structuredInsights.push({
|
| 624 |
+
type: "performance",
|
| 625 |
+
model: bestModel.model,
|
| 626 |
+
modelObj: bestModel.modelObj,
|
| 627 |
+
score: bestModel.score,
|
| 628 |
+
task: taskContext,
|
| 629 |
+
title: insightTitle
|
| 630 |
+
});
|
| 631 |
+
}
|
| 632 |
+
|
| 633 |
+
if (bestModel.model && worstModel.model && bestModel.model !== worstModel.model) {
|
| 634 |
+
const gap = bestModel.score - worstModel.score;
|
| 635 |
+
if (gap > 15) { // Only show significant gaps
|
| 636 |
+
structuredInsights.push({
|
| 637 |
+
type: "gap",
|
| 638 |
+
gap: gap,
|
| 639 |
+
model1: bestModel.model,
|
| 640 |
+
model1Obj: bestModel.modelObj,
|
| 641 |
+
model2: worstModel.model,
|
| 642 |
+
model2Obj: worstModel.modelObj,
|
| 643 |
+
context: selectedTask !== "all" ? selectedTask : "across all tasks"
|
| 644 |
+
});
|
| 645 |
+
}
|
| 646 |
+
}
|
| 647 |
+
} else if (groupBy === "demographic" && selectedDemographic !== "all") {
|
| 648 |
+
// Similar logic for demographic insights...
|
| 649 |
+
}
|
| 650 |
+
}
|
| 651 |
+
|
| 652 |
+
// Add equity insights when we have equity data
|
| 653 |
+
if (equityData.length > 0) {
|
| 654 |
+
const mostEquitable = equityData[0];
|
| 655 |
+
const leastEquitable = equityData[equityData.length - 1];
|
| 656 |
+
|
| 657 |
+
// Get model objects
|
| 658 |
+
const mostEquitableModelObj = models.find(m => m.model === mostEquitable.model);
|
| 659 |
+
const leastEquitableModelObj = models.find(m => m.model === leastEquitable.model);
|
| 660 |
+
|
| 661 |
+
// Only show equity insights if there's a meaningful difference
|
| 662 |
+
if (mostEquitable.avgGap < 10 && (leastEquitable.avgGap - mostEquitable.avgGap > 10)) {
|
| 663 |
+
let demoContext = selectedDemographic === "all" ? "all demographics" : selectedDemographic;
|
| 664 |
+
|
| 665 |
+
structuredInsights.push({
|
| 666 |
+
type: "equity",
|
| 667 |
+
model: mostEquitable.model,
|
| 668 |
+
modelObj: mostEquitableModelObj,
|
| 669 |
+
gap: mostEquitable.avgGap,
|
| 670 |
+
demographic: demoContext,
|
| 671 |
+
task: selectedTask !== "all" ? selectedTask : ""
|
| 672 |
+
});
|
| 673 |
+
}
|
| 674 |
+
|
| 675 |
+
if (leastEquitable.avgGap > 20) {
|
| 676 |
+
let demoContext = selectedDemographic === "all" ? "demographic groups" : `${selectedDemographic} groups`;
|
| 677 |
+
|
| 678 |
+
structuredInsights.push({
|
| 679 |
+
type: "concern",
|
| 680 |
+
model: leastEquitable.model,
|
| 681 |
+
modelObj: leastEquitableModelObj,
|
| 682 |
+
gap: leastEquitable.avgGap,
|
| 683 |
+
demographic: demoContext,
|
| 684 |
+
task: selectedTask !== "all" ? selectedTask : ""
|
| 685 |
+
});
|
| 686 |
+
}
|
| 687 |
+
}
|
| 688 |
+
|
| 689 |
+
return structuredInsights.length > 0 ? structuredInsights :
|
| 690 |
+
[{ type: "info", message: "Try different filter combinations to discover more insights." }];
|
| 691 |
+
};
|
| 692 |
+
|
| 693 |
+
// 2. Improved Key Insight Card component
|
| 694 |
+
const KeyInsightCard = ({ insight }) => {
|
| 695 |
+
// Determine card styling based on insight type
|
| 696 |
+
const getCardConfig = () => {
|
| 697 |
+
switch (insight.type) {
|
| 698 |
+
case "performance":
|
| 699 |
+
return {
|
| 700 |
+
backgroundColor: "bg-white",
|
| 701 |
+
dotColor: "bg-indigo-500",
|
| 702 |
+
icon: "🏆",
|
| 703 |
+
title: insight.title || "Top Performer"
|
| 704 |
+
};
|
| 705 |
+
case "equity":
|
| 706 |
+
return {
|
| 707 |
+
backgroundColor: "bg-white",
|
| 708 |
+
dotColor: "bg-purple-500",
|
| 709 |
+
icon: "⚖️",
|
| 710 |
+
title: "Equity Champion"
|
| 711 |
+
};
|
| 712 |
+
case "gap":
|
| 713 |
+
return {
|
| 714 |
+
backgroundColor: "bg-white",
|
| 715 |
+
dotColor: "bg-amber-500",
|
| 716 |
+
icon: "📊",
|
| 717 |
+
title: "Performance Gap"
|
| 718 |
+
};
|
| 719 |
+
case "concern":
|
| 720 |
+
return {
|
| 721 |
+
backgroundColor: "bg-white",
|
| 722 |
+
dotColor: "bg-red-500",
|
| 723 |
+
icon: "⚠️",
|
| 724 |
+
title: "Potential Concern"
|
| 725 |
+
};
|
| 726 |
+
default:
|
| 727 |
+
return {
|
| 728 |
+
backgroundColor: "bg-white",
|
| 729 |
+
dotColor: "bg-gray-500",
|
| 730 |
+
icon: "ℹ️",
|
| 731 |
+
title: "Note"
|
| 732 |
+
};
|
| 733 |
+
}
|
| 734 |
+
};
|
| 735 |
+
|
| 736 |
+
const config = getCardConfig();
|
| 737 |
+
|
| 738 |
+
return (
|
| 739 |
+
<div className={`border rounded-lg overflow-hidden ${config.backgroundColor}`}>
|
| 740 |
+
{/* Card Header */}
|
| 741 |
+
<div className="border-b bg-white px-4 py-2">
|
| 742 |
+
<h4 className="font-medium text-gray-800 flex items-center">
|
| 743 |
+
<span className={`w-3 h-3 rounded-full ${config.dotColor} mr-2`}></span>
|
| 744 |
+
{config.title}
|
| 745 |
+
</h4>
|
| 746 |
+
</div>
|
| 747 |
+
|
| 748 |
+
{/* Card Content */}
|
| 749 |
+
<div className="p-4">
|
| 750 |
+
{/* Performance Card */}
|
| 751 |
+
{insight.type === "performance" && (
|
| 752 |
+
<div className="flex items-center">
|
| 753 |
+
<div className={`h-10 w-10 text-2xl rounded-full flex items-center justify-center mr-3`}>
|
| 754 |
+
{config.icon}
|
| 755 |
+
</div>
|
| 756 |
+
<div>
|
| 757 |
+
<div className="font-medium" style={{ color: insight.modelObj?.color || '#6B7280' }}>
|
| 758 |
+
{insight.model}
|
| 759 |
+
</div>
|
| 760 |
+
<div className="text-sm text-gray-600">
|
| 761 |
+
Score: {insight.score.toFixed(1)}
|
| 762 |
+
</div>
|
| 763 |
+
</div>
|
| 764 |
+
</div>
|
| 765 |
+
)}
|
| 766 |
+
|
| 767 |
+
{/* Equity Card */}
|
| 768 |
+
{insight.type === "equity" && (
|
| 769 |
+
<>
|
| 770 |
+
<div className="flex items-center">
|
| 771 |
+
<div className={`h-10 w-10 text-2xl rounded-full flex items-center justify-center mr-3`}>
|
| 772 |
+
{config.icon}
|
| 773 |
+
</div>
|
| 774 |
+
<div>
|
| 775 |
+
<div className="font-medium" style={{ color: insight.modelObj?.color || '#6B7280' }}>
|
| 776 |
+
{insight.model}
|
| 777 |
+
</div>
|
| 778 |
+
<div className="text-sm text-gray-600">
|
| 779 |
+
Equity Gap: {insight.gap.toFixed(1)}
|
| 780 |
+
</div>
|
| 781 |
+
</div>
|
| 782 |
+
</div>
|
| 783 |
+
<div className="mt-3 text-sm">
|
| 784 |
+
Consistent across {insight.demographic}
|
| 785 |
+
</div>
|
| 786 |
+
</>
|
| 787 |
+
)}
|
| 788 |
+
|
| 789 |
+
{/* Gap Card */}
|
| 790 |
+
{insight.type === "gap" && (
|
| 791 |
+
<>
|
| 792 |
+
<div className="flex items-center mb-3">
|
| 793 |
+
<div className={`h-10 w-10 text-2xl rounded-full flex items-center justify-center mr-3`}>
|
| 794 |
+
{config.icon}
|
| 795 |
+
</div>
|
| 796 |
+
<div>
|
| 797 |
+
<div className="font-medium">Gap: {insight.gap.toFixed(1)} points</div>
|
| 798 |
+
</div>
|
| 799 |
+
</div>
|
| 800 |
+
<div className="flex justify-between items-center">
|
| 801 |
+
<div style={{ color: insight.model1Obj?.color || '#6B7280' }} className="font-medium">
|
| 802 |
+
{insight.model1}
|
| 803 |
+
</div>
|
| 804 |
+
<div className="text-gray-500 mx-2">vs</div>
|
| 805 |
+
<div style={{ color: insight.model2Obj?.color || '#6B7280' }} className="font-medium">
|
| 806 |
+
{insight.model2}
|
| 807 |
+
</div>
|
| 808 |
+
</div>
|
| 809 |
+
{insight.context !== "across all tasks" && (
|
| 810 |
+
<div className="mt-2 text-sm text-gray-700">
|
| 811 |
+
on {insight.context}
|
| 812 |
+
</div>
|
| 813 |
+
)}
|
| 814 |
+
</>
|
| 815 |
+
)}
|
| 816 |
+
|
| 817 |
+
{/* Concern Card */}
|
| 818 |
+
{insight.type === "concern" && (
|
| 819 |
+
<>
|
| 820 |
+
<div className="flex items-center">
|
| 821 |
+
<div className={`h-10 w-10 text-2xl rounded-full flex items-center justify-center mr-3`}>
|
| 822 |
+
{config.icon}
|
| 823 |
+
</div>
|
| 824 |
+
<div>
|
| 825 |
+
<div className="font-medium" style={{ color: insight.modelObj?.color || '#6B7280' }}>
|
| 826 |
+
{insight.model}
|
| 827 |
+
</div>
|
| 828 |
+
<div className="text-sm text-gray-600">
|
| 829 |
+
Disparity: {insight.gap.toFixed(1)} points
|
| 830 |
+
</div>
|
| 831 |
+
</div>
|
| 832 |
+
</div>
|
| 833 |
+
<div className="mt-3 text-sm">
|
| 834 |
+
Between {insight.demographic}
|
| 835 |
+
{insight.task && ` on ${insight.task}`}
|
| 836 |
+
</div>
|
| 837 |
+
</>
|
| 838 |
+
)}
|
| 839 |
+
|
| 840 |
+
{/* Info Card */}
|
| 841 |
+
{insight.type === "info" && (
|
| 842 |
+
<div className="text-sm text-gray-700">
|
| 843 |
+
{insight.message}
|
| 844 |
+
</div>
|
| 845 |
+
)}
|
| 846 |
+
</div>
|
| 847 |
+
</div>
|
| 848 |
+
);
|
| 849 |
+
};
|
| 850 |
+
|
| 851 |
+
// 3. Key Insights Panel render function
|
| 852 |
+
const renderKeyInsightsPanel = () => {
|
| 853 |
+
// Get structured insights directly from enhanced function
|
| 854 |
+
const structuredInsights = generateKeyInsights();
|
| 855 |
+
|
| 856 |
+
return (
|
| 857 |
+
<div className="border rounded-lg overflow-hidden mb-6 shadow-sm">
|
| 858 |
+
<div
|
| 859 |
+
className="px-4 py-3 bg-white flex justify-between items-center cursor-pointer"
|
| 860 |
+
onClick={() => setKeyInsightsVisible(!keyInsightsVisible)}
|
| 861 |
+
>
|
| 862 |
+
<h3 className="font-semibold flex items-center text-gray-800">
|
| 863 |
+
<svg xmlns="http://www.w3.org/2000/svg" className="h-5 w-5 mr-2 text-blue-500" viewBox="0 0 20 20" fill="currentColor">
|
| 864 |
+
<path d="M11 3a1 1 0 10-2 0v1a1 1 0 102 0V3zM15.657 5.757a1 1 0 00-1.414-1.414l-.707.707a1 1 0 001.414 1.414l.707-.707zM18 10a1 1 0 01-1 1h-1a1 1 0 110-2h1a1 1 0 011 1zM5.05 6.464A1 1 0 106.464 5.05l-.707-.707a1 1 0 00-1.414 1.414l.707.707zM5 10a1 1 0 01-1 1H3a1 1 0 110-2h1a1 1 0 011 1zM8 16v-1h4v1a2 2 0 11-4 0zM12 14c.015-.34.208-.646.477-.859a4 4 0 10-4.954 0c.27.213.462.519.476.859h4.002z" />
|
| 865 |
+
</svg>
|
| 866 |
+
Key Insights
|
| 867 |
+
</h3>
|
| 868 |
+
<div className="flex items-center">
|
| 869 |
+
{structuredInsights.length > 0 && (
|
| 870 |
+
<span className="text-xs bg-blue-500 text-white rounded-full px-2 py-0.5 mr-2">
|
| 871 |
+
{structuredInsights.length}
|
| 872 |
+
</span>
|
| 873 |
+
)}
|
| 874 |
+
<div className="text-gray-500">
|
| 875 |
+
{keyInsightsVisible ? (
|
| 876 |
+
<svg xmlns="http://www.w3.org/2000/svg" className="h-5 w-5" viewBox="0 0 20 20" fill="currentColor">
|
| 877 |
+
<path fillRule="evenodd" d="M5.293 7.293a1 1 0 011.414 0L10 10.586l3.293-3.293a1 1 0 111.414 1.414l-4 4a1 1 0 01-1.414 0l-4-4a1 1 0 010-1.414z" clipRule="evenodd" />
|
| 878 |
+
</svg>
|
| 879 |
+
) : (
|
| 880 |
+
<svg xmlns="http://www.w3.org/2000/svg" className="h-5 w-5" viewBox="0 0 20 20" fill="currentColor">
|
| 881 |
+
<path fillRule="evenodd" d="M14.707 12.707a1 1 0 01-1.414 0L10 9.414l-3.293 3.293a1 1 0 01-1.414-1.414l4-4a1 1 0 011.414 0l4 4a1 1 0 010 1.414z" clipRule="evenodd" />
|
| 882 |
+
</svg>
|
| 883 |
+
)}
|
| 884 |
+
</div>
|
| 885 |
+
</div>
|
| 886 |
+
</div>
|
| 887 |
+
{keyInsightsVisible && (
|
| 888 |
+
<div className="p-4">
|
| 889 |
+
{structuredInsights.length > 0 && structuredInsights[0].type !== "info" ? (
|
| 890 |
+
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
|
| 891 |
+
{structuredInsights.map((insight, index) => (
|
| 892 |
+
<KeyInsightCard key={index} insight={insight} />
|
| 893 |
+
))}
|
| 894 |
+
</div>
|
| 895 |
+
) : (
|
| 896 |
+
<div className="py-6 text-center text-gray-500">
|
| 897 |
+
<svg xmlns="http://www.w3.org/2000/svg" className="h-8 w-8 mx-auto mb-2 text-gray-400" fill="none" viewBox="0 0 24 24" stroke="currentColor">
|
| 898 |
+
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
|
| 899 |
+
</svg>
|
| 900 |
+
<p>{structuredInsights[0]?.message || "No insights available for current filter selection"}</p>
|
| 901 |
+
<p className="text-sm mt-1">Try adjusting your filters to see insights</p>
|
| 902 |
+
</div>
|
| 903 |
+
)}
|
| 904 |
+
</div>
|
| 905 |
+
)}
|
| 906 |
+
</div>
|
| 907 |
+
);
|
| 908 |
+
};
|
| 909 |
+
|
| 910 |
+
// Get data for visualization
|
| 911 |
+
const performanceData = getFilteredPerformanceData();
|
| 912 |
+
const equityRankings = calculateModelEquity();
|
| 913 |
+
const keyInsights = generateKeyInsights();
|
| 914 |
+
|
| 915 |
+
// Custom tooltip for the bar chart
|
| 916 |
+
const PerformanceTooltip = ({ active, payload, label }) => {
|
| 917 |
+
if (active && payload && payload.length) {
|
| 918 |
+
return (
|
| 919 |
+
<div className="bg-white p-3 border rounded shadow-sm">
|
| 920 |
+
<p className="font-medium">{label}</p>
|
| 921 |
+
<div className="mt-2">
|
| 922 |
+
{payload.map((entry, index) => {
|
| 923 |
+
// Skip entries that don't have model data
|
| 924 |
+
if (!entry.name || entry.name.includes("_std") || typeof entry.value !== "number") return null;
|
| 925 |
+
|
| 926 |
+
// Find the corresponding standard deviation if available
|
| 927 |
+
const stdKey = `${entry.name}_std`;
|
| 928 |
+
const stdEntry = payload.find((p) => p.dataKey === stdKey);
|
| 929 |
+
const stdValue = stdEntry ? stdEntry.value : 0;
|
| 930 |
+
|
| 931 |
+
return (
|
| 932 |
+
<div key={index} className="flex items-center text-sm mb-1">
|
| 933 |
+
<div
|
| 934 |
+
className="w-3 h-3 rounded-full mr-1"
|
| 935 |
+
style={{ backgroundColor: entry.color }}
|
| 936 |
+
></div>
|
| 937 |
+
<span className="mr-2">{entry.name}:</span>
|
| 938 |
+
<span className="font-medium">
|
| 939 |
+
{entry.value.toFixed(2)}{" "}
|
| 940 |
+
{stdValue ? `± ${stdValue.toFixed(2)}` : ""}
|
| 941 |
+
</span>
|
| 942 |
+
</div>
|
| 943 |
+
);
|
| 944 |
+
})}
|
| 945 |
+
</div>
|
| 946 |
+
</div>
|
| 947 |
+
);
|
| 948 |
+
}
|
| 949 |
+
return null;
|
| 950 |
+
};
|
| 951 |
+
|
| 952 |
+
// Get formatted metric name
|
| 953 |
+
const getMetricName = (metric) => {
|
| 954 |
+
if (metric === "overall_score") return "Overall Score";
|
| 955 |
+
if (metric === "repeat_usage_pct") return "Would Use Again";
|
| 956 |
+
if (metric.startsWith("facet_")) {
|
| 957 |
+
const facet = metric.replace("facet_", "");
|
| 958 |
+
return formatFacetName(facet);
|
| 959 |
+
}
|
| 960 |
+
return metric;
|
| 961 |
+
};
|
| 962 |
+
|
| 963 |
+
return (
|
| 964 |
+
<div>
|
| 965 |
+
{/* Analysis Controls Panel */}
|
| 966 |
+
<div className="border rounded-lg overflow-hidden mb-6">
|
| 967 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 968 |
+
<h3 className="font-semibold">Analysis Controls</h3>
|
| 969 |
+
</div>
|
| 970 |
+
<div className="p-4 grid grid-cols-1 md:grid-cols-3 gap-4">
|
| 971 |
+
<div>
|
| 972 |
+
<label className="block text-sm font-medium text-gray-700 mb-2">
|
| 973 |
+
Group By
|
| 974 |
+
</label>
|
| 975 |
+
<select
|
| 976 |
+
className="w-full border rounded-md px-3 py-2 bg-white shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
|
| 977 |
+
value={groupBy}
|
| 978 |
+
onChange={(e) => setGroupBy(e.target.value)}
|
| 979 |
+
>
|
| 980 |
+
<option value="task">Task</option>
|
| 981 |
+
<option value="demographic">Demographic</option>
|
| 982 |
+
<option value="combined">Task × Demographic</option>
|
| 983 |
+
</select>
|
| 984 |
+
</div>
|
| 985 |
+
|
| 986 |
+
<div>
|
| 987 |
+
<label className="block text-sm font-medium text-gray-700 mb-2">
|
| 988 |
+
Task
|
| 989 |
+
</label>
|
| 990 |
+
<select
|
| 991 |
+
className={`w-full border rounded-md px-3 py-2 shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
|
| 992 |
+
groupBy === "demographic"
|
| 993 |
+
? "bg-gray-100 text-gray-500"
|
| 994 |
+
: "bg-white"
|
| 995 |
+
}`}
|
| 996 |
+
value={selectedTask}
|
| 997 |
+
onChange={(e) => setSelectedTask(e.target.value)}
|
| 998 |
+
disabled={groupBy === "demographic"}
|
| 999 |
+
>
|
| 1000 |
+
{/* Show "All Tasks" at the top */}
|
| 1001 |
+
<option value={taskOptions.allTasksOption.value}>
|
| 1002 |
+
{taskOptions.allTasksOption.label}
|
| 1003 |
+
</option>
|
| 1004 |
+
|
| 1005 |
+
{/* Show categories at second level */}
|
| 1006 |
+
<optgroup label="Task Categories">
|
| 1007 |
+
{taskOptions.categories.map((category) => (
|
| 1008 |
+
<option key={category.value} value={category.value}>
|
| 1009 |
+
{category.label}
|
| 1010 |
+
</option>
|
| 1011 |
+
))}
|
| 1012 |
+
</optgroup>
|
| 1013 |
+
|
| 1014 |
+
{/* Show tasks grouped by category */}
|
| 1015 |
+
{Object.entries(taskOptions.categorizedTasks).map(
|
| 1016 |
+
([category, tasks]) => (
|
| 1017 |
+
<optgroup
|
| 1018 |
+
key={category}
|
| 1019 |
+
label={`${
|
| 1020 |
+
category.charAt(0).toUpperCase() + category.slice(1)
|
| 1021 |
+
} Tasks`}
|
| 1022 |
+
>
|
| 1023 |
+
{tasks.map((task) => (
|
| 1024 |
+
<option key={task.value} value={task.value}>
|
| 1025 |
+
{task.label}
|
| 1026 |
+
</option>
|
| 1027 |
+
))}
|
| 1028 |
+
</optgroup>
|
| 1029 |
+
)
|
| 1030 |
+
)}
|
| 1031 |
+
|
| 1032 |
+
{/* Show uncategorized tasks if any */}
|
| 1033 |
+
{taskOptions.uncategorizedTasks.length > 0 && (
|
| 1034 |
+
<optgroup label="Other Tasks">
|
| 1035 |
+
{taskOptions.uncategorizedTasks.map((task) => (
|
| 1036 |
+
<option key={task.value} value={task.value}>
|
| 1037 |
+
{task.label}
|
| 1038 |
+
</option>
|
| 1039 |
+
))}
|
| 1040 |
+
</optgroup>
|
| 1041 |
+
)}
|
| 1042 |
+
</select>
|
| 1043 |
+
</div>
|
| 1044 |
+
|
| 1045 |
+
<div>
|
| 1046 |
+
<label className="block text-sm font-medium text-gray-700 mb-2">
|
| 1047 |
+
Demographic Dimension
|
| 1048 |
+
{groupBy === "task" && (
|
| 1049 |
+
<span className="ml-2 text-xs text-gray-500">
|
| 1050 |
+
(Disabled when grouping by task)
|
| 1051 |
+
</span>
|
| 1052 |
+
)}
|
| 1053 |
+
</label>
|
| 1054 |
+
<select
|
| 1055 |
+
className={`w-full border rounded-md px-3 py-2 shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500 ${
|
| 1056 |
+
groupBy === "task" ? "bg-gray-100 text-gray-500" : "bg-white"
|
| 1057 |
+
}`}
|
| 1058 |
+
value={selectedDemographic}
|
| 1059 |
+
onChange={(e) => setSelectedDemographic(e.target.value)}
|
| 1060 |
+
disabled={groupBy === "task"}
|
| 1061 |
+
>
|
| 1062 |
+
<option value="all">All Demographics (Average)</option>
|
| 1063 |
+
{Object.keys(demographicOptions || {}).map((demo) => (
|
| 1064 |
+
<option key={demo} value={demo}>
|
| 1065 |
+
{demo.charAt(0).toUpperCase() + demo.slice(1)}
|
| 1066 |
+
</option>
|
| 1067 |
+
))}
|
| 1068 |
+
</select>
|
| 1069 |
+
</div>
|
| 1070 |
+
|
| 1071 |
+
<div>
|
| 1072 |
+
<label className="block text-sm font-medium text-gray-700 mb-2">
|
| 1073 |
+
Metric
|
| 1074 |
+
</label>
|
| 1075 |
+
<select
|
| 1076 |
+
className="w-full border rounded-md px-3 py-2 bg-white shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
|
| 1077 |
+
value={selectedMetric}
|
| 1078 |
+
onChange={(e) => setSelectedMetric(e.target.value)}
|
| 1079 |
+
>
|
| 1080 |
+
<option value="overall_score">Overall Score</option>
|
| 1081 |
+
<option value="repeat_usage_pct">Would Use Again (%)</option>
|
| 1082 |
+
{Object.keys(facets || {})
|
| 1083 |
+
.filter((f) => f !== "repeat_usage")
|
| 1084 |
+
.map((facet) => (
|
| 1085 |
+
<option key={facet} value={`facet_${facet}`}>
|
| 1086 |
+
{formatFacetName(facet)}
|
| 1087 |
+
</option>
|
| 1088 |
+
))}
|
| 1089 |
+
</select>
|
| 1090 |
+
</div>
|
| 1091 |
+
|
| 1092 |
+
<div>
|
| 1093 |
+
<label className="block text-sm font-medium text-gray-700 mb-2">
|
| 1094 |
+
Model
|
| 1095 |
+
</label>
|
| 1096 |
+
<select
|
| 1097 |
+
className="w-full border rounded-md px-3 py-2 bg-white shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500"
|
| 1098 |
+
value={selectedModel || ""}
|
| 1099 |
+
onChange={(e) => setSelectedModel(e.target.value)}
|
| 1100 |
+
>
|
| 1101 |
+
{models.map((model) => (
|
| 1102 |
+
<option key={model.model} value={model.model}>
|
| 1103 |
+
{model.model}
|
| 1104 |
+
</option>
|
| 1105 |
+
))}
|
| 1106 |
+
</select>
|
| 1107 |
+
</div>
|
| 1108 |
+
|
| 1109 |
+
<div>
|
| 1110 |
+
<label className="block text-sm font-medium text-gray-700 mb-2">
|
| 1111 |
+
Display Options
|
| 1112 |
+
</label>
|
| 1113 |
+
<div className="flex flex-wrap gap-2">
|
| 1114 |
+
<button
|
| 1115 |
+
className={`px-3 py-1 text-xs font-medium rounded ${
|
| 1116 |
+
showAllModels
|
| 1117 |
+
? "bg-blue-100 text-blue-800 border border-blue-300"
|
| 1118 |
+
: "bg-gray-100 text-gray-800 border border-gray-300"
|
| 1119 |
+
}`}
|
| 1120 |
+
onClick={() => setShowAllModels(true)}
|
| 1121 |
+
>
|
| 1122 |
+
All Models
|
| 1123 |
+
</button>
|
| 1124 |
+
<button
|
| 1125 |
+
className={`px-3 py-1 text-xs font-medium rounded ${
|
| 1126 |
+
!showAllModels
|
| 1127 |
+
? "bg-blue-100 text-blue-800 border border-blue-300"
|
| 1128 |
+
: "bg-gray-100 text-gray-800 border border-gray-300"
|
| 1129 |
+
}`}
|
| 1130 |
+
onClick={() => setShowAllModels(false)}
|
| 1131 |
+
>
|
| 1132 |
+
Selected Only
|
| 1133 |
+
</button>
|
| 1134 |
+
<button
|
| 1135 |
+
className={`px-3 py-1 text-xs font-medium rounded ${
|
| 1136 |
+
viewMode === "absolute"
|
| 1137 |
+
? "bg-blue-100 text-blue-800 border border-blue-300"
|
| 1138 |
+
: "bg-gray-100 text-gray-800 border border-gray-300"
|
| 1139 |
+
}`}
|
| 1140 |
+
onClick={() => setViewMode("absolute")}
|
| 1141 |
+
>
|
| 1142 |
+
Absolute
|
| 1143 |
+
</button>
|
| 1144 |
+
<button
|
| 1145 |
+
className={`px-3 py-1 text-xs font-medium rounded ${
|
| 1146 |
+
viewMode === "relative"
|
| 1147 |
+
? "bg-blue-100 text-blue-800 border border-blue-300"
|
| 1148 |
+
: "bg-gray-100 text-gray-800 border border-gray-300"
|
| 1149 |
+
}`}
|
| 1150 |
+
onClick={() => setViewMode("relative")}
|
| 1151 |
+
title="Show performance relative to the average across models"
|
| 1152 |
+
>
|
| 1153 |
+
Relative
|
| 1154 |
+
</button>
|
| 1155 |
+
</div>
|
| 1156 |
+
</div>
|
| 1157 |
+
</div>
|
| 1158 |
+
</div>
|
| 1159 |
+
|
| 1160 |
+
{/* Active Filters Display */}
|
| 1161 |
+
<div className="mb-6">
|
| 1162 |
+
<div className="text-sm font-medium text-gray-700 mb-2">
|
| 1163 |
+
Active Filters:
|
| 1164 |
+
</div>
|
| 1165 |
+
<div className="flex flex-wrap">
|
| 1166 |
+
{selectedTask !== "all" && (
|
| 1167 |
+
<FilterTag
|
| 1168 |
+
label={`Task: ${getTaskLabel(selectedTask)}`}
|
| 1169 |
+
onRemove={() => setSelectedTask("all")}
|
| 1170 |
+
/>
|
| 1171 |
+
)}
|
| 1172 |
+
{selectedDemographic !== "all" && (
|
| 1173 |
+
<FilterTag
|
| 1174 |
+
label={`Demographic: ${
|
| 1175 |
+
selectedDemographic.charAt(0).toUpperCase() +
|
| 1176 |
+
selectedDemographic.slice(1)
|
| 1177 |
+
}`}
|
| 1178 |
+
onRemove={() => setSelectedDemographic("all")}
|
| 1179 |
+
/>
|
| 1180 |
+
)}
|
| 1181 |
+
{!showAllModels && (
|
| 1182 |
+
<FilterTag
|
| 1183 |
+
label={`Model: ${selectedModel}`}
|
| 1184 |
+
onRemove={() => setShowAllModels(true)}
|
| 1185 |
+
/>
|
| 1186 |
+
)}
|
| 1187 |
+
<FilterTag label={`Metric: ${getMetricName(selectedMetric)}`} />
|
| 1188 |
+
<FilterTag
|
| 1189 |
+
label={`Group by: ${
|
| 1190 |
+
groupBy.charAt(0).toUpperCase() + groupBy.slice(1)
|
| 1191 |
+
}`}
|
| 1192 |
+
/>
|
| 1193 |
+
</div>
|
| 1194 |
+
</div>
|
| 1195 |
+
|
| 1196 |
+
{/* Key Insights Panel */}
|
| 1197 |
+
{renderKeyInsightsPanel()}
|
| 1198 |
+
|
| 1199 |
+
{/* Performance Comparison Visualization */}
|
| 1200 |
+
<div className="border rounded-lg overflow-hidden mb-6">
|
| 1201 |
+
<div className="px-4 py-2 bg-gray-50 border-b">
|
| 1202 |
+
<h3 className="font-semibold">
|
| 1203 |
+
{getMetricName(selectedMetric)} by{" "}
|
| 1204 |
+
{groupBy.charAt(0).toUpperCase() + groupBy.slice(1)}
|
| 1205 |
+
{viewMode === "relative" && " (Relative to Average)"}
|
| 1206 |
+
</h3>
|
| 1207 |
+
</div>
|
| 1208 |
+
<div className="p-4">
|
| 1209 |
+
{performanceData.length > 0 ? (
|
| 1210 |
+
<div className="h-96">
|
| 1211 |
+
<ResponsiveContainer width="100%" height="100%">
|
| 1212 |
+
<BarChart
|
| 1213 |
+
data={performanceData}
|
| 1214 |
+
layout="vertical"
|
| 1215 |
+
margin={{ top: 20, right: 30, left: 0, bottom: 5 }}
|
| 1216 |
+
>
|
| 1217 |
+
<CartesianGrid strokeDasharray="3 3" />
|
| 1218 |
+
<XAxis
|
| 1219 |
+
type="number"
|
| 1220 |
+
domain={
|
| 1221 |
+
viewMode === "relative"
|
| 1222 |
+
? // For relative mode, use symmetrical domain based on max deviation
|
| 1223 |
+
(dataMax) => {
|
| 1224 |
+
// Find max absolute deviation
|
| 1225 |
+
const maxDev = performanceData.reduce(
|
| 1226 |
+
(max, item) => {
|
| 1227 |
+
let itemMax = max;
|
| 1228 |
+
models.forEach((model) => {
|
| 1229 |
+
if (typeof item[model.model] === "number") {
|
| 1230 |
+
itemMax = Math.max(
|
| 1231 |
+
itemMax,
|
| 1232 |
+
Math.abs(item[model.model])
|
| 1233 |
+
);
|
| 1234 |
+
}
|
| 1235 |
+
});
|
| 1236 |
+
return itemMax;
|
| 1237 |
+
},
|
| 1238 |
+
0
|
| 1239 |
+
);
|
| 1240 |
+
// Round up to nearest 5
|
| 1241 |
+
const scaledMax = Math.ceil(maxDev / 5) * 5;
|
| 1242 |
+
// Use symmetrical domain
|
| 1243 |
+
return [-scaledMax, scaledMax];
|
| 1244 |
+
}
|
| 1245 |
+
: // For absolute mode, use original scale range
|
| 1246 |
+
selectedMetric.startsWith("facet_")
|
| 1247 |
+
? [-100, 100]
|
| 1248 |
+
: [0, 100]
|
| 1249 |
+
}
|
| 1250 |
+
tickFormatter={(value) => {
|
| 1251 |
+
if (viewMode === "relative") {
|
| 1252 |
+
return value > 0
|
| 1253 |
+
? `+${value.toFixed(0)}`
|
| 1254 |
+
: value.toFixed(0);
|
| 1255 |
+
}
|
| 1256 |
+
return value.toFixed(0);
|
| 1257 |
+
}}
|
| 1258 |
+
/>
|
| 1259 |
+
<YAxis
|
| 1260 |
+
dataKey={groupBy === "task" ? "task" : "label"}
|
| 1261 |
+
type="category"
|
| 1262 |
+
width={150}
|
| 1263 |
+
tick={{ fontSize: 12 }}
|
| 1264 |
+
/>
|
| 1265 |
+
<Tooltip content={<PerformanceTooltip />} />
|
| 1266 |
+
<Legend />
|
| 1267 |
+
{(showAllModels
|
| 1268 |
+
? models
|
| 1269 |
+
: [models.find((m) => m.model === selectedModel)].filter(
|
| 1270 |
+
Boolean
|
| 1271 |
+
)
|
| 1272 |
+
).map((model) => (
|
| 1273 |
+
<Bar
|
| 1274 |
+
key={model.model}
|
| 1275 |
+
dataKey={model.model}
|
| 1276 |
+
name={model.model}
|
| 1277 |
+
fill={model.color}
|
| 1278 |
+
maxBarSize={25}
|
| 1279 |
+
>
|
| 1280 |
+
{viewMode === "relative" &&
|
| 1281 |
+
performanceData.map((entry, index) => {
|
| 1282 |
+
const value = entry[model.model];
|
| 1283 |
+
return (
|
| 1284 |
+
<Cell
|
| 1285 |
+
key={`cell-${index}`}
|
| 1286 |
+
fill={
|
| 1287 |
+
value >= 0 ? model.color : `${model.color}80`
|
| 1288 |
+
} // Lighter shade for negative values
|
| 1289 |
+
/>
|
| 1290 |
+
);
|
| 1291 |
+
})}
|
| 1292 |
+
</Bar>
|
| 1293 |
+
))}
|
| 1294 |
+
{viewMode === "relative" && (
|
| 1295 |
+
<ReferenceLine x={0} stroke="#666" strokeDasharray="3 3" />
|
| 1296 |
+
)}
|
| 1297 |
+
</BarChart>
|
| 1298 |
+
</ResponsiveContainer>
|
| 1299 |
+
</div>
|
| 1300 |
+
) : (
|
| 1301 |
+
<div className="flex items-center justify-center h-60 bg-gray-50 rounded">
|
| 1302 |
+
<div className="text-center p-4">
|
| 1303 |
+
<svg
|
| 1304 |
+
xmlns="http://www.w3.org/2000/svg"
|
| 1305 |
+
className="h-10 w-10 mx-auto text-gray-400 mb-3"
|
| 1306 |
+
fill="none"
|
| 1307 |
+
viewBox="0 0 24 24"
|
| 1308 |
+
stroke="currentColor"
|
| 1309 |
+
>
|
| 1310 |
+
<path
|
| 1311 |
+
strokeLinecap="round"
|
| 1312 |
+
strokeLinejoin="round"
|
| 1313 |
+
strokeWidth={2}
|
| 1314 |
+
d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"
|
| 1315 |
+
/>
|
| 1316 |
+
</svg>
|
| 1317 |
+
<h3 className="text-lg font-medium text-gray-900 mb-1">
|
| 1318 |
+
No Data Available
|
| 1319 |
+
</h3>
|
| 1320 |
+
<p className="text-sm text-gray-600">
|
| 1321 |
+
There is no data available for the selected filters. Try
|
| 1322 |
+
adjusting your selections.
|
| 1323 |
+
</p>
|
| 1324 |
+
{groupBy === "combined" && (
|
| 1325 |
+
<p className="text-sm text-gray-600 mt-2">
|
| 1326 |
+
Note: Task × Demographic view requires specific data that
|
| 1327 |
+
may not be available.
|
| 1328 |
+
</p>
|
| 1329 |
+
)}
|
| 1330 |
+
</div>
|
| 1331 |
+
</div>
|
| 1332 |
+
)}
|
| 1333 |
+
<div className="mt-4 text-sm text-gray-600 text-center">
|
| 1334 |
+
{viewMode === "absolute"
|
| 1335 |
+
? `${getMetricName(selectedMetric)} by ${groupBy}`
|
| 1336 |
+
: `Performance relative to average across models (positive is better than average)`}
|
| 1337 |
+
</div>
|
| 1338 |
+
</div>
|
| 1339 |
+
</div>
|
| 1340 |
+
|
| 1341 |
+
{/* Model Equity Rankings */}
|
| 1342 |
+
<div className="border rounded-lg overflow-hidden mb-6">
|
| 1343 |
+
<div className="px-4 py-2 bg-gray-50 border-b flex justify-between items-center">
|
| 1344 |
+
<h3 className="font-semibold">Model Equity Rankings</h3>
|
| 1345 |
+
<span className="text-xs text-gray-500">
|
| 1346 |
+
Lower gaps indicate more consistent performance across demographic
|
| 1347 |
+
groups
|
| 1348 |
+
</span>
|
| 1349 |
+
</div>
|
| 1350 |
+
<div className="p-4">
|
| 1351 |
+
<div className="space-y-3">
|
| 1352 |
+
{equityRankings.map((model, index) => {
|
| 1353 |
+
const pct = 100 - (model.avgGap / 30) * 100; // Scale to percentage where 100% = perfect equity
|
| 1354 |
+
return (
|
| 1355 |
+
<div key={model.model} className="relative">
|
| 1356 |
+
<div className="flex items-center mb-1">
|
| 1357 |
+
<div className="w-6 text-sm text-gray-500">
|
| 1358 |
+
{index + 1}.
|
| 1359 |
+
</div>
|
| 1360 |
+
<div
|
| 1361 |
+
className="w-8 h-8 flex items-center justify-center rounded-full mr-2"
|
| 1362 |
+
style={{ backgroundColor: model.color }}
|
| 1363 |
+
>
|
| 1364 |
+
<span className="text-white font-bold text-xs">
|
| 1365 |
+
{index + 1}
|
| 1366 |
+
</span>
|
| 1367 |
+
</div>
|
| 1368 |
+
<span className="text-sm font-medium mr-2">
|
| 1369 |
+
{model.model}
|
| 1370 |
+
</span>
|
| 1371 |
+
<span
|
| 1372 |
+
className={`ml-auto px-2 py-1 text-xs font-semibold rounded-full ${
|
| 1373 |
+
model.avgGap < 10
|
| 1374 |
+
? "bg-green-100 text-green-800"
|
| 1375 |
+
: model.avgGap < 20
|
| 1376 |
+
? "bg-blue-100 text-blue-800"
|
| 1377 |
+
: "bg-yellow-100 text-yellow-800"
|
| 1378 |
+
}`}
|
| 1379 |
+
>
|
| 1380 |
+
{model.avgGap.toFixed(2)} avg gap
|
| 1381 |
+
</span>
|
| 1382 |
+
</div>
|
| 1383 |
+
<div className="h-2 w-full bg-gray-200 rounded-full overflow-hidden">
|
| 1384 |
+
<div
|
| 1385 |
+
className="h-full rounded-full"
|
| 1386 |
+
style={{
|
| 1387 |
+
width: `${Math.min(100, Math.max(0, pct))}%`,
|
| 1388 |
+
backgroundColor: model.color,
|
| 1389 |
+
}}
|
| 1390 |
+
></div>
|
| 1391 |
+
</div>
|
| 1392 |
+
</div>
|
| 1393 |
+
);
|
| 1394 |
+
})}
|
| 1395 |
+
</div>
|
| 1396 |
+
<div className="mt-4 text-xs text-gray-500 grid grid-cols-3 gap-2">
|
| 1397 |
+
<div className="flex items-center">
|
| 1398 |
+
<div className="w-3 h-3 bg-green-100 mr-1 rounded"></div>
|
| 1399 |
+
<span>&lt; 10: Excellent equity</span>
|
| 1400 |
+
</div>
|
| 1401 |
+
<div className="flex items-center">
|
| 1402 |
+
<div className="w-3 h-3 bg-blue-100 mr-1 rounded"></div>
|
| 1403 |
+
<span>10 - 20: Good equity</span>
|
| 1404 |
+
</div>
|
| 1405 |
+
<div className="flex items-center">
|
| 1406 |
+
<div className="w-3 h-3 bg-yellow-100 mr-1 rounded"></div>
|
| 1407 |
+
<span>&gt; 20: Potential disparity</span>
|
| 1408 |
+
</div>
|
| 1409 |
+
</div>
|
| 1410 |
+
</div>
|
| 1411 |
+
</div>
|
| 1412 |
+
</div>
|
| 1413 |
+
);
|
| 1414 |
+
};
|
| 1415 |
+
|
| 1416 |
+
export default TaskDemographicAnalysis;
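The relative-mode axis in the chart above derives a symmetric domain from the largest absolute deviation, rounded up to the nearest 5. A minimal standalone sketch of that calculation follows. The helper name `symmetricDomain`, its `step` parameter, and the sample rows are hypothetical and not part of this commit; the one-step fallback for all-zero data is an added assumption, since the inline version would return `[0, 0]` in that case.

```javascript
// Hypothetical helper (not part of the commit): computes a symmetric X-axis
// domain for "relative" mode from the largest absolute deviation in the rows.
function symmetricDomain(rows, modelNames, step = 5) {
  let maxDev = 0;
  for (const row of rows) {
    for (const name of modelNames) {
      const value = row[name];
      // Ignore missing or non-numeric entries, as the inline reducer does.
      if (typeof value === "number" && Number.isFinite(value)) {
        maxDev = Math.max(maxDev, Math.abs(value));
      }
    }
  }
  // Round up to the nearest multiple of `step`; fall back to one step so the
  // axis never collapses to [0, 0] when every deviation is zero (assumption).
  const scaledMax = Math.max(step, Math.ceil(maxDev / step) * step);
  return [-scaledMax, scaledMax];
}

// Illustrative data only; model names and values are made up.
const sampleRows = [
  { task: "Writing", "model-a": 3.2, "model-b": -7.4 },
  { task: "Coding", "model-a": -1.1, "model-b": 2.0 },
];
console.log(symmetricDomain(sampleRows, ["model-a", "model-b"]));
```

Pulling the calculation out of the JSX this way would also let it be unit-tested without rendering the chart.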
|
leaderboard-app/eslint.config.mjs
ADDED
|
@@ -0,0 +1,14 @@
|
| 1 |
+
import { dirname } from "path";
|
| 2 |
+
import { fileURLToPath } from "url";
|
| 3 |
+
import { FlatCompat } from "@eslint/eslintrc";
|
| 4 |
+
|
| 5 |
+
const __filename = fileURLToPath(import.meta.url);
|
| 6 |
+
const __dirname = dirname(__filename);
|
| 7 |
+
|
| 8 |
+
const compat = new FlatCompat({
|
| 9 |
+
baseDirectory: __dirname,
|
| 10 |
+
});
|
| 11 |
+
|
| 12 |
+
const eslintConfig = [...compat.extends("next/core-web-vitals")];
|
| 13 |
+
|
| 14 |
+
export default eslintConfig;
|
leaderboard-app/jsconfig.json
ADDED
|
@@ -0,0 +1,7 @@
|
| 1 |
+
{
|
| 2 |
+
"compilerOptions": {
|
| 3 |
+
"paths": {
|
| 4 |
+
"@/*": ["./*"]
|
| 5 |
+
}
|
| 6 |
+
}
|
| 7 |
+
}
|
leaderboard-app/lib/utils.js
ADDED
|
@@ -0,0 +1,205 @@
|
| 1 |
+
/**
|
| 2 |
+
* Prepares the data for visualization by adding colors and formatting
|
| 3 |
+
* @param {Object} rawData - The raw data from the JSON file
|
| 4 |
+
* @returns {Object} - Processed data ready for visualization
|
| 5 |
+
*/
|
| 6 |
+
export function prepareDataForVisualization(rawData) {
|
| 7 |
+
// Define model colors for consistent visualization
|
| 8 |
+
const MODEL_COLORS = {
|
| 9 |
+
'gpt-4o': '#19AADE',
|
| 10 |
+
'claude-3.7-sonnet': '#4A35C5',
|
| 11 |
+
'deepseek-r1': '#FFA319',
|
| 12 |
+
'o1': '#EF4444',
|
| 13 |
+
'gemini-2.0-flash-001': '#22C55E',
|
| 14 |
+
'llama-3.1-405b-instruct': '#8B5CF6'
|
| 15 |
+
};
|
| 16 |
+
|
| 17 |
+
// Add colors to model data
|
| 18 |
+
const modelsWithColors = rawData.models.map(model => ({
|
| 19 |
+
...model,
|
| 20 |
+
color: MODEL_COLORS[model.model] || '#999999' // Fallback color if not defined
|
| 21 |
+
}));
|
| 22 |
+
|
| 23 |
+
// Create an easier lookup for models by name
|
| 24 |
+
const modelsMap = modelsWithColors.reduce((acc, model) => {
|
| 25 |
+
acc[model.model] = model;
|
| 26 |
+
return acc;
|
| 27 |
+
}, {});
|
| 28 |
+
|
| 29 |
+
// Add best model indicators for each task category
|
| 30 |
+
const taskCategories = { ...rawData.taskCategories };
|
| 31 |
+
const bestModelPerCategory = {};
|
| 32 |
+
|
| 33 |
+
Object.keys(taskCategories).forEach(category => {
|
| 34 |
+
let bestModel = null;
|
| 35 |
+
let highestScore = -Infinity;
|
| 36 |
+
let stdDev = 0;
|
| 37 |
+
|
| 38 |
+
modelsWithColors.forEach(model => {
|
| 39 |
+
if (model.tasks && model.tasks[category] != null && model.tasks[category] > highestScore) {
|
| 40 |
+
highestScore = model.tasks[category];
|
| 41 |
+
bestModel = model.model;
|
| 42 |
+
stdDev = model.tasks_std?.[category] || 0;
|
| 43 |
+
}
|
| 44 |
+
});
|
| 45 |
+
|
| 46 |
+
bestModelPerCategory[category] = {
|
| 47 |
+
model: bestModel,
|
| 48 |
+
score: highestScore,
|
| 49 |
+
std: stdDev,
|
| 50 |
+
color: MODEL_COLORS[bestModel] || '#999999'
|
| 51 |
+
};
|
| 52 |
+
});
|
| 53 |
+
|
| 54 |
+
// Add best model indicators for each metric group
|
| 55 |
+
const metricGroups = { ...rawData.metricGroups };
|
| 56 |
+
const bestModelPerMetricGroup = {};
|
| 57 |
+
|
| 58 |
+
Object.keys(metricGroups).forEach(group => {
|
| 59 |
+
let bestModel = null;
|
| 60 |
+
let highestScore = -Infinity;
|
| 61 |
+
let stdDev = 0;
|
| 62 |
+
|
| 63 |
+
modelsWithColors.forEach(model => {
|
| 64 |
+
if (model.metric_groups && model.metric_groups[group] != null && model.metric_groups[group] > highestScore) {
|
| 65 |
+
highestScore = model.metric_groups[group];
|
| 66 |
+
bestModel = model.model;
|
| 67 |
+
stdDev = model.metric_groups_std?.[group] || 0;
|
| 68 |
+
}
|
| 69 |
+
});
|
| 70 |
+
|
| 71 |
+
bestModelPerMetricGroup[group] = {
|
| 72 |
+
model: bestModel,
|
| 73 |
+
score: highestScore,
|
| 74 |
+
std: stdDev,
|
| 75 |
+
color: MODEL_COLORS[bestModel] || '#999999'
|
| 76 |
+
};
|
| 77 |
+
});
|
| 78 |
+
|
| 79 |
+
// Add best model indicators for each facet
|
| 80 |
+
const bestModelPerFacet = {};
|
| 81 |
+
|
| 82 |
+
// Extract facets from the data
|
| 83 |
+
const facets = {};
|
| 84 |
+
if (rawData.facets) {
|
| 85 |
+
// If facets are already provided in the raw data
|
| 86 |
+
Object.assign(facets, rawData.facets);
|
| 87 |
+
} else {
|
| 88 |
+
// Try to extract facets from the radar data
|
| 89 |
+
if (rawData.radarData && rawData.radarData.length > 0) {
|
| 90 |
+
rawData.radarData.forEach(item => {
|
| 91 |
+
if (item.category && item.category !== "Would Use Again") {
|
| 92 |
+
const facetName = item.category.toLowerCase().replace(/\s+/g, '_');
|
| 93 |
+
facets[facetName] = [];
|
| 94 |
+
}
|
| 95 |
+
});
|
| 96 |
+
}
|
| 97 |
+
}
|
| 98 |
+
|
| 99 |
+
// Find best model for each facet
|
| 100 |
+
Object.keys(facets).forEach(facet => {
|
| 101 |
+
if (facet === 'repeat_usage') return; // Skip repeat_usage
|
| 102 |
+
|
| 103 |
+
let bestModel = null;
|
| 104 |
+
let highestScore = -Infinity;
|
| 105 |
+
let stdDev = 0;
|
| 106 |
+
|
| 107 |
+
modelsWithColors.forEach(model => {
|
| 108 |
+
// Check if the model has facet scores
|
| 109 |
+
if (model.facet_scores && model.facet_scores[facet] !== undefined) {
|
| 110 |
+
const score = model.facet_scores[facet];
|
| 111 |
+
if (score > highestScore) {
|
| 112 |
+
highestScore = score;
|
| 113 |
+
bestModel = model.model;
|
| 114 |
+
stdDev = model.facet_scores[`${facet}_std`] || 0;
|
| 115 |
+
}
|
| 116 |
+
}
|
| 117 |
+
});
|
| 118 |
+
|
| 119 |
+
if (bestModel) {
|
| 120 |
+
bestModelPerFacet[facet] = {
|
| 121 |
+
model: bestModel,
|
| 122 |
+
score: highestScore,
|
| 123 |
+
std: stdDev,
|
| 124 |
+
color: MODEL_COLORS[bestModel] || '#999999'
|
| 125 |
+
};
|
| 126 |
+
}
|
| 127 |
+
});
|
| 128 |
+
|
| 129 |
+
// Format task data for visualization
|
| 130 |
+
const taskData = rawData.taskData.map(task => {
|
| 131 |
+
// Find best model for this task
|
| 132 |
+
let bestModel = null;
|
| 133 |
+
let highestScore = -Infinity;
|
| 134 |
+
|
| 135 |
+
Object.entries(task).forEach(([key, value]) => {
|
| 136 |
+
if (modelsMap[key] && value !== null && value > highestScore) {
|
| 137 |
+
highestScore = value;
|
| 138 |
+
bestModel = key;
|
| 139 |
+
}
|
| 140 |
+
});
|
| 141 |
+
|
| 142 |
+
return {
|
| 143 |
+
...task,
|
| 144 |
+
bestModel,
|
| 145 |
+
bestModelColor: bestModel ? modelsMap[bestModel].color : null, // reuse the model's color, including the '#999999' fallback
|
| 146 |
+
bestScore: highestScore !== -Infinity ? highestScore : null
|
| 147 |
+
};
|
| 148 |
+
});
|
| 149 |
+
|
| 150 |
+
return {
|
| 151 |
+
models: modelsWithColors,
|
| 152 |
+
modelsMap,
|
| 153 |
+
taskData,
|
| 154 |
+
radarData: rawData.radarData,
|
| 155 |
+
taskCategories,
|
| 156 |
+
metricGroups,
|
| 157 |
+
facets,
|
| 158 |
+
bestModelPerCategory,
|
| 159 |
+
bestModelPerMetricGroup,
|
| 160 |
+
bestModelPerFacet,
|
| 161 |
+
// Pass through demographic data fields
|
| 162 |
+
demographicSummary: rawData.demographicSummary,
|
| 163 |
+
fairnessMetrics: rawData.fairnessMetrics,
|
| 164 |
+
demographicOptions: rawData.demographicOptions,
|
| 165 |
+
keyFacetsByTaskCategory: rawData.keyFacetsByTaskCategory,
|
| 166 |
+
keyAspectsByTask: rawData.keyAspectsByTask
|
| 167 |
+
};
|
| 168 |
+
}
|
| 169 |
+
|
| 170 |
+
/**
|
| 171 |
+
* Determine styling based on score
|
| 172 |
+
* @param {number} score - The score to evaluate
|
| 173 |
+
* @param {number} min - The minimum possible score (default: 0)
|
| 174 |
+
* @param {number} max - The maximum possible score (default: 100)
|
| 175 |
+
* @returns {string} - CSS class for the score badge
|
| 176 |
+
*/
|
| 177 |
+
export function getScoreBadgeColor(score, min = 0, max = 100) {
|
| 178 |
+
// For facet scores (-100 to +100)
|
| 179 |
+
if (min < 0) {
|
| 180 |
+
if (score >= 50) return 'bg-green-100 text-green-800';
|
| 181 |
+
if (score >= 0) return 'bg-blue-100 text-blue-800';
|
| 182 |
+
if (score >= -50) return 'bg-yellow-100 text-yellow-800';
|
| 183 |
+
return 'bg-red-100 text-red-800';
|
| 184 |
+
}
|
| 185 |
+
|
| 186 |
+
// For aspect scores (0 to 100)
|
| 187 |
+
const range = max - min;
|
| 188 |
+
const percent = ((score - min) / range) * 100;
|
| 189 |
+
|
| 190 |
+
if (percent >= 80) return 'bg-green-100 text-green-800';
|
| 191 |
+
if (percent >= 60) return 'bg-blue-100 text-blue-800';
|
| 192 |
+
if (percent >= 40) return 'bg-yellow-100 text-yellow-800';
|
| 193 |
+
return 'bg-red-100 text-red-800';
|
| 194 |
+
}
|
| 195 |
+
|
| 196 |
+
/**
|
| 197 |
+
* Format likert score for display (-3 to +3 scale)
|
| 198 |
+
* @param {number} score - The likert score
|
| 199 |
+
* @returns {string} - Formatted score string
|
| 200 |
+
*/
|
| 201 |
+
export function formatLikertScore(score) {
|
| 202 |
+
const formatted = score.toFixed(1);
|
| 203 |
+
if (score > 0) return `+${formatted}`;
|
| 204 |
+
return formatted;
|
| 205 |
+
}
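The "best model" loops in `prepareDataForVisualization` repeat the same pattern for task categories and metric groups. A rough sketch of how that selection could be factored out is shown below. The helper `pickBest` and the sample models are hypothetical and not part of this commit; the sketch assumes scores live under `model[field][key]` with standard deviations under `model[field + "_std"][key]`, which matches `tasks`/`tasks_std` and `metric_groups`/`metric_groups_std` but not the facet layout, where the deviation is stored as `facet_scores["<facet>_std"]`.

```javascript
// Hypothetical helper (not part of the commit): returns the best-scoring
// model for one key, or null when no model has a numeric score for it.
function pickBest(models, scoresField, key, fallbackColor = "#999999") {
  let best = null;
  for (const model of models) {
    const score = model[scoresField]?.[key];
    // Skip missing values, but keep legitimate scores of 0 or below.
    if (typeof score !== "number" || Number.isNaN(score)) continue;
    if (best === null || score > best.score) {
      best = {
        model: model.model,
        score,
        std: model[`${scoresField}_std`]?.[key] ?? 0,
        color: model.color ?? fallbackColor,
      };
    }
  }
  return best;
}

// Illustrative data only; model names and values are made up.
const sampleModels = [
  { model: "model-a", color: "#111111", tasks: { writing: 0 }, tasks_std: { writing: 1.5 } },
  { model: "model-b", tasks: { writing: -2 } },
];
console.log(pickBest(sampleModels, "tasks", "writing"));
console.log(pickBest(sampleModels, "tasks", "coding"));
```

Returning `null` when nothing qualifies avoids surfacing `-Infinity` as a score, which the category and metric-group loops above would otherwise do when no model has data.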
|
leaderboard-app/next.config.mjs
ADDED
|
@@ -0,0 +1,4 @@
|
| 1 |
+
/** @type {import('next').NextConfig} */
|
| 2 |
+
const nextConfig = {};
|
| 3 |
+
|
| 4 |
+
export default nextConfig;
|
leaderboard-app/package-lock.json
ADDED
|
The diff for this file is too large to render.
|
|
|
leaderboard-app/package.json
ADDED
|
@@ -0,0 +1,24 @@
|
| 1 |
+
{
|
| 2 |
+
"name": "leaderboard-app",
|
| 3 |
+
"version": "0.1.0",
|
| 4 |
+
"private": true,
|
| 5 |
+
"scripts": {
|
| 6 |
+
"dev": "next dev",
|
| 7 |
+
"build": "next build",
|
| 8 |
+
"start": "next start",
|
| 9 |
+
"lint": "next lint"
|
| 10 |
+
},
|
| 11 |
+
"dependencies": {
|
| 12 |
+
"next": "15.2.3",
|
| 13 |
+
"react": "^19.0.0",
|
| 14 |
+
"react-dom": "^19.0.0",
|
| 15 |
+
"recharts": "^2.15.1"
|
| 16 |
+
},
|
| 17 |
+
"devDependencies": {
|
| 18 |
+
"@eslint/eslintrc": "^3",
|
| 19 |
+
"@tailwindcss/postcss": "^4",
|
| 20 |
+
"eslint": "^9",
|
| 21 |
+
"eslint-config-next": "15.2.3",
|
| 22 |
+
"tailwindcss": "^4"
|
| 23 |
+
}
|
| 24 |
+
}
|
leaderboard-app/postcss.config.mjs
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
const config = {
|
| 2 |
+
plugins: ["@tailwindcss/postcss"],
|
| 3 |
+
};
|
| 4 |
+
|
| 5 |
+
export default config;
|
leaderboard-app/public/llm_comparison_data.json
ADDED
|
The diff for this file is too large to render.
|
|
|
leaderboard-app/public/vercel.svg
ADDED
|
|