Artificial Analysis

company

https://artificialanalysis.ai/

ArtificialAnlys

AI & ML interests

None defined yet.

Recent Activity

declanjackson new activity about 9 hours ago

ArtificialAnalysis/AA-Omniscience-Public:update_question_id

declanjackson updated a dataset 1 day ago

ArtificialAnalysis/AA-Omniscience-Public

declanjackson new activity 1 day ago

ArtificialAnalysis/AA-Omniscience-Public:Update README.md

declanjackson

in ArtificialAnalysis/AA-Omniscience-Public about 9 hours ago

update_question_id

#4 opened about 9 hours ago by

declanjackson

updated a dataset 1 day ago

ArtificialAnalysis/AA-Omniscience-Public

Viewer • Updated 1 day ago • 600 • 427 • 10

declanjackson

in ArtificialAnalysis/AA-Omniscience-Public 1 day ago

Update README.md

#1 opened 1 day ago by

declanjackson

published a dataset 1 day ago

ArtificialAnalysis/AA-Omniscience-Public

Viewer • Updated 1 day ago • 600 • 427 • 10

kartikey-aa

in ArtificialAnalysis/AA-LCR 20 days ago

What is passed to the LLM Equality Checker

#9 opened 21 days ago by

sophies-cb

kartikey-aa

in ArtificialAnalysis/AA-LCR 2 months ago

Processing the dataset

👍 6

#3 opened 3 months ago by

stoshniwal

Unable to download all pdfs from the provided data_source_urls

🔥 9

#4 opened 3 months ago by

ronanki123

add-extracted-text

#6 opened 2 months ago by

kartikey-aa

updated a dataset 2 months ago

ArtificialAnalysis/AA-LCR

Viewer • Updated Sep 11 • 100 • 895 • 15

kartikey-aa

in ArtificialAnalysis/AA-LCR 2 months ago

feat/extracted-text

#7 opened 2 months ago by

kartikey-aa

georgewritescode

in ArtificialAnalysis/AA-LCR 3 months ago

Upload AA-LCR_Dataset.csv

#5 opened 3 months ago by

georgewritescode

posted an update 3 months ago

Post

3018

Announcing Artificial Analysis Long Context Reasoning (AA-LCR), a new benchmark that evaluates long-context performance by testing reasoning across multiple long documents (~100k tokens).

The focus of AA-LCR is to replicate real knowledge work and reasoning tasks, testing capabilities critical to modern AI applications spanning document analysis, codebase understanding, and complex multi-step workflows.

AA-LCR consists of 100 hard text-based questions, each requiring reasoning across multiple real-world documents that together represent ~100k input tokens. Questions are designed so that answers cannot be found directly but must be reasoned from multiple information sources, with human testing verifying that each question requires genuine inference rather than retrieval.

Key takeaways:
➤ Today’s leading models achieve ~70% accuracy: the top three places go to OpenAI o3 (69%), xAI Grok 4 (68%) and Qwen3 235B 2507 Thinking (67%)

➤👀 We also already have gpt-oss results! 120B performs close to o4-mini (high), in line with OpenAI claims regarding model performance. We will be following up shortly with an Intelligence Index for the models.

➤ 100 hard text-based questions spanning 7 categories of documents (Company Reports, Industry Reports, Government Consultations, Academia, Legal, Marketing Materials and Survey Reports)

➤ ~100k tokens of input per question, requiring models to support a minimum 128K context window to score on this benchmark

➤ ~3M total unique input tokens spanning ~230 documents to run the benchmark (output tokens typically vary by model)

We’re adding AA-LCR to the Artificial Analysis Intelligence Index, and taking the version number to v2.2. Artificial Analysis Intelligence Index v2.2 now includes: MMLU-Pro, GPQA Diamond, AIME 2025, IFBench, LiveCodeBench, SciCode and AA-LCR.

Link to dataset: ArtificialAnalysis/AA-LCR
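
The context-window requirement and token counts in the announcement above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model names and window sizes are hypothetical, "128K" is read as 128,000 tokens (some providers may mean 131,072), and the per-run total is a back-of-the-envelope figure derived from the "100 questions" and "~100k tokens per question" numbers, not an official statistic.

```python
# Minimal sketch; model names and window sizes below are hypothetical.
MIN_CONTEXT_TOKENS = 128_000      # assumption: "128K" read as 128,000 tokens
NUM_QUESTIONS = 100               # from the announcement
APPROX_TOKENS_PER_QUESTION = 100_000  # "~100k tokens of input per question"


def can_run_aa_lcr(context_window: int, min_context: int = MIN_CONTEXT_TOKENS) -> bool:
    """Return True if a model's context window meets the stated minimum."""
    return context_window >= min_context


def eligible_models(windows: dict[str, int]) -> list[str]:
    """Return the names of models whose context window is large enough, sorted."""
    return sorted(name for name, size in windows.items() if can_run_aa_lcr(size))


def approx_input_tokens_per_run() -> int:
    """Rough input tokens sent across all questions in one full run."""
    return NUM_QUESTIONS * APPROX_TOKENS_PER_QUESTION


if __name__ == "__main__":
    example_windows = {"model-a": 32_000, "model-b": 128_000, "model-c": 200_000}
    print(eligible_models(example_windows))
    print(approx_input_tokens_per_run())
```

The rough per-run figure is larger than the ~3M unique input tokens quoted above, which is consistent with documents being shared across questions.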

mhillsmith

updated a dataset 3 months ago

ArtificialAnalysis/AA-LCR

Viewer • Updated Sep 11 • 100 • 895 • 15

georgewritescode

updated a dataset 3 months ago

ArtificialAnalysis/AA-LCR

Viewer • Updated Sep 11 • 100 • 895 • 15

georgewritescode

published a dataset 4 months ago

ArtificialAnalysis/AA-LCR

Viewer • Updated Sep 11 • 100 • 895 • 15

georgewritescode

in ArtificialAnalysis/AA-LCR 4 months ago

Upload AA-LCR_Dataset.csv

#1 opened 4 months ago by

georgewritescode

updated a Space 4 months ago

Music Arena Leaderboard

🎵

AI Music Arena & Leaderboard (Suno, Udio, Google, Meta, +)

georgewritescode

posted an update 4 months ago

Post

297

Announcing the Artificial Analysis Music Arena Leaderboard: with >5k votes, Suno v4.5 is the leading Music Generation model followed by Riffusion’s FUZZ-1.1 Pro.

Google’s Lyria 2 places third in our Instrumental leaderboard, and Udio’s v1.5 Allegro places third in our Vocals leaderboard.

The Instrumental Leaderboard is as follows:
🥇 Suno v4.5
🥈 Riffusion's FUZZ-1.1 Pro
🥉 Google's Lyria 2
- Udio v1.5 Allegro
- StabilityAI's Stable Audio 2.0
- Meta's MusicGen

Rankings are based on community votes across a diverse range of genres and prompts.

Participate in the arena and check out the space here:
ArtificialAnalysis/Music-Arena-Leaderboard

georgewritescode

posted an update 4 months ago

Post

1065

🎵 Announcing Artificial Analysis Music Arena! Vote for songs generated by leading music models across genres from pop to metal to rock & more

Key details:
🏁 Participate in Music Arena and after a day of voting we’ll unveil the world’s first public ranking of AI music models.

✨ Currently featuring models from Suno, Riffusion, Meta, Google, Udio and Stability AI!

🎤 Support for both a vocals mode and an instrumental mode

🎸 A diverse array of prompts from genres including pop, RnB, metal, rock, classical, jazz, and more

Check it out here:
ArtificialAnalysis/Music-Arena-Leaderboard

georgewritescode

published a Space 4 months ago

Music Arena Leaderboard

🎵

AI Music Arena & Leaderboard (Suno, Udio, Google, Meta, +)

Articles

Evaluating Audio Reasoning with Big Bench Audio

Launching the Artificial Analysis Text to Image Leaderboard & Arena

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

Team members 7