FlagEval

non-profit

https://flageval.baai.ac.cn/

Recent Activity

philokey updated a dataset 4 days ago

FlagEval/coco_val2014_sampled

philokey authored a paper 7 days ago

Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

philokey updated a dataset 8 days ago

FlagEval/MeasureBench

Spaces (2)

FlagEval-Arena

Arena

FlagEval-Debate

Display a debate interface

Models (1)

FlagEval/flageval_judgemodel

Text Generation • 33B • Updated Dec 30, 2024 • 1 • 1

Datasets (12)

FlagEval/coco_val2014_sampled

Viewer • Updated 4 days ago • 1k • 39

FlagEval/MeasureBench

Viewer • Updated 8 days ago • 2.44k • 117

FlagEval/EmbodiedVerse-Bench

Viewer • Updated Jun 25 • 2.04k • 215

FlagEval/Where2Place

Viewer • Updated May 29 • 100 • 208

FlagEval/SAT

Viewer • Updated May 6 • 150 • 49

FlagEval/HMMT_2025

Viewer • Updated May 6 • 30 • 46

FlagEval/ERQA

Viewer • Updated Apr 22 • 400 • 364 • 2

FlagEval/sub_spatial

Viewer • Updated Apr 21 • 690 • 71

FlagEval/EmbSpatial-Bench

Viewer • Updated Apr 21 • 3.64k • 170 • 2

FlagEval/documentation-images

Viewer • Updated Nov 13, 2024 • 3 • 201
