Ai2

Team

non-profit

Verified

https://allenai.org/

allen_ai

allenai

AI & ML interests

Building breakthrough AI to solve the world's biggest problems.

Recent Activity

yanhong-l authored a paper 17 days ago

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs

valentinhofmann authored a paper about 2 months ago

Large Language Models Discriminate Against Speakers of German Dialects

jaslee20 authored a paper 3 months ago

MolmoAct: Action Reasoning Models that can Reason in Space

Papers

olmOCR 2: Unit Test Rewards for Document OCR

MolmoAct: Action Reasoning Models that can Reason in Space

Articles

Introducing the Open Chain of Thought Leaderboard

yanhong-l

authored a paper 17 days ago

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs

Paper • 2510.18279 • Published 19 days ago • 4

davidheineman

authored 3 papers 22 days ago

Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation

Paper • 2508.13144 • Published Aug 18

Evaluating LLMs on Chinese Idiom Translation

Paper • 2508.10421 • Published Aug 14

Fluid Language Model Benchmarking

Paper • 2509.11106 • Published Sep 14

stellalisy

authored 10 papers about 1 month ago

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

Paper • 2404.06664 • Published Apr 10, 2024 • 1

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Paper • 2410.02677 • Published Oct 3, 2024 • 1

Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning

Paper • 2502.14860 • Published Feb 20

BLAB: Brutally Long Audio Bench

Paper • 2505.03054 • Published May 5 • 1

Spurious Rewards: Rethinking Training Signals in RLVR

Paper • 2506.10947 • Published Jun 12 • 1

MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

Paper • 2406.00922 • Published Jun 3, 2024

PrefPalette: Personalized Preference Modeling with Latent Attributes

Paper • 2507.13541 • Published Jul 17 • 8

Medical Hallucinations in Foundation Models and Their Impact on Healthcare

Paper • 2503.05777 • Published Feb 26

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

Paper • 2305.19759 • Published May 31, 2023

Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It

Paper • 2510.00177 • Published Sep 30 • 3

lihaoxin2020

authored a paper 2 months ago

Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning

Paper • 2508.19202 • Published Aug 26 • 7

hqfang

authored a paper 3 months ago

MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published Aug 11 • 44

ashish333

authored 4 papers 4 months ago

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Paper • 1803.05457 • Published Mar 14, 2018 • 2

Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering

Paper • 1809.02789 • Published Sep 8, 2018

From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

Paper • 1909.01958 • Published Sep 4, 2019

Probing Natural Language Inference Models through Semantic Fragments

Paper • 1909.07521 • Published Sep 16, 2019