Yi Cui

onekq

https://onekq.ai

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

updated a Space about 11 hours ago

onekq-ai/WebApp1K-models-leaderboard

posted an update about 20 hours ago

If RAG (by which I mean vectors and embeddings) transitions from QA to agents, is scalability (from Wikipedia to personal memory) still an issue? What will the new challenges be? Anyone care to share their experience?

posted an update 2 days ago

No SOTA from GPT-5 Codex: https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard

authored a paper 6 months ago

Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation

Paper • 2505.09027 • Published May 13

authored 3 papers about 1 year ago

A Case Study of Web App Coding with OpenAI Reasoning Models

Paper • 2409.13773 • Published Sep 19, 2024 • 7

WebApp1K: A Practical Code-Generation Benchmark for Web App Development

Paper • 2408.00019 • Published Jul 30, 2024 • 2

Insights from Benchmarking Frontier Language Models on Web App Code Generation

Paper • 2409.05177 • Published Sep 8, 2024 • 8