Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arXiv:2510.21583

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published 27 days ago • 173
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30
Sparser Block-Sparse Attention via Token Permutation

Paper • 2510.21270 • Published 16 days ago • 23

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published 22 days ago • 47
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Paper • 2510.16888 • Published 21 days ago • 18
Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published 24 days ago • 45
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30

[RL] Text-to-Image

Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30

Group Relative Attention Guidance for Image Editing

Paper • 2510.24657 • Published 12 days ago • 23
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Paper • 2407.12435 • Published Jul 17, 2024 • 14
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Paper • 2401.09340 • Published Jan 17, 2024 • 21

HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video

Paper • 2510.05560 • Published Oct 7 • 7
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Paper • 2510.06217 • Published Oct 7 • 62
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 468
Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30 • 51

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

[RL] Text-to-Image

Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published 27 days ago • 173
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30
Sparser Block-Sparse Attention via Token Permutation

Paper • 2510.21270 • Published 16 days ago • 23

Group Relative Attention Guidance for Image Editing

Paper • 2510.24657 • Published 12 days ago • 23
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Paper • 2407.12435 • Published Jul 17, 2024 • 14
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

Paper • 2401.09340 • Published Jan 17, 2024 • 21

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published 22 days ago • 47
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Paper • 2510.16888 • Published 21 days ago • 18
Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published 24 days ago • 45
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published 16 days ago • 30

HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video

Paper • 2510.05560 • Published Oct 7 • 7
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Paper • 2510.06217 • Published Oct 7 • 62
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 468
Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30 • 51

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs