-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Collections including paper arXiv:2510.21583
-
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 173
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Paper • 2510.21583 • Published • 30
Sparser Block-Sparse Attention via Token Permutation
Paper • 2510.21270 • Published • 23
-
RL makes MLLMs see better than SFT
Paper • 2510.16333 • Published • 47
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
Paper • 2510.16888 • Published • 18
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Paper • 2510.14901 • Published • 45
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Paper • 2510.21583 • Published • 30
-
Group Relative Attention Guidance for Image Editing
Paper • 2510.24657 • Published • 23
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
Paper • 2510.21583 • Published • 30
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Paper • 2407.12435 • Published • 14
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Paper • 2401.09340 • Published • 21
-
HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
Paper • 2510.05560 • Published • 7
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Paper • 2510.06217 • Published • 62
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 468
Fast-dLLM v2: Efficient Block-Diffusion LLM
Paper • 2509.26328 • Published • 51