-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arXiv:2503.16430
-
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
Paper • 2503.06674 • Published • 8 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34 -
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Paper • 2503.19904 • Published • 2 -
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
Paper • 2504.14899 • Published • 20
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 36 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 51 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 68 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47
-
Continuous Autoregressive Language Models
Paper • 2510.27688 • Published • 59 -
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
Paper • 2505.13181 • Published • 9 -
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper • 2503.19325 • Published • 73 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34
-
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Paper • 2501.18427 • Published • 23 -
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Paper • 2502.20388 • Published • 16 -
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Paper • 2503.09641 • Published • 40 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Continuous Autoregressive Language Models
Paper • 2510.27688 • Published • 59 -
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
Paper • 2505.13181 • Published • 9 -
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper • 2503.19325 • Published • 73 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34
-
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
Paper • 2503.06674 • Published • 8 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34 -
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Paper • 2503.19904 • Published • 2 -
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation
Paper • 2504.14899 • Published • 20
-
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Paper • 2501.18427 • Published • 23 -
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Paper • 2502.20388 • Published • 16 -
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Paper • 2503.09641 • Published • 40 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 36 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 51 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 68 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47