Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arXiv:2503.16430

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Learning Few-Step Diffusion Models by Trajectory Distribution Matching

Paper • 2503.06674 • Published Mar 9 • 8
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20 • 34
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

Paper • 2503.19904 • Published Mar 25 • 2
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

Paper • 2504.14899 • Published Apr 21 • 20

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Paper • 2411.02337 • Published Nov 4, 2024 • 36
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 51
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11, 2024 • 47

Tokenization methods and language modelling

Continuous Autoregressive Language Models

Paper • 2510.27688 • Published 9 days ago • 59
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space

Paper • 2505.13181 • Published May 19 • 9
Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Paper • 2503.19325 • Published Mar 25 • 73
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20 • 34

SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer

Paper • 2501.18427 • Published Jan 30 • 23
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Paper • 2502.20388 • Published Feb 27 • 16
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Paper • 2503.09641 • Published Mar 12 • 40
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20 • 34

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Tokenization methods and language modelling

Continuous Autoregressive Language Models

Paper • 2510.27688 • Published 9 days ago • 59
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space

Paper • 2505.13181 • Published May 19 • 9
Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Paper • 2503.19325 • Published Mar 25 • 73
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20 • 34

Learning Few-Step Diffusion Models by Trajectory Distribution Matching

Paper • 2503.06674 • Published Mar 9 • 8
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20 • 34
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

Paper • 2503.19904 • Published Mar 25 • 2
Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation

Paper • 2504.14899 • Published Apr 21 • 20

SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer

Paper • 2501.18427 • Published Jan 30 • 23
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation

Paper • 2502.20388 • Published Feb 27 • 16
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Paper • 2503.09641 • Published Mar 12 • 40
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

Paper • 2503.16430 • Published Mar 20 • 34

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Paper • 2411.02337 • Published Nov 4, 2024 • 36
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 51
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11, 2024 • 47

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs