Collections
Collections including paper arXiv:2412.01339
- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 18
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 62
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 77

- Negative Token Merging: Image-based Adversarial Feature Guidance
  Paper • 2412.01339 • Published • 23
- Efficient Distillation of Classifier-Free Guidance using Adapters
  Paper • 2503.07274 • Published • 4
- Flow-GRPO: Training Flow Matching Models via Online RL
  Paper • 2505.05470 • Published • 85
- Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
  Paper • 2510.11057 • Published • 30

- Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
  Paper • 2406.09416 • Published • 29
- Wavelets Are All You Need for Autoregressive Image Generation
  Paper • 2406.19997 • Published • 31
- ViPer: Visual Personalization of Generative Models via Individual Preference Learning
  Paper • 2407.17365 • Published • 13
- MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
  Paper • 2408.11001 • Published • 13

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23