Article Generative AI for Recommendation Systems: A Guide to Tokenizing User Interaction Data By jiagaoxiang • Mar 26 • 6
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model Paper • 2510.20803 • Published 17 days ago • 9
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published 18 days ago • 26
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence Paper • 2509.12203 • Published Sep 15 • 19
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 237
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21 • 255
Running on Zero • 174 • Chat with Kimi-VL-A3B-Thinking-2506 🤔 • Chat with images, videos, or PDFs to generate text
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents Paper • 2508.05954 • Published Aug 8 • 6
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9 • 705
Article Asynchronous Robot Inference: Decoupling Action Prediction and Execution Jul 10 • 44
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data Jun 3 • 277