X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model • Paper • 2510.10274 • Published about 1 month ago • 13
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment • Paper • 2406.19736 • Published Jun 28, 2024 • 3