Submitted by XuejingLiu: Revisiting Multimodal Positional Encoding in Vision-Language Models (Qwen)