Xiaofan Zhu (Augusteinia)
AI & ML interests: VLM, RL, Robotics
Organizations: None yet
Math
- MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning — Paper • 2505.10557 • Published • 47
- AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning — Paper • 2505.16400 • Published • 35
- PhyX: Does Your Model Have the "Wits" for Physical Reasoning? — Paper • 2505.15929 • Published • 49
- VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos — Paper • 2506.05349 • Published • 24
Paradigm
- Parallel Scaling Law for Language Models — Paper • 2505.10475 • Published • 83
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective — Paper • 2505.15045 • Published • 54
- Scaling Diffusion Transformers Efficiently via μP — Paper • 2505.15270 • Published • 35
- Vision Transformers Don't Need Trained Registers — Paper • 2506.08010 • Published • 21
Models: 0 (none public yet)
Datasets: 0 (none public yet)