
X

Phoebe13


AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 2 months ago

Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards

Paper • 2509.24981 • Published Sep 29 • 29

authored a paper 2 months ago

Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding

Paper • 2508.20478 • Published Aug 28 • 17

upvoted a paper 3 months ago

Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding

Paper • 2508.20478 • Published Aug 28 • 17

updated a model 3 months ago

Phoebe13/Video-MTR

Visual Question Answering • 8B • Updated Sep 3 • 11 • 7

published a model 3 months ago

Phoebe13/Video-MTR

Visual Question Answering • 8B • Updated Sep 3 • 11 • 7

published a model 7 months ago

Phoebe13/Qwen-2.5-7B-Instruct_Explore0.5_30k_stage234_v1.2_ev_handcomp_simple_with_handtype

published 11 models 8 months ago

Phoebe13/Qwen-2.5-7B-Instruct_Explore0.5_30k_stage234_v1.2_ev_handcomp_simple

Phoebe13/Qwen-2.5-7B-Instruct_Explore0.5_30k_stage234_ev_handcomp_simple

Phoebe13/Qwen-2.5-7B-Instruct_Explore0.25_12k_stage234_ev_handcomp_simple

Phoebe13/Qwen-2.5-7B-Instruct-Poker-30k_stage234_ev-by-handcomp-simple

Phoebe13/Qwen-2.5-7B-Instruct-Poker-30k_stage1234_ev-by-handcomp-simple

Phoebe13/Qwen-2.5-7B-Instruct-Poker-16k_stage234_ev-by-handcomp-simple

Phoebe13/Qwen-2.5-7B-Instruct-Poker-ev-by-handcomp-simple

Phoebe13/Qwen-2.5-7B-Poker-RL-StrictFormat-ev-by-handcomp-simple

Phoebe13/Qwen-2.5-7B-Poker-RL-StrictFormat-ev-by-handcomp

Phoebe13/Qwen-2.5-7B-Poker-RL-StrictFormat

Phoebe13/Qwen-2.5-7B-Poker-RL

published 2 models 9 months ago

Phoebe13/DeepSeek-R1-Distill-Qwen-1.5B-GRPO

Phoebe13/Qwen-2.5-7B-Simple-RL