The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9 • 40
IA2: Alignment with ICL Activations Improves Supervised Fine-Tuning Paper • 2509.22621 • Published Sep 26 • 8
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Article • Published Feb 11 • 85
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2 • 24
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback Paper • 2506.11930 • Published Jun 13 • 53