HectorHe/DeepSeek-V2-Lite-aux-free-sft-commonsense-1epoch-1e-5-gamma-share-expert • Text Generation • 16B • Updated Sep 25 • 1
HectorHe/Qwen1.5-MOE-sft-coommonsense15k-aux-free-1e-5-share-expert • Text Generation • 14B • Updated Sep 25 • 3
HectorHe/Qwen1.5-MOE-sft-coommonsense15k-aux-free-3e-5-share-expert • Text Generation • 14B • Updated Sep 24 • 1
HectorHe/OLMoE-1B-7B-0125-aux-free-sft-commonsense15k-3e-5 • Text Generation • 7B • Updated Sep 24 • 11
HectorHe/DeepSeek-V2-Lite-aux-free-sft-commonsense-1epoch-1e-4-gamma-share-expert • Text Generation • 16B • Updated Sep 24 • 11 • 1
HectorHe/OLMoE-1B-7B-0125-aux-free-sft-commonsense15k-share-expert • Text Generation • 7B • Updated Sep 24 • 14 • 1
HectorHe/DeepSeek-V2-Lite-aux-free-sft-commonsense-1epoch-1e-4-gamma • Text Generation • 16B • Updated Sep 24 • 9
HectorHe/Qwen1.5-MOE-sft-coommonsense15k-aux-free-share-experts • Text Generation • 14B • Updated Sep 23 • 11 • 1
HectorHe/DeepSeek-V2-Lite-aux-free-sft-math7k-1epoch-1e-4-gamma-share-experts-2nd-epoch-high-bias-expert • Text Generation • 16B • Updated Sep 21 • 10
HectorHe/DeepSeek-V2-Lite-aux-free-sft-math7k-1epoch-1e-4-gamma-share-experts-2nd-epoch-lr-1e-6 • Text Generation • 126k • Updated Sep 19 • 14
HectorHe/DeepSeek-V2-Lite-aux-free-sft-math7k-1epoch-1e-4-gamma-share-experts-2nd-epoch-lr-5e-6 • Text Generation • 16B • Updated Sep 19 • 13