nilbot (Ersi Ni)

upvoted a paper 9 months ago

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 165

upvoted an article 10 months ago

G2P Shrinks Speech Models

Article • Feb 5 • 82

upvoted a paper 10 months ago

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 426

upvoted an article 10 months ago

MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era

Article • Jan 15 • 48

upvoted an article 11 months ago

🌁#82: AI and ML in Real Life

Article • Jan 7 • 16

upvoted a paper 12 months ago

UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages

Paper • 2411.14343 • Published Nov 21, 2024 • 7

upvoted an article about 1 year ago

GaLore: Advancing Large Model Training on Consumer-grade Hardware

Article • Mar 20, 2024 • 32

upvoted a collection about 1 year ago

Qwen2-VL

Collection

Vision-language model series based on Qwen2 • 16 items • Updated Jul 21 • 226

upvoted a paper over 1 year ago

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Paper • 2309.10400 • Published Sep 19, 2023 • 26

upvoted an article over 1 year ago

Fine-tune Llama 3 with ORPO

Article • Apr 22, 2024 • 241

upvoted a collection over 1 year ago

MGM-Data

Collection

Official data collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 2 items • Updated Apr 21, 2024 • 7

upvoted a paper almost 2 years ago

Grandmaster-Level Chess Without Search

Paper • 2402.04494 • Published Feb 7, 2024 • 69

Ersi Ni

AI & ML interests

Organizations


