archit PRO

archit11

archit-spec

AI & ML interests

small language models

Recent Activity

updated a model 5 days ago

archit11/joey-rank1

published a model 5 days ago

archit11/joey-rank1

published a model 5 days ago

archit11/hyperswitch-new

upvoted an article 11 days ago

Article

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement

17 days ago

upvoted 2 articles 3 months ago

Article

From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels

Aug 18

Article

How to Run a Hugging Face Model in JAX (Part 1)

Jul 20

upvoted a paper 4 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156

upvoted 6 articles 5 months ago

Article

You could have designed state of the art positional encoding

Nov 25, 2024

•

398

Article

Understanding Gemma 3n: How MatFormer Gives You Many Models in One

Jun 26

Article

G2P Shrinks Speech Models

Feb 5

Article

State of open video generation models in Diffusers

Jan 27

Article

How Long Prompts Block Other Requests - Optimizing LLM Performance

Jun 12

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16

upvoted 2 papers 6 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2 • 185

upvoted an article 8 months ago

Article

Enabling Long Context Training with Sequence Parallelism in Axolotl

Apr 4

upvoted an article 9 months ago

Article

SigLIP 2: A better multilingual vision language encoder

Feb 21

•

192

upvoted an article 10 months ago

Article

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

Aug 4, 2024

upvoted 3 collections 10 months ago

upvoted 2 articles 10 months ago

Article

How to deploy and fine-tune DeepSeek models on AWS

Jan 30

Article

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

May 7, 2024
