The case for 4-bit precision: k-bit Inference Scaling Laws Paper • 2212.09720 • Published Dec 19, 2022
Distributed Inference and Fine-tuning of Large Language Models Over The Internet Paper • 2312.08361 • Published Dec 13, 2023
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale Paper • 2208.07339 • Published Aug 15, 2022