🧠 AceNemotron-14B 8-Bit Quantized Model
Welcome to the 8-bit quantized version of the AceNemotron-14B model! Quantizing the weights to Int8 roughly halves memory use compared to FP16 and speeds up inference, making the model deployable on lower-resource hardware with typically only a small quality trade-off.
📦 Model Details
- Base Model: AceNemotron-14B
- Quantization: 8-bit
- Quantized With: BitsAndBytes
- Precision: Int8
- Use Case: Faster inference, lower memory footprint, efficient finetuning (see the LoRA sketch after the Usage example)
- Uploader: mr-abhisharma
🛠 Installation
To use this model, install PyTorch (with CUDA support) along with the required libraries:
pip install torch transformers accelerate bitsandbytes
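Note that the Int8 kernels in bitsandbytes require a CUDA-capable GPU. A quick sanity check before loading the model (a minimal sketch, not specific to this repo):
import torch
# LLM.int8() inference in bitsandbytes runs on CUDA devices only
assert torch.cuda.is_available(), "bitsandbytes 8-bit inference requires a CUDA GPU"
print(torch.cuda.get_device_name(0))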
🚀 Usage
Here's a basic example using transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
model_name = "mr-abhisharma/AceNemotron-14B-8bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # Passing load_in_8bit=True as a bare kwarg is deprecated in recent
    # transformers releases; use a BitsAndBytesConfig instead
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # place layers across available GPUs/CPU automatically
)
inputs = tokenizer("Once upon a time,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
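Since efficient finetuning is listed as a use case above, here is a minimal sketch of attaching LoRA adapters on top of the 8-bit weights with the peft library (pip install peft). The target_modules names below are assumptions based on common decoder architectures and may need adjusting for this model:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
# Prepare the k-bit quantized model for training (casts norm layers to fp32, etc.)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumption: typical attention projection names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trained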
🧠 Why Quantized?
Quantizing to 8-bit roughly halves the weight memory relative to FP16 and improves inference throughput, especially on consumer-grade GPUs (e.g., 16 GB of VRAM or less). A rough size estimate is sketched after this list. It's great for:
- Personal use and experimentation
- Running on laptops or single-GPU setups
- Cost-effective inference deployment
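As a back-of-the-envelope check (weights only; activations and the KV cache add more on top):
# Rough weight-memory estimate for a 14B-parameter model
params = 14e9
print(f"FP16: ~{params * 2 / 1e9:.0f} GB")  # 2 bytes per parameter -> ~28 GB
print(f"Int8: ~{params * 1 / 1e9:.0f} GB")  # 1 byte per parameter  -> ~14 GB
For the exact figure of the loaded model, transformers exposes model.get_memory_footprint().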
🔧 Limitations
- A slight drop in output quality may occur relative to the full-precision weights.
- This model inherits the limitations of the base AceNemotron-14B model.
- Use responsibly: ensure compliance with the base model's license and intended use.
🤝 Acknowledgements
- Base model: AceNemotron Project
- Quantization tools: BitsAndBytes
- Inspired by the Hugging Face community ❤️