---
title: README
emoji: 📚
colorFrom: pink
colorTo: yellow
sdk: static
pinned: true
license: apache-2.0
---

# Tbilisi AI Lab

**Open, Georgian-first Generative AI.**

We are a **non-profit** on a mission to build capable, affordable, and open Georgian language models. Georgian is a **low-resource** language with ~4M speakers. We believe Georgian speakers deserve their own **ChatGPT-like, productivity-boosting** AI, built *with* and *for* our community.

---

## 🔔 What's new (Oct 2025)

We have **open-sourced all our models and datasets, from pretraining corpora to every stage of fine-tuning** (instruction/SFT, function-calling, and preference/DPO).

Explore everything on our Hugging Face org:

- Org: https://huggingface.co/tbilisi-ai-lab

**Examples of what's now public:**

- **Models**
  - 12B family: Base + Instruct
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B-Base
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B-Instruct
  - **Small** 3.8B SLM
    - https://huggingface.co/tbilisi-ai-lab/kona2-small-3.8B
- **Datasets (selection)**
  - **Instruction (SFT)** mix (2.61M pairs): https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-mix-2.6M
  - **Function-calling SFT** (EN 115k / KA 93k):
    - https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-115k
    - https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-ka-93k
  - **Preference (DPO)** mix (387k): https://huggingface.co/datasets/tbilisi-ai-lab/kona-dpo-mix-387k
  - **Evaluation/Knowledge** sets in Georgian (e.g., SuperGLUE-KA, BoolQ-KA, CommonsenseQA-KA, code-instruct-KA, wiki-QA-KA, human-translated EN↔KA, etc.):
    - Browse all: https://huggingface.co/tbilisi-ai-lab/datasets

(A dataset-loading sketch follows the quickstart below.)

---

## Why we exist

- **Language equity:** Great AI shouldn't be limited to high-resource languages.
- **Local impact:** Better Georgian NLP improves education, services, accessibility, and economic opportunity.
- **Open science:** We share models, data, and recipes so others can reproduce and build on our work.

---

## Quickstart (12B)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "tbilisi-ai-lab/kona2-12B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

chat = pipeline("text-generation", model=model, tokenizer=tok)

# Georgian prompt: "Please summarize this text briefly (in Georgian): The Georgian language is unique..."
prompt = "გთხოვ, შეაჯამე ეს ტექსტი მოკლედ (ქართული): ქართული ენა უნიკალურია..."
out = chat(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```
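For conversational use, prefer the Instruct checkpoint. A minimal sketch, assuming `tbilisi-ai-lab/kona2-12B-Instruct` ships a chat template in its tokenizer config (the user message and sampling settings here are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tbilisi-ai-lab/kona2-12B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Format the conversation with the tokenizer's chat template (assumed to be
# bundled with the Instruct model); add_generation_prompt opens the assistant turn.
messages = [
    # "What are three interesting features of the Georgian language?"
    {"role": "user", "content": "რა არის ქართული ენის სამი საინტერესო თავისებურება?"},
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```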
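The released datasets should load with the standard `datasets` library. A minimal sketch, assuming the default `train` split (split and column names vary per dataset; check each dataset card):

```python
from datasets import load_dataset

# Stream the 2.61M-pair SFT mix instead of downloading it up front.
# The "train" split name is an assumption; consult the dataset card.
sft = load_dataset("tbilisi-ai-lab/kona-sft-mix-2.6M", split="train", streaming=True)
for example in sft.take(3):
    print(example)
```

The same pattern applies to the function-calling, DPO, and evaluation sets listed above.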