---
title: README
emoji: 📚
colorFrom: pink
colorTo: yellow
sdk: static
pinned: true
license: apache-2.0
---
# Tbilisi AI Lab
**Open, Georgian-first Generative AI.**
We are a **non-profit** on a mission to build capable, affordable, and open Georgian language models.
Georgian is a **low-resource** language with ~4M speakers. We believe Georgian speakers deserve their own **ChatGPT-like, productivity-boosting** AI, built *with* and *for* our community.
---
## 🔔 What's new (Oct 2025)
We have **open-sourced all our models and datasets, from pretraining corpora to every stage of fine-tuning** (instruction/SFT, function-calling, and preference/DPO). Explore everything on our Hugging Face org:
- Org: https://huggingface.co/tbilisi-ai-lab
**Examples of what's now public:**
- **Models**
- 12B family: Base + Instruct
- https://huggingface.co/tbilisi-ai-lab/kona2-12B
- https://huggingface.co/tbilisi-ai-lab/kona2-12B-Base
- https://huggingface.co/tbilisi-ai-lab/kona2-12B-Instruct
- **Small** 3.8B SLM
- https://huggingface.co/tbilisi-ai-lab/kona2-small-3.8B
- **Datasets (selection)** (see the loading sketch after this list)
- **Instruction (SFT)** mix (2.61M pairs): https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-mix-2.6M
- **Function-calling SFT** (EN 115k / KA 93k):
- https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-115k
- https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-ka-93k
- **Preference (DPO)** mix (387k): https://huggingface.co/datasets/tbilisi-ai-lab/kona-dpo-mix-387k
- **Evaluation/Knowledge** sets in Georgian (e.g., SuperGLUE-KA, BoolQ-KA, CommonsenseQA-KA, code-instruct-KA, wiki-QA-KA, human-translated EN↔KA, etc.):
- Browse all: https://huggingface.co/tbilisi-ai-lab/datasets
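
Any of these can be pulled locally with the standard `datasets` loader. A minimal sketch, assuming the default `train` split; the column names vary per dataset, so check each dataset card for the actual schema:

```python
from datasets import load_dataset

# Stream the SFT mix so the full 2.61M-pair corpus isn't downloaded up front.
# split="train" is an assumption; see the dataset card for available splits.
ds = load_dataset("tbilisi-ai-lab/kona-sft-mix-2.6M", split="train", streaming=True)

# Inspect the first record to discover the actual column layout
first = next(iter(ds))
print(first.keys())
print(first)
```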
---
## Why we exist
- **Language equity:** Great AI shouldn't be limited to high-resource languages.
- **Local impact:** Better Georgian NLP improves education, services, accessibility, and economic opportunity.
- **Open science:** We share models, data, and recipes so others can reproduce and build on our work.
---
## Quickstart (12B)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "tbilisi-ai-lab/kona2-12B"

# device_map="auto" requires the `accelerate` package to be installed
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
chat = pipeline("text-generation", model=model, tokenizer=tok)

# "Please summarize this text briefly (Georgian): The Georgian language is unique..."
prompt = "გთხოვ, შეაჯამე ეს ტექსტი მოკლედ (ქართული): ქართული ენა უნიკალურია..."

out = chat(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```
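
For the Instruct checkpoint, chat-style prompting through the tokenizer's chat template is the usual route. A minimal sketch, assuming kona2-12B-Instruct ships a chat template (fall back to plain prompting as above if it doesn't):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tbilisi-ai-lab/kona2-12B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# "Hello! How are you?" -- an illustrative Georgian user turn
messages = [{"role": "user", "content": "გამარჯობა! როგორ ხარ?"}]

# apply_chat_template wraps the conversation in the model's expected format
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same snippets also work with the 3.8B small model by swapping `model_id` to `tbilisi-ai-lab/kona2-small-3.8B`, assuming it uses the same causal-LM interface.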