---
title: README
emoji: 📚
colorFrom: pink
colorTo: yellow
sdk: static
pinned: true
license: apache-2.0
---

# Tbilisi AI Lab

**Open, Georgian-first Generative AI.**

We are a **non-profit** on a mission to build capable, affordable, and open Georgian language models. Georgian is a **low-resource** language with ~4M speakers. We believe Georgian speakers deserve their own **ChatGPT-like, productivity-boosting** AI, built *with* and *for* our community.

---

## 🔔 What's new (Oct 2025)

We have **open-sourced all our models and datasets, from pretraining corpora to every stage of fine-tuning** (instruction/SFT, function-calling, and preference/DPO).

Explore everything on our Hugging Face org:

- Org: https://huggingface.co/tbilisi-ai-lab

**Examples of what's now public:**

- **Models**
  - 12B family: Base + Instruct
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B-Base
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B-Instruct
  - **Small** 3.8B SLM
    - https://huggingface.co/tbilisi-ai-lab/kona2-small-3.8B
- **Datasets (selection)**
  - **Instruction (SFT)** mix (2.61M pairs): https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-mix-2.6M
  - **Function-calling SFT** (EN 115k / KA 93k):
    - https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-115k
    - https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-ka-93k
  - **Preference (DPO)** mix (387k): https://huggingface.co/datasets/tbilisi-ai-lab/kona-dpo-mix-387k
  - **Evaluation/Knowledge** sets in Georgian (e.g., SuperGLUE-KA, BoolQ-KA, CommonsenseQA-KA, code-instruct-KA, wiki-QA-KA, human-translated EN↔KA, etc.):
    - Browse all: https://huggingface.co/tbilisi-ai-lab/datasets

(A dataset-loading sketch follows the quickstart below.)

---

## Why we exist

- **Language equity:** Great AI shouldn't be limited to high-resource languages.
- **Local impact:** Better Georgian NLP improves education, services, accessibility, and economic opportunity.
- **Open science:** We share models, data, and recipes so others can reproduce and build on our work.

---

## Quickstart (12B)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "tbilisi-ai-lab/kona2-12B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

chat = pipeline("text-generation", model=model, tokenizer=tok)

# Georgian prompt: "Please summarize this text briefly (in Georgian): The Georgian language is unique..."
prompt = "გთხოვ, შეაჯამე ეს ტექსტი მოკლედ (ქართული): ქართული ენა უნიკალურია..."
out = chat(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```
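For conversational use, prefer the Instruct checkpoint. A minimal sketch, assuming `tbilisi-ai-lab/kona2-12B-Instruct` ships a chat template in its tokenizer config (the user message and sampling settings here are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "tbilisi-ai-lab/kona2-12B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Format the conversation with the tokenizer's chat template (assumed to be
# bundled with the Instruct model); add_generation_prompt opens the assistant turn.
messages = [
    # "What are three interesting features of the Georgian language?"
    {"role": "user", "content": "რა არის ქართული ენის სამი საინტერესო თავისებურება?"},
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```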
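The released datasets should load with the standard `datasets` library. A minimal sketch, assuming the default `train` split (split and column names vary per dataset; check each dataset card):

```python
from datasets import load_dataset

# Stream the 2.61M-pair SFT mix instead of downloading it up front.
# The "train" split name is an assumption; consult the dataset card.
sft = load_dataset("tbilisi-ai-lab/kona-sft-mix-2.6M", split="train", streaming=True)
for example in sft.take(3):
    print(example)
```

The same pattern applies to the function-calling, DPO, and evaluation sets listed above.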