---
title: README
emoji: ๐
colorFrom: pink
colorTo: yellow
sdk: static
pinned: true
license: apache-2.0
---
# Tbilisi AI Lab

**Open, Georgian-first Generative AI.**

We are a **non-profit** on a mission to build capable, affordable, and open Georgian language models.

Georgian is a **low-resource** language with ~4M speakers. We believe Georgian speakers deserve their own **ChatGPT-like, productivity-boosting** AI, built *with* and *for* our community.

---
## What's new (Oct 2025)

We have **open-sourced all our models and datasets, from pretraining corpora to every stage of fine-tuning** (instruction/SFT, function-calling, and preference/DPO). Explore everything on our Hugging Face org:

- Org: https://huggingface.co/tbilisi-ai-lab
**Examples of what's now public:**

- **Models**
  - 12B family: Base + Instruct
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B-Base
    - https://huggingface.co/tbilisi-ai-lab/kona2-12B-Instruct
  - **Small** 3.8B SLM
    - https://huggingface.co/tbilisi-ai-lab/kona2-small-3.8B
- **Datasets (selection)**
  - **Instruction (SFT)** mix (2.61M pairs): https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-mix-2.6M
  - **Function-calling SFT** (EN 115k / KA 93k):
    - https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-115k
    - https://huggingface.co/datasets/tbilisi-ai-lab/kona-sft-function-calling-ka-93k
  - **Preference (DPO)** mix (387k): https://huggingface.co/datasets/tbilisi-ai-lab/kona-dpo-mix-387k
  - **Evaluation/Knowledge** sets in Georgian (e.g., SuperGLUE-KA, BoolQ-KA, CommonsenseQA-KA, code-instruct-KA, wiki-QA-KA, human-translated EN→KA, etc.)
- Browse all: https://huggingface.co/tbilisi-ai-lab/datasets
---

## Why we exist

- **Language equity:** Great AI shouldn't be limited to high-resource languages.
- **Local impact:** Better Georgian NLP improves education, services, accessibility, and economic opportunity.
- **Open science:** We share models, data, and recipes so others can reproduce and build on our work.

---
## Quickstart (12B)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "tbilisi-ai-lab/kona2-12B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
chat = pipeline("text-generation", model=model, tokenizer=tok)

# "Please summarize this text briefly (in Georgian): The Georgian language is unique and..."
prompt = "გთხოვ, შეაჯამე ეს ტექსტი მოკლედ (ქართულად): ქართული ენა უნიკალურია და..."
out = chat(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```
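For the Instruct variant, the tokenizer's chat template is the safer entry point than raw text completion. A sketch, assuming the `kona2-12B-Instruct` repo ships a chat template (check the model card); the `build_messages`/`chat_once` helpers are ours for illustration:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "tbilisi-ai-lab/kona2-12B-Instruct"

def build_messages(user_text: str) -> list[dict]:
    """Wrap one user turn in the message format expected by chat templates."""
    return [{"role": "user", "content": user_text}]

def chat_once(user_text: str, max_new_tokens: int = 256) -> str:
    """Generate one assistant reply via the tokenizer's chat template."""
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
    inputs = tok.apply_chat_template(
        build_messages(user_text), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example (downloads the model on first run):
# print(chat_once("მომიყევი თბილისის შესახებ."))  # "Tell me about Tbilisi."
```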