---
language:
- multilingual
- bg
- en
- fr
- de
- ru
- es
- sw
- tr
- vi
tags:
- deberta
- deberta-v3
- mdeberta
license: mit
---

# mdeberta-v3-base-lite

This model was created by pruning the vocabulary of the original [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) model while preserving quality for the supported Latin- and Cyrillic-script languages: in the similarity comparison below, the lite model's scores match the original's to four decimal places.

## Supported Languages

- Bulgarian
- English
- French
- German
- Russian
- Spanish
- Swahili
- Turkish
- Vietnamese

## Usage

```python
from transformers import AutoTokenizer, AutoModel

# Load the pruned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("rustemgareev/mdeberta-v3-base-lite")
model = AutoModel.from_pretrained("rustemgareev/mdeberta-v3-base-lite")

# Example usage: tokenize one sentence and run a forward pass
text = "This is an example text in English."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
```

## Performance Evaluation

### Size Comparison

| Metric | Original Model | Lite Model | Reduction |
|--------|----------------|------------|-----------|
| Vocabulary Size | 250,102 tokens | 163,211 tokens | 34.74% |
| Disk Size | 1.06 GB | 817 MB | 23.23% |

### VRAM Usage Comparison

*Estimated using [Hugging Face Accelerate Model Estimator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).*

| Metric | Original Model | Lite Model | Reduction |
|--------|----------------|------------|-----------|
| Largest Layer (float32) | 735.35 MB | 478.16 MB | 34.99% |
| Total Size (float32) | 1.04 GB | 804.13 MB | 22.68% |
| Training with Adam (Peak VRAM) | 4.15 GB | 3.14 GB | 24.34% |

### Semantic Similarity Comparison

**Evaluation Method**: Cosine similarity between embeddings of parallel sentences in different languages, with the English sentence as the reference.

**Test Phrases Used**:

- English: "Artificial intelligence learns to understand human languages and helps people communicate."
- Bulgarian: "Изкуственият интелект се учи да разбира човешките езици и помага на хората да общуват."
- French: "L'intelligence artificielle apprend à comprendre les langages humains et aide les gens à communiquer."
- German: "Künstliche Intelligenz lernt, menschliche Sprachen zu verstehen und hilft Menschen bei der Kommunikation."
- Russian: "Искусственный интеллект учится понимать человеческие языки и помогает людям общаться."
- Spanish: "La inteligencia artificial aprende a entender los idiomas humanos y ayuda a las personas a comunicarse."
- Swahili: "Akili ya kisasa inajifunza kuelewa lugha za wanadamu na kusaidia watu kuwasiliana."
- Turkish: "Yapay zeka, insan dillerini anlamayı öğrenir ve insanların iletişim kurmasına yardımcı olur."
- Vietnamese: "Trí tuệ nhân tạo học cách hiểu ngôn ngữ con người và giúp mọi người giao tiếp."

**Similarity Results**:

| Language Pair | Original Similarity | Lite Similarity | Difference |
|---------------|---------------------|-----------------|------------|
| English-Bulgarian | 0.9276 | 0.9276 | 0.0000 |
| English-French | 0.9322 | 0.9322 | 0.0000 |
| English-German | 0.9178 | 0.9178 | 0.0000 |
| English-Russian | 0.9335 | 0.9335 | 0.0000 |
| English-Spanish | 0.9228 | 0.9228 | 0.0000 |
| English-Swahili | 0.9591 | 0.9591 | 0.0000 |
| English-Turkish | 0.9450 | 0.9450 | 0.0000 |
| English-Vietnamese | 0.7955 | 0.7955 | 0.0000 |

## License

This model is distributed under the [MIT License](https://opensource.org/licenses/MIT).
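
## Reproducing the Similarity Comparison

The semantic similarity comparison above could be reproduced with something like the sketch below. It is not the script used to produce the reported numbers. The model card does not say how sentence embeddings were pooled, so mean pooling over the last hidden state (weighted by the attention mask) is an assumption here, and the helper names `cosine_similarity`, `load`, `embed`, and `compare` are hypothetical. Scores from this sketch may therefore differ from the table.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def load(model_name):
    """Load a tokenizer and model from the Hugging Face Hub."""
    # Heavy imports stay local so cosine_similarity works without them.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    return tokenizer, model


def embed(text, tokenizer, model):
    """Sentence embedding via mean pooling (assumed, not from the model card)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden)
    # Zero out padding positions, then average over the remaining tokens.
    mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled[0].tolist()


def compare(model_names, reference, translations):
    """Return {model_name: {language: similarity to the reference sentence}}."""
    results = {}
    for name in model_names:
        tokenizer, model = load(name)
        ref_vec = embed(reference, tokenizer, model)
        results[name] = {
            lang: cosine_similarity(ref_vec, embed(text, tokenizer, model))
            for lang, text in translations.items()
        }
    return results
```

To compare the two models, call `compare` with `["microsoft/mdeberta-v3-base", "rustemgareev/mdeberta-v3-base-lite"]`, the English test phrase as `reference`, and a dictionary mapping each language name to its test phrase as `translations`. Both models are downloaded on first use.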