TurkEmbed4STS / README.md
metadata
language:
  - tr
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:482091
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
  - loss:CoSENTLoss
base_model: Alibaba-NLP/gte-multilingual-base
widget:
  - source_sentence: >-
      Ya da dışarı çıkıp yürü ya da biraz koşun. Bunu düzenli olarak yapmıyorum
      ama Washington bunu yapmak için harika bir yer.
    sentences:
      - “Washington's yürüyüş ya da koşu için harika bir yer.”
      - H-2A uzaylılar Amerika Birleşik Devletleri'nde zaman kısa süreleri var.
      - “Washington'da düzenli olarak yürüyüşe ya da koşuya çıkıyorum.”
  - source_sentence: >-
      Orta yaylalar ve güney kıyıları arasındaki kontrast daha belirgin
      olamazdı.
    sentences:
      - >-
        İşitme Yardımı Uyumluluğu Müzakere Kuralları Komitesi, Federal İletişim
        Komisyonu'nun bir ürünüdür.
      - Dağlık ve sahil arasındaki kontrast kolayca işaretlendi.
      - Kontrast işaretlenemedi.
  - source_sentence: >-
      Bir 1997 Henry J. Kaiser Aile Vakfı anket yönetilen bakım planlarında
      Amerikalılar temelde kendi bakımı ile memnun olduğunu bulundu.
    sentences:
      - Kaplanları takip ederken çok sessiz olmalısın.
      - >-
        Henry Kaiser vakfı insanların sağlık hizmetlerinden hoşlandığını
        gösteriyor.
      - >-
        Henry Kaiser Vakfı insanların sağlık hizmetlerinden nefret ettiğini
        gösteriyor.
  - source_sentence: Eminim yapmışlardır.
    sentences:
      - Eminim öyle yapmışlardır.
      - Batı Teksas'ta 100 10 dereceydi.
      - Eminim yapmamışlardır.
  - source_sentence: >-
      Ve gerçekten, baba haklıydı, oğlu zaten her şeyi tecrübe etmişti, her şeyi
      denedi ve daha az ilgileniyordu.
    sentences:
      - Oğlu her şeye olan ilgisini kaybediyordu.
      - Pek bir şey yapmadım.
      - Baba oğlunun tecrübe için hala çok şey olduğunu biliyordu.
datasets:
  - emrecan/all-nli-tr
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: all nli tr test
          type: all-nli-tr-test
        metrics:
          - type: cosine_accuracy
            value: 0.8966145437983908
            name: Cosine Accuracy
          - type: cosine_accuracy
            value: 0.9351753453772582
            name: Cosine Accuracy
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8043925123766598
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.804133282756889
            name: Spearman Cosine
          - type: pearson_cosine
            value: 0.8133873820848544
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8199552151367876
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts22 test
          type: sts22-test
        metrics:
          - type: pearson_cosine
            value: 0.647912337747937
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6694072470896322
            name: Spearman Cosine
          - type: pearson_cosine
            value: 0.6514085062457564
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.6827342891126081
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev gte multilingual base
          type: sts-dev-gte-multilingual-base
        metrics:
          - type: pearson_cosine
            value: 0.838717139426684
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8428367492381358
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test gte multilingual base
          type: sts-test-gte-multilingual-base
        metrics:
          - type: pearson_cosine
            value: 0.8133873820848544
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8199552151367876
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb dev 768
          type: stsb-dev-768
        metrics:
          - type: pearson_cosine
            value: 0.870311456444647
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8747522169942328
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb dev 512
          type: stsb-dev-512
        metrics:
          - type: pearson_cosine
            value: 0.8696934286998554
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8753487201891684
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb dev 256
          type: stsb-dev-256
        metrics:
          - type: pearson_cosine
            value: 0.8644706498119142
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.873468734899321
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb dev 128
          type: stsb-dev-128
        metrics:
          - type: pearson_cosine
            value: 0.8591309130178328
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8700377378574327
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb dev 64
          type: stsb-dev-64
        metrics:
          - type: pearson_cosine
            value: 0.8479124810212979
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8655596653561272
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb test 768
          type: stsb-test-768
        metrics:
          - type: pearson_cosine
            value: 0.8455412308380735
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8535290217691063
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb test 512
          type: stsb-test-512
        metrics:
          - type: pearson_cosine
            value: 0.8464773608783734
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8553900248212041
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb test 256
          type: stsb-test-256
        metrics:
          - type: pearson_cosine
            value: 0.8443046458551826
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8550098621393595
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb test 128
          type: stsb-test-128
        metrics:
          - type: pearson_cosine
            value: 0.8363964421208214
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8511193715667303
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: stsb test 64
          type: stsb-test-64
        metrics:
          - type: pearson_cosine
            value: 0.8235450515966374
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8460761238725121
            name: Spearman Cosine

SentenceTransformer based on Alibaba-NLP/gte-multilingual-base

This is a sentence-transformers model fine-tuned from Alibaba-NLP/gte-multilingual-base on the all-nli-tr dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-multilingual-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: all-nli-tr
  • Language: tr

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
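The Pooling and Normalize stages above can be illustrated with a minimal, dependency-free sketch. It assumes CLS pooling means "take the first token's vector", as the `pooling_mode_cls_token: True` setting suggests; the function name and the toy 4-dimensional input are hypothetical and stand in for the real 768-dimensional token embeddings.

```python
import math

def cls_pool_and_normalize(token_embeddings):
    """Take the first ([CLS]) token vector and scale it to unit L2 norm."""
    cls_vector = token_embeddings[0]
    norm = math.sqrt(sum(x * x for x in cls_vector))
    if norm == 0.0:
        # Avoid dividing by zero for an all-zero vector.
        return list(cls_vector)
    return [x / norm for x in cls_vector]

# Toy input: 3 tokens with 4 dimensions each (hypothetical values).
toy_tokens = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] position
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]
sentence_embedding = cls_pool_and_normalize(toy_tokens)
print(sentence_embedding)
```

Because the output has unit length, a dot product between two such embeddings equals their cosine similarity.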

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub.
# "sentence_transformers_model_id" is a placeholder: replace it with this model's Hub repository id.
# The base model (gte-multilingual-base) ships custom modeling code, so
# trust_remote_code=True is typically required when loading it.
model = SentenceTransformer("sentence_transformers_model_id", trust_remote_code=True)
# Run inference
sentences = [
    'Ve gerçekten, baba haklıydı, oğlu zaten her şeyi tecrübe etmişti, her şeyi denedi ve daha az ilgileniyordu.',
    'Oğlu her şeye olan ilgisini kaybediyordu.',
    'Baba oğlunun tecrübe için hala çok şey olduğunu biliyordu.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise cosine similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
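Since the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128 and 64, shorter prefixes of each embedding are expected to remain usable. The following is a minimal sketch of truncating and re-normalizing embeddings before comparing them. The helper names and the toy 8-dimensional vectors are hypothetical; with the real model you would truncate the 768-dimensional output to one of the trained sizes.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0.0 else head

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional vectors standing in for 768-dimensional model output.
emb_a = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.0, 0.1]
emb_b = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, 0.1, 0.0]

full_score = cosine_similarity(emb_a, emb_b)
small_a = truncate_and_normalize(emb_a, 4)
small_b = truncate_and_normalize(emb_b, 4)
small_score = cosine_similarity(small_a, small_b)
print(len(small_a), round(full_score, 4), round(small_score, 4))
```

Recent Sentence Transformers releases also accept a `truncate_dim` argument when constructing the model, which should have a similar effect; check the version you have installed before relying on it.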

Evaluation

Metrics

Triplet

  • Dataset: all-nli-tr-test (value logged at epoch 0, step 0 in the Training Logs)

Metric           Value
cosine_accuracy  0.8966
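As a rough illustration of what a triplet cosine accuracy measures, the sketch below counts how often an anchor is closer (by cosine similarity) to its positive than to its negative. This is an assumption about the metric's definition, not a copy of the evaluator's code, and the 2-dimensional triplets are made-up toy data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def triplet_cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)

# Hypothetical toy embeddings: three triplets are ranked correctly, one is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # correct
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),   # wrong: negative is closer
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.0]),  # correct
    ([1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]),  # correct
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)  # 0.75
```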

Semantic Similarity

  • Datasets: sts-test, sts22-test, sts-dev-gte-multilingual-base, sts-test-gte-multilingual-base, stsb-dev-768, stsb-dev-512, stsb-dev-256, stsb-dev-128, stsb-dev-64, stsb-test-768, stsb-test-512, stsb-test-256, stsb-test-128 and stsb-test-64
  • Evaluated with EmbeddingSimilarityEvaluator

Dataset                         pearson_cosine  spearman_cosine
sts-test                        0.8134          0.8200
sts22-test                      0.6514          0.6827
sts-dev-gte-multilingual-base   0.8387          0.8428
sts-test-gte-multilingual-base  0.8134          0.8200
stsb-dev-768                    0.8703          0.8748
stsb-dev-512                    0.8697          0.8753
stsb-dev-256                    0.8645          0.8735
stsb-dev-128                    0.8591          0.8700
stsb-dev-64                     0.8479          0.8656
stsb-test-768                   0.8455          0.8535
stsb-test-512                   0.8465          0.8554
stsb-test-256                   0.8443          0.8550
stsb-test-128                   0.8364          0.8511
stsb-test-64                    0.8235          0.8461
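The two metrics reported here compare predicted cosine similarities against gold similarity scores. A minimal sketch of both correlations follows, assuming Spearman is computed as the Pearson correlation of ranks (with tied values sharing an average rank). The function names and the four toy score pairs are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical gold scores (0 to 5) and predicted cosine similarities.
gold_scores = [5.0, 3.8, 0.0, 2.5]
predicted_cosines = [0.95, 0.70, 0.10, 0.80]

pearson_cosine = pearson(gold_scores, predicted_cosines)
spearman_cosine = spearman(gold_scores, predicted_cosines)
print(round(pearson_cosine, 4), round(spearman_cosine, 4))
```

Spearman only depends on the ordering of the predictions, which is why it is often preferred for similarity models whose raw cosine values are not calibrated to the gold scale.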

Triplet

  • Dataset: all-nli-tr-test (value logged at epoch 1.0, step 7533 in the Training Logs)

Metric           Value
cosine_accuracy  0.9352

Training Details

Training Dataset

all-nli-tr

  • Dataset: all-nli-tr at daeabfb
  • Size: 482,091 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
              sentence1           sentence2           score
    type      string              string              float
    details   min: 6 tokens       min: 6 tokens       min: 0.0
              mean: 10.51 tokens  mean: 10.47 tokens  mean: 2.23
              max: 27 tokens      max: 27 tokens      max: 5.0
  • Samples:
    sentence1                                     sentence2                                              score
    Bir uçak kalkıyor.                            Bir hava uçağı kalkıyor.                               5.0
    Bir adam büyük bir flüt çalıyor.              Bir adam flüt çalıyor.                                 3.8
    Bir adam pizzaya rendelenmiş peynir yayıyor.  Bir adam pişmemiş pizzaya rendelenmiş peynir yayıyor.  3.8
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
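
The parameters above describe a CoSENT loss applied at several embedding prefix lengths and summed with equal weights. The sketch below is a simplified, dependency-free illustration of that idea and not the library's implementation. It assumes the CoSENT form log(1 + Σ exp(scale · (cos_j − cos_i))) over all pair combinations where pair i has a higher gold score than pair j, with an assumed scale of 20; the function names and toy 8-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def cosent_loss(cosines, gold_scores, scale=20.0):
    """log(1 + sum exp(scale * (cos_j - cos_i))) over pairs with gold_i > gold_j."""
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the logarithm
    for i in range(len(cosines)):
        for j in range(len(cosines)):
            if gold_scores[i] > gold_scores[j]:
                terms.append(scale * (cosines[j] - cosines[i]))
    largest = max(terms)  # numerically stable log-sum-exp
    return largest + math.log(sum(math.exp(t - largest) for t in terms))

def matryoshka_cosent_loss(sentence_pairs, gold_scores, dims, weights):
    """Weighted sum of the CoSENT loss computed on each embedding prefix."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        cosines = [cosine_similarity(a[:dim], b[:dim]) for a, b in sentence_pairs]
        total += weight * cosent_loss(cosines, gold_scores)
    return total

# Toy batch: the first pair is identical (cosine 1), the second is orthogonal (cosine 0).
u = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
v = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
toy_pairs = [(u, u), (u, v)]

# Toy dims [8, 4, 2] stand in for the card's [768, 512, 256, 128, 64].
well_ordered = matryoshka_cosent_loss(toy_pairs, [5.0, 0.0], [8, 4, 2], [1, 1, 1])
mis_ordered = matryoshka_cosent_loss(toy_pairs, [0.0, 5.0], [8, 4, 2], [1, 1, 1])
print(well_ordered < mis_ordered)  # True
```

The loss is near zero when the cosine ordering agrees with the gold ordering at every prefix length, and grows when a lower-scored pair receives a higher cosine than a higher-scored one.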
    

Evaluation Dataset

all-nli-tr

  • Dataset: all-nli-tr at daeabfb
  • Size: 6,567 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
              sentence1           sentence2           score
    type      string              string              float
    details   min: 6 tokens       min: 6 tokens       min: 0.0
              mean: 15.89 tokens  mean: 16.02 tokens  mean: 2.1
              max: 39 tokens      max: 49 tokens      max: 5.0
  • Samples:
    sentence1                        sentence2                               score
    Şapkalı bir adam dans ediyor.    Sert şapka takan bir adam dans ediyor.  5.0
    Küçük bir çocuk ata biniyor.     Bir çocuk ata biniyor.                  4.75
    Bir adam yılana fare yediriyor.  Adam yılana fare yediriyor.             5.0
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • warmup_steps: 144
  • bf16: True
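
To make the learning-rate settings concrete, here is a minimal sketch of a linear schedule with warmup: the rate rises linearly from 0 to `learning_rate` over `warmup_steps`, then decays linearly to 0. The Hugging Face `TrainingArguments` documentation describes `warmup_steps` as overriding `warmup_ratio`, so the sketch uses 144 warmup steps. The total of 1800 steps is an assumption taken from the last row of the training logs, and the function name is hypothetical.

```python
def linear_schedule_lr(step, base_lr=1e-05, warmup_steps=144, total_steps=1800):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale up proportionally to the step count.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down to zero at total_steps.
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

for step in (0, 72, 144, 972, 1800):
    print(step, linear_schedule_lr(step))
```

The peak rate of 1e-05 is reached exactly at step 144, and the rate is halved again at step 972, midway through the decay phase.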

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 144
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-nli-tr-test_cosine_accuracy sts-test_spearman_cosine sts22-test_spearman_cosine sts-dev-gte-multilingual-base_spearman_cosine sts-test-gte-multilingual-base_spearman_cosine stsb-dev-768_spearman_cosine stsb-dev-512_spearman_cosine stsb-dev-256_spearman_cosine stsb-dev-128_spearman_cosine stsb-dev-64_spearman_cosine stsb-test-768_spearman_cosine stsb-test-512_spearman_cosine stsb-test-256_spearman_cosine stsb-test-128_spearman_cosine stsb-test-64_spearman_cosine
0 0 - - 0.8966 0.8041 0.6694 - - - - - - - - - - - -
0.1327 1000 2.5299 3.3893 - - - 0.8318 - - - - - - - - - - -
0.2655 2000 2.1132 3.3050 - - - 0.8345 - - - - - - - - - - -
0.3982 3000 5.1488 2.7752 - - - 0.8481 - - - - - - - - - - -
0.5310 4000 5.4103 2.7242 - - - 0.8445 - - - - - - - - - - -
0.6637 5000 5.1896 2.6701 - - - 0.8451 - - - - - - - - - - -
0.7965 6000 5.0105 2.6489 - - - 0.8431 - - - - - - - - - - -
0.9292 7000 5.1059 2.6114 - - - 0.8428 - - - - - - - - - - -
1.0 7533 - - 0.9352 0.8200 0.6827 - 0.8200 - - - - - - - - - -
1.1111 200 34.2828 29.8737 - - - - - 0.8671 0.8671 0.8639 0.8606 0.8546 - - - - -
2.2222 400 28.038 28.8915 - - - - - 0.8740 0.8742 0.8720 0.8691 0.8648 - - - - -
3.3333 600 27.3829 29.3391 - - - - - 0.8747 0.8751 0.8728 0.8699 0.8653 - - - - -
4.4444 800 26.807 30.0090 - - - - - 0.8756 0.8761 0.8741 0.8710 0.8665 - - - - -
5.5556 1000 26.4543 30.5886 - - - - - 0.8753 0.8757 0.8739 0.8705 0.8662 - - - - -
6.6667 1200 26.0413 31.3750 - - - - - 0.8744 0.8751 0.8730 0.8698 0.8655 - - - - -
7.7778 1400 25.8221 31.6515 - - - - - 0.8752 0.8758 0.8739 0.8706 0.8661 - - - - -
8.8889 1600 25.6656 31.9805 - - - - - 0.8746 0.8752 0.8733 0.8700 0.8655 - - - - -
10.0 1800 25.5355 32.0454 - - - - - 0.8748 0.8753 0.8735 0.8700 0.8656 0.8535 0.8554 0.8550 0.8511 0.8461

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.3.1
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}