CrossEncoder based on sentence-transformers/all-mpnet-base-v2

This is a Cross Encoder model finetuned from sentence-transformers/all-mpnet-base-v2 using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Output: a single relevance score per (query, document) pair
  • Model size: ~0.1B parameters (F32, safetensors)

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("varadsrivastava/findocranker-mpnet-base-v2")
# Get scores for pairs of texts
pairs = [
    ['What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?', '[DOC=10-K | annual report | comprehensive business overview, risks, financials | 100-300 pages]'],
    ['What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?', '[DOC=10-Q | quarterly report | interim financials, MD&A updates | 30-60 pages]'],
    ['What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?', '[DOC=DEF-14A | proxy statement | governance, compensation, shareholder voting matters | annual filing]'],
    ['What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?', '[DOC=8-K | current report | material events, timely disclosures | ad-hoc filing]'],
    ['What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?', '[DOC=Earnings | earnings call transcript | forward guidance, Q&A, management commentary | quarterly]'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?',
    [
        '[DOC=10-K | annual report | comprehensive business overview, risks, financials | 100-300 pages]',
        '[DOC=10-Q | quarterly report | interim financials, MD&A updates | 30-60 pages]',
        '[DOC=DEF-14A | proxy statement | governance, compensation, shareholder voting matters | annual filing]',
        '[DOC=8-K | current report | material events, timely disclosures | ad-hoc filing]',
        '[DOC=Earnings | earnings call transcript | forward guidance, Q&A, management commentary | quarterly]',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
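
The results from rank() come back sorted by score (highest first), so the first entry points at the best-matching document type. A small illustrative follow-up, reusing the ranks variable from above:

best = ranks[0]
# corpus_id indexes into the candidate list that was passed to rank()
print(f"Best match: corpus_id={best['corpus_id']}, score={best['score']:.4f}")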

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,943 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query: string (min: 59 characters, mean: 104.63 characters, max: 181 characters)
    • docs: list (size: 5 elements)
    • labels: list (size: 5 elements)
  • Samples (a construction sketch follows this list):

    query: What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?
    docs: ['[DOC=10-K | annual report | comprehensive business overview, risks, financials | 100-300 pages]', '[DOC=10-Q | quarterly report | interim financials, MD&A updates | 30-60 pages]', '[DOC=DEF-14A | proxy statement | governance, compensation, shareholder voting matters | annual filing]', '[DOC=8-K | current report | material events, timely disclosures | ad-hoc filing]', '[DOC=Earnings | earnings call transcript | forward guidance, Q&A, management commentary | quarterly]']
    labels: [4, 3, 2, 1, 0]

    query: How did Qualcomm’s management describe forecasted capital allocation between developing new semiconductor technologies and potential acquisitions?
    docs: ['[DOC=10-K | annual report | comprehensive business overview, risks, financials | 100-300 pages]', '[DOC=10-Q | quarterly report | interim financials, MD&A updates | 30-60 pages]', '[DOC=8-K | current report | material events, timely disclosures | ad-hoc filing]', '[DOC=DEF-14A | proxy statement | governance, compensation, shareholder voting matters | annual filing]', '[DOC=Earnings | earnings call transcript | forward guidance, Q&A, management commentary | quarterly]']
    labels: [4, 3, 2, 1, 0]

    query: What did GE HealthCare Technologies Inc.’s leadership say about GE HealthCare Technologies Inc.’s dividend policy?
    docs: ['[DOC=10-K | annual report | comprehensive business overview, risks, financials | 100-300 pages]', '[DOC=8-K | current report | material events, timely disclosures | ad-hoc filing]', '[DOC=Earnings | earnings call transcript | forward guidance, Q&A, management commentary | quarterly]', '[DOC=10-Q | quarterly report | interim financials, MD&A updates | 30-60 pages]', '[DOC=DEF-14A | proxy statement | governance, compensation, shareholder voting matters | annual filing]']
    labels: [4, 3, 2, 1, 0]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": null
    }
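
A hypothetical sketch of how a training row like the samples above could be assembled and paired with this loss. The column names and loss parameters come from the tables above; the rest (base-model loading, num_labels) is an illustrative assumption, not the author's verbatim training script:

import torch
from datasets import Dataset
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.losses import ListNetLoss

# One row per query: five candidate document types plus a relevance label for each.
train_dataset = Dataset.from_dict({
    "query": ["What did Fifth Third Bancorp’s leadership say about Fifth Third Bancorp’s dividend policy?"],
    "docs": [[
        "[DOC=10-K | annual report | comprehensive business overview, risks, financials | 100-300 pages]",
        "[DOC=10-Q | quarterly report | interim financials, MD&A updates | 30-60 pages]",
        "[DOC=DEF-14A | proxy statement | governance, compensation, shareholder voting matters | annual filing]",
        "[DOC=8-K | current report | material events, timely disclosures | ad-hoc filing]",
        "[DOC=Earnings | earnings call transcript | forward guidance, Q&A, management commentary | quarterly]",
    ]],
    "labels": [[4, 3, 2, 1, 0]],
})

model = CrossEncoder("sentence-transformers/all-mpnet-base-v2", num_labels=1)
# Mirrors the parameter dump above: identity activation, no loss-internal mini-batching.
loss = ListNetLoss(model=model, activation_fn=torch.nn.Identity(), mini_batch_size=None)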
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • gradient_accumulation_steps: 4
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • data_seed: 42
  • fp16: True
  • dataloader_num_workers: 2
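
A minimal, unverified sketch of how these non-default values could be wired into a training run with the CrossEncoderTrainer API (output_dir is a hypothetical path; model, train_dataset, and loss carry over from the dataset sketch above):

from sentence_transformers.cross_encoder import CrossEncoderTrainer, CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="findocranker-mpnet-base-v2",  # hypothetical output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    data_seed=42,
    fp16=True,
    dataloader_num_workers=2,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()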

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: 42
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 2
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.1014 25 1.6085
0.2028 50 1.5942
0.3043 75 1.4848
0.4057 100 1.405
0.5071 125 1.4059
0.6085 150 1.3635
0.7099 175 1.3535
0.8114 200 1.3472
0.9128 225 1.3368
1.0122 250 1.3291
1.1136 275 1.2947
1.2150 300 1.3202
1.3164 325 1.3245
1.4178 350 1.321
1.5193 375 1.298
1.6207 400 1.307
1.7221 425 1.325
1.8235 450 1.3332
1.9249 475 1.301
2.0243 500 1.3106
2.1258 525 1.2973
2.2272 550 1.2995
2.3286 575 1.2978
2.4300 600 1.3109
2.5314 625 1.298
2.6329 650 1.307
2.7343 675 1.2969
2.8357 700 1.2762
2.9371 725 1.2917
3.0365 750 1.2545
3.1379 775 1.271
3.2394 800 1.2609
3.3408 825 1.2694
3.4422 850 1.2906
3.5436 875 1.2951
3.6450 900 1.2852
3.7465 925 1.2788
3.8479 950 1.283
3.9493 975 1.2727
4.0487 1000 1.263
4.1501 1025 1.2662
4.2515 1050 1.2628
4.3529 1075 1.2511
4.4544 1100 1.2788
4.5558 1125 1.2671
4.6572 1150 1.2648
4.7586 1175 1.2694
4.8600 1200 1.2648
4.9615 1225 1.2678

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0
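
To approximate this environment, the versions above can be pinned at install time (an illustrative command, not a tested combination; the CUDA-specific PyTorch build would need the appropriate index):

pip install torch==2.8.0 sentence-transformers==5.1.0 transformers==4.56.1 accelerate==1.10.1 datasets==4.0.0 tokenizers==0.22.0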

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}