---
library_name: transformers
license: mit
datasets:
- AnkitSatpute/zbMath_allft
language:
- en
base_model:
- meta-llama/Llama-3.1-8B
pipeline_tag: text-classification
---

# Model Card for Llama-3.1-ReRank-AllFTbtAbstr

The model is trained to generate document embeddings for mathematical research papers; these embeddings are then used to retrieve and rank similar documents.

## Model Details

A Llama 3.1 8B model fine-tuned with contrastive learning for sequence classification.
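
Since the checkpoint is tagged `text-classification`, it can presumably also be loaded with its sequence-classification head to score a query-document pair directly for reranking. The snippet below is a minimal sketch under that assumption; the pair format and the meaning of the logits are not documented here, and the query/document strings are placeholders.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = 'AnkitSatpute/Llama-3.1-ReRank-AllFTbtAbstr'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Placeholder texts; how (query, document) pairs were formatted during training is not documented.
query = "On the spectral theory of self-adjoint operators"
document = "We study the spectra of self-adjoint operators on Hilbert spaces."

inputs = tokenizer(query + "\n" + document, return_tensors='pt', truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # relevance score(s) for the pair
print(logits)
```
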
## How to use for generating embeddings

```python
from transformers import AutoTokenizer, AutoModel
import torch

sentences = ["Hello Who are you?", "I am fine thank you"]

tokenizer = AutoTokenizer.from_pretrained('AnkitSatpute/Llama-3.1-ReRank-AllFTbtAbstr')
model = AutoModel.from_pretrained('AnkitSatpute/Llama-3.1-ReRank-AllFTbtAbstr')
model.eval()

# Llama tokenizers ship without a padding token; reuse the EOS token so batching works.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Tokenize the sentences into a padded batch of tensors.
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    model_output = model(**encoded_input)

# Use the hidden state of the first token as the sentence embedding.
sentence_embeddings = model_output[0][:, 0]

# L2-normalize so that dot products correspond to cosine similarity.
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
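
The embeddings above can then be compared to rank documents against a query, as described at the top of this card. The following is a minimal sketch of that step; the query and candidate abstracts are placeholders, and cosine similarity (the dot product of the L2-normalized embeddings) is assumed as the ranking score.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = 'AnkitSatpute/Llama-3.1-ReRank-AllFTbtAbstr'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers have no pad token by default

def embed(texts):
    """Return L2-normalized first-token embeddings for a list of texts."""
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        output = model(**encoded)
    return torch.nn.functional.normalize(output[0][:, 0], p=2, dim=1)

# Placeholder query and candidate abstracts; in practice these would be zbMATH paper abstracts.
query = "On the spectral theory of self-adjoint operators"
documents = [
    "We study the spectra of self-adjoint operators on Hilbert spaces.",
    "A combinatorial proof of the four color theorem.",
]

query_emb = embed([query])
doc_emb = embed(documents)

# Cosine similarity of normalized vectors; a higher score means a more similar document.
scores = (query_emb @ doc_emb.T).squeeze(0)
for idx in torch.argsort(scores, descending=True):
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```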