alvarobartt (HF Staff) committed
Commit 16a0e5a · verified · 1 Parent(s): a0ecc80

Add `text-embeddings-inference` tag and snippets

Files changed (1): README.md (+36 −1)

README.md CHANGED
@@ -11,9 +11,10 @@ tags:
 - transformers
 - embeddings
 - mteb
+- text-embeddings-inference
 ---
 
-# granite-embedding-reranker-english-r2
+# granite-embedding-reranker-english-r2
 <!-- Provide a quick summary of what the model is/does. -->
 
 **Model Summary:** _granite-embedding-reranker-english-r2_ is a 149M parameter dense cross-encoder model from the Granite Embeddings collection that can be used to generate high-quality text embeddings. This model produces embedding vectors of size 768 based on a context length of up to 8192 tokens. Unlike most other open-source models, this model was trained only on open-source relevance-pair datasets with permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets.
@@ -191,6 +192,40 @@ for doc, score in reranker_ranked:
     print(f"{score:.4f} | {doc}")
 ```
 
+**Usage with Hugging Face Text Embeddings Inference (TEI):**
+
+This is a simple example of how to deploy the reranking model with [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference), a blazing-fast inference solution for text embedding models, via Docker.
+
+- On CPU:
+```bash
+docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id ibm-granite/granite-embedding-reranker-english-r2
+```
+
+- On NVIDIA GPU:
+```bash
+docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cuda-latest --model-id ibm-granite/granite-embedding-reranker-english-r2
+```
+
+Then you can send requests to the deployed API via the `/rerank` route (see the [Text Embeddings Inference OpenAPI Specification](https://huggingface.github.io/text-embeddings-inference/) for more details):
+
+```bash
+curl http://0.0.0.0:8080/rerank \
+    -H "Content-Type: application/json" \
+    -d '{
+        "query": "what is the chemical formula of water?",
+        "texts": [
+            "Water is an inorganic compound with the chemical formula H2O.",
+            "In liquid form, H2O is also called '\''water'\'' at standard temperature and pressure.",
+            "The weather is nice today",
+            "Quick sort is a divide and conquer algorithm that sorts by partitioning."
+        ],
+        "raw_scores": false,
+        "return_text": false,
+        "truncate": true,
+        "truncation_direction": "Right"
+    }'
+```
+
 ## Evaluation Results
 
 The performance of the Granite Embedding English reranking model on the BEIR, MLDR, and MIRACL benchmarks is reported below. All models are evaluated on the top-20 documents retrieved by the granite-embedding-english-small-r2 or granite-embedding-english-r2 retrievers, respectively.
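The `/rerank` request added in this commit can also be driven from Python. A minimal sketch of the client-side handling, assuming a TEI container from the snippets above is reachable at `http://localhost:8080`; the response scores below are illustrative placeholders, not real model output:

```python
# Build the same payload as the curl example in the added snippet.
payload = {
    "query": "what is the chemical formula of water?",
    "texts": [
        "Water is an inorganic compound with the chemical formula H2O.",
        "In liquid form, H2O is also called 'water' at standard temperature and pressure.",
        "The weather is nice today",
        "Quick sort is a divide and conquer algorithm that sorts by partitioning.",
    ],
    "raw_scores": False,
    "truncate": True,
}

# With the server running you would send it, e.g.:
#   import requests
#   response = requests.post("http://localhost:8080/rerank", json=payload).json()
# TEI's /rerank route returns one {"index": ..., "score": ...} entry per
# input text. The entries below are hypothetical scores for illustration.
response = [
    {"index": 0, "score": 0.98},
    {"index": 2, "score": 0.01},
    {"index": 1, "score": 0.85},
    {"index": 3, "score": 0.02},
]

# Re-attach each score to its original text and sort best-first.
ranked = sorted(
    ((item["score"], payload["texts"][item["index"]]) for item in response),
    reverse=True,
)
for score, text in ranked:
    print(f"{score:.4f} | {text}")
```

Because TEI returns indices into the submitted `texts` list rather than the texts themselves (unless `return_text` is set), the client is responsible for joining scores back to documents, as above.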