---
license: apache-2.0
base_model: microsoft/MiniLM-L6-v2
tags:
- transformers
- sentence-transformers
- sentence-similarity
- text-embeddings-inference
- information-retrieval
- knowledge-distillation
- transformers.js
language:
- en
---
<div style="display: flex; justify-content: center;">
<div style="display: flex; align-items: center; gap: 10px;">
<img src="logo.webp" alt="MongoDB Logo" style="height: 36px; width: auto; border-radius: 4px;">
<span style="font-size: 32px; font-weight: bold">MongoDB/mdbr-leaf-ir</span>
</div>
</div>
# Contents
1. [Introduction](#introduction)
2. [Technical Report](#technical-report)
3. [Highlights](#highlights)
4. [Benchmarks](#benchmark-comparison)
5. [Quickstart](#quickstart)
6. [Citation](#citation)
# Introduction
`mdbr-leaf-ir` is a compact, high-performance text embedding model specifically designed for **information retrieval (IR)** tasks, e.g., the retrieval stage of Retrieval-Augmented Generation (RAG) pipelines.
To enable even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl-truncation).
If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, or summarization, please check out our [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model.
> [!Note]
> **Note**: this model has been developed by the ML team of MongoDB Research. At the time of writing it is not used in any of MongoDB's commercial products or service offerings.
# Technical Report
A technical report detailing our proposed `LEAF` training procedure is [available here](https://arxiv.org/abs/2509.12539).
# Highlights
* **State-of-the-Art Performance**: `mdbr-leaf-ir` achieves state-of-the-art results for compact embedding models, **ranking #1** on the public [BEIR benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models with ≤100M parameters.
* **Flexible Architecture Support**: `mdbr-leaf-ir` supports asymmetric retrieval architectures, enabling even better retrieval performance. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and can be stored using more efficient types like `int8` and `binary`. [See below](#mrl-truncation) for more information.
## Benchmark Comparison
The table below shows the average BEIR benchmark scores (nDCG@10) for `mdbr-leaf-ir` compared to other retrieval models.
`mdbr-leaf-ir` ranks #1 on the BEIR public leaderboard, and when run in asymmetric "**(asym.)**" mode as described [here](#asymmetric-retrieval-setup), the results improve even further.
| Model | Size | BEIR Avg. (nDCG@10) |
|------------------------------------|---------|----------------------|
| OpenAI text-embedding-3-large | Unknown | 55.43 |
| **mdbr-leaf-ir (asym.)** | 23M | **54.03** |
| **mdbr-leaf-ir** | 23M | **53.55** |
| snowflake-arctic-embed-s | 32M | 51.98 |
| bge-small-en-v1.5 | 33M | 51.65 |
| OpenAI text-embedding-3-small | Unknown | 51.08 |
| granite-embedding-small-english-r2 | 47M | 50.87 |
| snowflake-arctic-embed-xs | 23M | 50.15 |
| e5-small-v2 | 33M | 49.04 |
| SPLADE++ | 110M | 48.88 |
| MiniLM-L6-v2 | 23M | 41.95 |
| BM25 | – | 41.14 |
# Quickstart
## Sentence Transformers
```python
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
# Example queries and documents
queries = [
"What is machine learning?",
"How does neural network training work?"
]
documents = [
"Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
"Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."
]
# Encode queries and documents
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
# Compute similarity scores
scores = model.similarity(query_embeddings, document_embeddings)
# Print results
for i, query in enumerate(queries):
print(f"Query: {query}")
for j, doc in enumerate(documents):
print(f" Similarity: {scores[i, j]:.4f} | Document {j}: {doc[:80]}...")
```
<details>
<summary>See example output</summary>
```
Query: What is machine learning?
Similarity: 0.6857 | Document 0: Machine learning is a subset of ...
Similarity: 0.4598 | Document 1: Neural networks are trained ...
Query: How does neural network training work?
Similarity: 0.4238 | Document 0: Machine learning is a subset of ...
Similarity: 0.5723 | Document 1: Neural networks are trained ...
```
</details>
## Transformers.js
If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```
You can then use the model to compute embeddings like this:
```js
import { AutoModel, AutoTokenizer, matmul } from "@huggingface/transformers";
// Download from the 🤗 Hub
const model_id = "MongoDB/mdbr-leaf-ir";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id, {
dtype: "fp32", // Options: "fp32" | "fp16" | "q8" | "q4" | "q4f16"
});
// Prepare queries and documents
const queries = [
"What is machine learning?",
"How does neural network training work?",
];
const documents = [
"Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
"Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.",
];
const inputs = await tokenizer([
...queries.map((x) => "Represent this sentence for searching relevant passages: " + x),
...documents,
], { padding: true });
// Generate embeddings
const { sentence_embedding } = await model(inputs);
// Compute similarities
const scores = await matmul(
sentence_embedding.slice([0, queries.length]),
sentence_embedding.slice([queries.length, null]).transpose(1, 0),
);
const scores_list = scores.tolist();
for (let i = 0; i < queries.length; ++i) {
console.log(`Query: ${queries[i]}`);
for (let j = 0; j < documents.length; ++j) {
console.log(` Similarity: ${scores_list[i][j].toFixed(4)} | Document ${j}: ${documents[j]}`);
}
console.log();
}
```
<details>
<summary>See example output</summary>
```
Query: What is machine learning?
Similarity: 0.6857 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
Similarity: 0.4598 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.
Query: How does neural network training work?
Similarity: 0.4238 | Document 0: Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.
Similarity: 0.5723 | Document 1: Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.
```
</details>
## Transformers Usage
See full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).
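For use with the `transformers` library directly, a minimal sketch is shown below. It assumes CLS-token pooling followed by L2 normalization (check the repository's `1_Pooling/config.json` for the authoritative pooling configuration) and reuses the query prompt from the Transformers.js example above; the linked notebook remains the reference implementation:
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "MongoDB/mdbr-leaf-ir"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Query prompt, as used in the Transformers.js example above
prefix = "Represent this sentence for searching relevant passages: "
queries = ["What is machine learning?"]
documents = ["Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data."]

inputs = tokenizer(
    [prefix + q for q in queries] + documents,
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# Assumption: CLS-token pooling + L2 normalization
embeddings = F.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)
scores = embeddings[: len(queries)] @ embeddings[len(queries):].T
print(scores)
```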
## Asymmetric Retrieval Setup
> [!Note]
> **Note**: a version of this asymmetric setup, conveniently packaged into a single model, is [available here](https://huggingface.co/MongoDB/mdbr-leaf-ir-asym).
`mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it has been distilled from. This enables flexible architectures in which, for example, documents are encoded using the larger model, while queries can be encoded faster and more efficiently with the compact `leaf` model:
```python
from sentence_transformers import SentenceTransformer

# Queries and documents are as defined in the Quickstart example above

# Use mdbr-leaf-ir for query encoding (real-time, low latency)
query_model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
query_embeddings = query_model.encode(queries, prompt_name="query")
# Use a larger model for document encoding (one-time, at index time)
doc_model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
document_embeddings = doc_model.encode(documents)
# Compute similarities
scores = query_model.similarity(query_embeddings, document_embeddings)
```
Retrieval results in asymmetric mode are often superior to those obtained in the [standard mode above](#sentence-transformers).
## MRL Truncation
Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
```python
query_embeds = model.encode(queries, prompt_name="query", truncate_dim=256)
doc_embeds = model.encode(documents, truncate_dim=256)
similarities = model.similarity(query_embeds, doc_embeds)
print('After MRL:')
print(f"* Embeddings dimension: {query_embeds.shape[1]}")
print(f"* Similarities: \n\t{similarities}")
```
<details>
<summary>See example output</summary>
```
After MRL:
* Embeddings dimension: 256
* Similarities:
tensor([[0.7136, 0.4989],
[0.4567, 0.6022]])
```
</details>
## Vector Quantization
Vector quantization, for example to `int8` or `binary`, can be performed as follows:
**Note**: for vector quantization to types other than `binary`, we suggest performing a calibration to determine the optimal ranges; [see here](https://sbert.net/examples/sentence_transformer/applications/embedding-quantization/README.html#scalar-int8-quantization).
Good initial values, according to the [teacher model's documentation](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5#compressing-to-128-bytes), are:
* `int8`: -0.3 and +0.3
* `int4`: -0.18 and +0.18
```python
from sentence_transformers.quantization import quantize_embeddings
import torch
query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)
# Quantize embeddings to int8 using -0.3 and +0.3 as calibration ranges
ranges = torch.tensor([[-0.3], [+0.3]]).expand(2, query_embeds.shape[1]).cpu().numpy()
query_embeds = quantize_embeddings(query_embeds, "int8", ranges=ranges)
doc_embeds = quantize_embeddings(doc_embeds, "int8", ranges=ranges)
# Calculate similarities; cast to int64 to avoid under/overflow
similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T
print('After quantization:')
print(f"* Embeddings type: {query_embeds.dtype}")
print(f"* Similarities: \n{similarities}")
```
<details>
<summary>See example output</summary>
```
After quantization:
* Embeddings type: int8
* Similarities:
[[118022 79111]
[ 72961 98333]]
```
</details>
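The same `quantize_embeddings` helper also supports `binary`/`ubinary` precision. As a minimal, illustrative sketch (not part of the evaluated setup above), embeddings can be packed to one bit per dimension and ranked by Hamming distance:
```python
import numpy as np
from sentence_transformers.quantization import quantize_embeddings

query_embeds = model.encode(queries, prompt_name="query")
doc_embeds = model.encode(documents)

# Pack each dimension into one bit (uint8 array with dim/8 columns)
binary_queries = quantize_embeddings(query_embeds, "ubinary")
binary_docs = quantize_embeddings(doc_embeds, "ubinary")

# Rank by Hamming distance: fewer differing bits = more similar
diff_bits = np.bitwise_xor(binary_queries[:, None, :], binary_docs[None, :, :])
hamming = np.unpackbits(diff_bits, axis=-1).sum(axis=-1)

print(f"* Embeddings shape: {binary_docs.shape} ({binary_docs.dtype})")
print(f"* Hamming distances:\n{hamming}")
```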
## Evaluation
Evaluation code is available in [this notebook](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/evaluate_models.ipynb).
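For a quick reproduction on a single BEIR task, a minimal sketch using the [`mteb`](https://github.com/embeddings-benchmark/mteb) package could look as follows (the linked notebook remains the authoritative reference):
```python
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MongoDB/mdbr-leaf-ir")

# SciFact is one of the BEIR retrieval tasks
tasks = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/mdbr-leaf-ir")
```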
# Citation
If you use this model in your work, please cite:
```bibtex
@misc{mdbr_leaf,
title={LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations},
author={Robin Vujanic and Thomas Rueckstiess},
year={2025},
eprint={2509.12539},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2509.12539},
}
```
# License
This model is released under the Apache 2.0 license.
# Contact
For questions or issues, please open an issue or pull request. You can also contact the MongoDB ML research team at [email protected]. |