Add prompts to config (#3)
- add prompts (14917896b1fe196ef57efab5b1f0215673d5041f)
- Update config_sentence_transformers.json (79a7c744cfd0160a3988b824f0e409f23b053560)
- replace "retrieval" with "search" (dd0056fb02b3e4d3a506d12adc1ba811e01c3ae9)
Co-authored-by: Solomatin Roman <[email protected]>
- README.md +21 -1
- config_sentence_transformers.json +10 -0
README.md
CHANGED
@@ -1651,6 +1651,26 @@ print(sim_scores.diag().tolist())
 # [0.47968706488609314, 0.940900444984436, 0.7761018872261047]
 ```
 
+or using prompts (sentence-transformers>=2.4.0):
+
+```python
+from sentence_transformers import SentenceTransformer
+
+
+# loads model with CLS pooling
+model = SentenceTransformer("ai-forever/ru-en-RoSBERTa")
+
+classification = model.encode(["Он нам и <unk> не нужон ваш Интернет!", "What a time to be alive!"], prompt_name="classification")
+print(classification[0] @ classification[1].T) # 0.47968706488609314
+
+clustering = model.encode(["В Ярославской области разрешили работу бань, но без посетителей", "Ярославским баням разрешили работать без посетителей"], prompt_name="clustering")
+print(clustering[0] @ clustering[1].T) # 0.940900444984436
+
+query_embedding = model.encode("Сколько программистов нужно, чтобы вкрутить лампочку?", prompt_name="search_query")
+document_embedding = model.encode("Чтобы вкрутить лампочку, требуется три программиста: один напишет программу извлечения лампочки, другой — вкручивания лампочки, а третий проведет тестирование.", prompt_name="search_document")
+print(query_embedding @ document_embedding.T) # 0.7761018872261047
+```
+
 ## Citation
 
 ```
@@ -1667,4 +1687,4 @@ print(sim_scores.diag().tolist())
 
 ## Limitations
 
-The model is designed to process texts in Russian, the quality in English is unknown. Maximum input text length is limited to 512 tokens.
+The model is designed to process texts in Russian, the quality in English is unknown. Maximum input text length is limited to 512 tokens.
config_sentence_transformers.json
ADDED
@@ -0,0 +1,10 @@
+{
+  "prompts": {
+    "classification": "classification: ",
+    "search_query": "search_query: ",
+    "search_document": "search_document: ",
+    "clustering": "clustering: "
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": null
+}
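Reviewer note: the `prompts` mapping added here is what `prompt_name` in `model.encode(...)` resolves against. As far as I understand sentence-transformers, the selected prompt string is simply prepended to each input text before tokenization. The sketch below reproduces only that lookup-and-prefix step in plain Python, so it runs without downloading the model. `apply_prompt` is a hypothetical helper written for illustration, not a library function, and the JSON is copied from this commit.

```python
import json

# Contents of config_sentence_transformers.json as added in this commit.
CONFIG_JSON = """
{
  "prompts": {
    "classification": "classification: ",
    "search_query": "search_query: ",
    "search_document": "search_document: ",
    "clustering": "clustering: "
  },
  "default_prompt_name": null,
  "similarity_fn_name": null
}
"""


def apply_prompt(text, prompt_name, config):
    """Hypothetical helper: prepend the configured prompt to the text.

    Falls back to default_prompt_name when prompt_name is None, and
    returns the text unchanged when neither is set.
    """
    name = prompt_name if prompt_name is not None else config.get("default_prompt_name")
    if name is None:
        return text
    prompts = config["prompts"]
    if name not in prompts:
        raise ValueError(f"Unknown prompt name: {name!r}; expected one of {sorted(prompts)}")
    return prompts[name] + text


config = json.loads(CONFIG_JSON)
print(apply_prompt("What a time to be alive!", "classification", config))
# classification: What a time to be alive!
```

Because `default_prompt_name` is `null`, calling `encode` without `prompt_name` should leave the text unprefixed, so callers need to pass the task name explicitly to get the behavior the README example shows.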