Update README.md

README.md CHANGED

@@ -5,7 +5,7 @@ license: mit
These models are created from their respective IndicTrans2 parent versions by simply replacing the Sinusoidal Positional Embedding with Rotary Positional Embedding ([Su _et al._](https://arxiv.org/abs/2104.09864)) and fine-tuning them for further alignment.

*NOTE*:
-These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://
+These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://aclanthology.org/2025.naacl-long.366/).

Detailed information on the data mixture, hyperparameters, and training curriculum can be found in the paper.

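For readers unfamiliar with rotary embeddings, a minimal sketch of the operation described by Su _et al._ (rotate-half form) is shown below. The helper names are illustrative only, and the exact convention (interleaving, base frequency, where the rotation is applied inside attention) may differ in the actual IndicTrans2 RoPE variants.

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # One frequency per channel pair: theta_i = base^(-2i / head_dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)        # (seq_len, head_dim // 2)
    return torch.cat((angles, angles), dim=-1)       # (seq_len, head_dim)

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # x: (..., seq_len, head_dim); each position is rotated by its own angle,
    # so relative offsets are encoded directly in the query-key dot product.
    return x * angles.cos() + rotate_half(x) * angles.sin()

# Example: rotate queries and keys for one attention head before computing attention.
q = torch.randn(2, 128, 64)                          # (batch, seq_len, head_dim)
k = torch.randn(2, 128, 64)
angles = rope_angles(head_dim=64, seq_len=128)
q, k = apply_rope(q, angles), apply_rope(k, angles)
```

Unlike sinusoidal absolute embeddings added to the input, the rotation acts on queries and keys inside attention, which is what makes extrapolation to longer contexts more natural.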
@@ -17,7 +17,7 @@ The usage instructions are very similar to [IndicTrans2 HuggingFace models](http
```python
import torch
import warnings
-from IndicTransToolkit import IndicProcessor
+from IndicTransToolkit.processor import IndicProcessor
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

warnings.filterwarnings("ignore")
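The hunk above only shows the changed import; the rest of the README's example falls outside the diff context. For orientation, a minimal usage sketch following the usual IndicTrans2 HuggingFace pattern is given below. The checkpoint placeholder, language codes, and generation settings are illustrative assumptions, not the exact values from the README example.

```python
import torch
from IndicTransToolkit.processor import IndicProcessor
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "<rotary-indictrans2-checkpoint>"  # placeholder: substitute the repo id of this model card
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)

ip = IndicProcessor(inference=True)
sentences = ["This is a test sentence."]

# Tag and normalize the batch with FLORES-style language codes.
batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="hin_Deva")
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, num_beams=5, max_length=256)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
outputs = ip.postprocess_batch(decoded, lang="hin_Deva")
print(" | > Translations:", outputs[0])
```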
@@ -67,19 +67,27 @@ print(" | > Translations:", outputs[0])
If you use these models directly or fine-tune them further for additional use cases, please cite the following work:

```bibtex
-@
-
-
-
-
-
-
-
+@inproceedings{gumma-etal-2025-towards,
+    title = "Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models",
+    author = "Gumma, Varun and
+      Chitale, Pranjal A and
+      Bali, Kalika",
+    editor = "Chiruzzo, Luis and
+      Ritter, Alan and
+      Wang, Lu",
+    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
+    month = apr,
+    year = "2025",
+    address = "Albuquerque, New Mexico",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.naacl-long.366/",
+    pages = "7158--7170",
+    ISBN = "979-8-89176-189-6"
}
```

# Note
-These new and improved models are primarily built and tested for document-level and long-context translations; performance on smaller sentence-level tasks might be sub-optimal and may require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.
+These new and improved models are primarily built and tested for document-level and long-context translations; performance on smaller sentence-level tasks might be slightly sub-optimal and may require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.

# Warning
Occasionally, you may notice some variation in the output, which may not be optimal. In such cases, you can experiment with adjusting the `num_beams`, `repetition_penalty`, and `length_penalty` parameters in the `generation_config`. Based on standard testing, the example with an input size of 1457 can be run on a single A100 GPU. However, the 1B model might require more compute resources or a lower beam size for generation.
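As a concrete illustration of the tuning suggested above, the overrides can be collected in a `GenerationConfig` and passed to `generate`. The values below are arbitrary starting points to experiment from, not settings recommended by the model card.

```python
from transformers import GenerationConfig

# Illustrative starting points only; tune per use case. A smaller num_beams also
# lowers memory pressure, which matters for the 1B model on long inputs.
gen_config = GenerationConfig(
    num_beams=4,
    length_penalty=1.0,
    repetition_penalty=1.2,
)

# Reusing `model` and `inputs` from the usage sketch above:
# generated = model.generate(**inputs, generation_config=gen_config, max_length=2048)
```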