Update README.md
README.md
CHANGED
|
@@ -13,7 +13,7 @@ pipeline_tag: translation
|
|
| 13 |
|
| 14 |
# Marco-MT-Algharb
|
| 15 |
|
| 16 |
-
This repository contains the system
|
| 17 |
|
| 18 |
## Introduction
|
| 19 |
|
|
@@ -37,7 +37,7 @@ The core of the process involves formatting the input text into a specific promp
|
|
| 37 |
The prompt template is:
|
| 38 |
|
| 39 |
```python
|
| 40 |
-
|
| 41 |
```
|
| 42 |
|
| 43 |
Here is a complete Python example:
|
|
@@ -81,7 +81,7 @@ prompts_to_generate = [prompt]
|
|
| 81 |
print("Formatted Prompt:\n", prompt)
|
| 82 |
|
| 83 |
sampling_params = SamplingParams(
|
| 84 |
-
n=
|
| 85 |
temperature=1.0,
|
| 86 |
top_p=1.0,
|
| 87 |
max_tokens=512
|
|
@@ -100,4 +100,10 @@ for output in outputs:
|
|
| 100 |
for i, candidate in enumerate(output.outputs):
|
| 101 |
generated_text = candidate.text.strip()
|
| 102 |
print(f"Candidate {i+1}: {generated_text}")
|
| 103 |
-
```
| 13 |
|
| 14 |
# Marco-MT-Algharb
|
| 15 |
|
| 16 |
+
This repository contains the system for Algharb, the submission from the Marco Translation Team of Alibaba International Digital Commerce (AIDC) to the WMT 2025 General Machine Translation Shared Task.
|
| 17 |
|
| 18 |
## Introduction
|
| 19 |
|
|
|
|
| 37 |
The prompt template is:
|
| 38 |
|
| 39 |
```python
|
| 40 |
+
"Human: Please translate the following text into {target_language}: \n{source_text}<|im_end|>\nAssistant:"
|
| 41 |
```
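As a minimal sketch of how the template above can be filled in, the snippet below wraps it in a small helper. The names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical and are not part of the repository; only the template string itself is taken from the README.

```python
# Hypothetical helper for illustration; only the template text comes from the README.
PROMPT_TEMPLATE = (
    "Human: Please translate the following text into {target_language}: \n"
    "{source_text}<|im_end|>\nAssistant:"
)


def build_prompt(source_text: str, target_language: str) -> str:
    """Fill the template with one source segment and a target language name."""
    return PROMPT_TEMPLATE.format(
        target_language=target_language, source_text=source_text
    )


# Example values are placeholders chosen for this sketch.
prompt = build_prompt("Hello, world!", "Chinese")
print(prompt)
```

Keeping the template in one constant makes it harder for the spacing and the `<|im_end|>` marker to drift between call sites.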
|
| 42 |
|
| 43 |
Here is a complete Python example:
|
|
|
|
| 81 |
print("Formatted Prompt:\n", prompt)
|
| 82 |
|
| 83 |
sampling_params = SamplingParams(
|
| 84 |
+
n=100,
|
| 85 |
temperature=1.0,
|
| 86 |
top_p=1.0,
|
| 87 |
max_tokens=512
|
|
|
|
| 100 |
for i, candidate in enumerate(output.outputs):
|
| 101 |
generated_text = candidate.text.strip()
|
| 102 |
print(f"Candidate {i+1}: {generated_text}")
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
### 3. Apply MBR decoding
|
| 106 |
+
```bash
|
| 107 |
+
comet-mbr -s src.txt -t mbr_sample_100.txt -o mbr_trans.txt --num_samples 100 --gpus 1 --qe_model Unbabel/wmt22-cometkiwi-da
|
| 108 |
+
```
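The command above reads a source file and a sample file. The sketch below shows one way to produce both from the sampled candidates, assuming `comet-mbr` expects one source per line and the `--num_samples` candidates for each source on consecutive lines of the sample file. The helper name `write_mbr_inputs` and the file names are hypothetical, and the helper collapses internal whitespace (including newlines) so that each candidate stays on a single line.

```python
import tempfile
from pathlib import Path


def write_mbr_inputs(sources, candidates_per_source, src_path, samples_path):
    """Write one source per line and its candidates on consecutive lines.

    Hypothetical helper: the grouped-lines layout is an assumption about
    what comet-mbr reads, not something stated in the README.
    """
    if len(sources) != len(candidates_per_source):
        raise ValueError("need exactly one candidate list per source")
    num_samples = len(candidates_per_source[0])
    src_lines, sample_lines = [], []
    for source, candidates in zip(sources, candidates_per_source):
        if len(candidates) != num_samples:
            raise ValueError("every source needs the same number of candidates")
        # Collapse whitespace so a multi-line segment occupies exactly one line.
        src_lines.append(" ".join(source.split()))
        sample_lines.extend(" ".join(c.split()) for c in candidates)
    Path(src_path).write_text("\n".join(src_lines) + "\n", encoding="utf-8")
    Path(samples_path).write_text("\n".join(sample_lines) + "\n", encoding="utf-8")
    return num_samples


# Tiny demonstration with two candidates instead of 100.
with tempfile.TemporaryDirectory() as tmp:
    samples_file = Path(tmp) / "mbr_sample_2.txt"
    n = write_mbr_inputs(
        ["Hello.\nWorld."],
        [["你好。世界。", "您好。\n世界。"]],
        Path(tmp) / "src.txt",
        samples_file,
    )
    written = samples_file.read_text(encoding="utf-8").splitlines()
print(n, written)
```

The returned count is the value to pass as `--num_samples`.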
|
| 109 |
+
Note: Word alignment for MBR reranking will be available soon.
|
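For readers new to MBR decoding, here is a conceptual sketch of the selection step: each candidate is scored by its average utility against the other candidates, and the highest-scoring one is kept. This is not how `comet-mbr` is implemented. The utility here is a toy unigram F1 standing in for a learned metric such as COMET, and the names `overlap_f1` and `mbr_select` are hypothetical.

```python
from collections import Counter


def overlap_f1(hypothesis: str, reference: str) -> float:
    """Toy utility: unigram F1 between two strings (a stand-in for COMET)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    common = sum((Counter(hyp) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(hyp), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates, utility=overlap_f1):
    """Return the candidate with the highest mean utility against the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]
    best, best_score = candidates[0], float("-inf")
    for i, hyp in enumerate(candidates):
        # Treat every other sample as a pseudo-reference for this hypothesis.
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in others) / len(others)
        if score > best_score:
            best, best_score = hyp, score
    return best


samples = ["the cat sat on the mat", "the cat sat on a mat", "a dog ran away"]
best = mbr_select(samples)
print(best)
```

The pairwise loop costs on the order of n² utility calls per source, which is why a batched, GPU-backed tool is the practical choice at 100 samples.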