AIDC-AI/Marco-MT-Algharb


hwang233 committed on Sep 23

Commit 5fec798 (verified) · 1 Parent(s): a4e599b

Update README.md

Files changed (1)

README.md +5 -12

@@ -17,7 +17,7 @@ This repository contains the system description paper for Algharb, the submissio
 ## Introduction
-The Algharb system is a large translation model built based on the Qwen3-14B foundation. It is designed for high-quality translation across 13 diverse language directions and demonstrates state-of-the-art performance. Our approach is centered on a multi-stage refinement pipeline that systematically enhances translation fluency and faithfulness. In the WMT 2025 evaluation, Algharb significantly outperformed strong proprietary models like GPT-4o and Claude 3.7 Sonnet, achieving the top score in every submitted language pair.
 ## Usage
@@ -45,7 +45,6 @@ Here is a complete Python example:
 from vllm import LLM, SamplingParams
 # --- 1. Load Model and Tokenizer ---
-# Replace with the actual path to your fine-tuned Algharb model
 model_path = "path/to/your/algharb_model"
 llm = LLM(model=model_path)
@@ -59,7 +58,7 @@ lang_name_map = {
     "zh_CN": "chinese",
     "ko_KR": "korean",
     "ja_JP": "japanese",
-    "ar_EG": "arabic", # Note: paper uses 'arz', this might need adjustment
     "cs_CZ": "czech",
     "ru_RU": "russian",
     "uk_UA": "ukraine",
@@ -74,21 +73,18 @@ target_language_name = lang_name_map.get(target_lang_code, "the target language"
 # --- 3. Construct the Prompt ---
 prompt = (
     f"Human: Please translate the following text into {target_language_name}: \n"
-    f"{source_text}&lt;|im_end|&gt;\n"
     f"Assistant:"
 )
 prompts_to_generate = [prompt]
 print("Formatted Prompt:\n", prompt)
-# --- 4. Configure Sampling Parameters for MBR ---
-# We generate n candidates for our hybrid MBR decoding.
-# The script uses temperature=1 for diverse sampling.
 sampling_params = SamplingParams(
-    n=10,  # Number of candidate translations to generate
     temperature=1.0,
     top_p=1.0,
-    max_tokens=512  # Adjust as needed
 )
 # --- 5. Generate Translations ---
@@ -104,7 +100,4 @@ for output in outputs:
     for i, candidate in enumerate(output.outputs):
         generated_text = candidate.text.strip()
         print(f"Candidate {i+1}: {generated_text}")
-# The generated candidates can now be passed to the
-# hybrid MBR re-ranking process described in the paper.
 ```

 ## Introduction
+The Algharb system is a large translation model built based on the Qwen3-14B foundation. It is designed for high-quality translation across 13 diverse language directions and demonstrates state-of-the-art performance. Our approach is centered on a multi-stage refinement pipeline that systematically enhances translation fluency and faithfulness.
 ## Usage
 from vllm import LLM, SamplingParams
 # --- 1. Load Model and Tokenizer ---
 model_path = "path/to/your/algharb_model"
 llm = LLM(model=model_path)
     "zh_CN": "chinese",
     "ko_KR": "korean",
     "ja_JP": "japanese",
+    "ar_EG": "arabic",
     "cs_CZ": "czech",
     "ru_RU": "russian",
     "uk_UA": "ukraine",
 # --- 3. Construct the Prompt ---
 prompt = (
     f"Human: Please translate the following text into {target_language_name}: \n"
+    f"{source_text}<|im_end|>\n"
     f"Assistant:"
 )
 prompts_to_generate = [prompt]
 print("Formatted Prompt:\n", prompt)
 sampling_params = SamplingParams(
+    n=1,
     temperature=1.0,
     top_p=1.0,
+    max_tokens=512
 )
 # --- 5. Generate Translations ---
     for i, candidate in enumerate(output.outputs):
         generated_text = candidate.text.strip()
         print(f"Candidate {i+1}: {generated_text}")
 ```
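
The prompt-construction step touched by this commit can be checked without loading the model. The following is a minimal sketch, not the authors' code: `build_prompt` and `LANG_NAME_MAP` are hypothetical names introduced here, and the map contains only the entries visible in the diff (the README's full map is elided in the hunks). It shows that, after this change, the prompt carries the literal `<|im_end|>` token rather than the HTML-escaped `&lt;|im_end|&gt;`.

```python
# Hypothetical helper for illustration; only the prompt template and the
# language entries below are taken from the diff. The README's full map
# is not shown in the hunks, so this one is deliberately partial.
LANG_NAME_MAP = {
    "zh_CN": "chinese",
    "ko_KR": "korean",
    "ja_JP": "japanese",
    "ar_EG": "arabic",
    "cs_CZ": "czech",
    "ru_RU": "russian",
    "uk_UA": "ukraine",
}


def build_prompt(source_text: str, target_lang_code: str) -> str:
    """Format one translation request using the template from the README."""
    # Same fallback string as the README snippet uses for unknown codes.
    target_language_name = LANG_NAME_MAP.get(
        target_lang_code, "the target language"
    )
    return (
        f"Human: Please translate the following text into {target_language_name}: \n"
        f"{source_text}<|im_end|>\n"
        f"Assistant:"
    )


if __name__ == "__main__":
    print(build_prompt("Hello, world.", "ja_JP"))
```

The returned string can be placed in the `prompts_to_generate` list from the README example. With `n=1` in `SamplingParams`, each prompt yields a single candidate.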