Synthetic baselines trained for our paper "Scaling Low-Resource MT via Synthetic Data Generation with LLMs", accepted as a main conference paper at EMNLP 2025.
AI & ML interests
At the University of Helsinki, we focus on:
- NLP for morphologically rich languages
- Cross-lingual NLP
- NLP in the humanities
Multilingual translation models trained on the Tatoeba Translation Challenge dataset (from OPUS) and a massively multilingual Bible corpus:
- Helsinki-NLP/opus-mt-tc-bible-big-aav-fra_ita_por_spa (Translation, 0.2B parameters)
- Helsinki-NLP/opus-mt-tc-bible-big-afa-en (Translation, 0.2B parameters)
- Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_nld (Translation, 0.2B parameters)
- Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_fra_por_spa (Translation, 0.2B parameters)
MaLA-LM: Massive Language Adaptation of Large Language Models
Open Parallel Corpus