Everything for high-quality filtering of HPLT3.
JQL-AI (community organization)
FineWeb-2 (removed/filtered) embeddings computed with Snowflake's Arctic-embed-m-v2.0.
Collection of curated datasets embedded with Snowflake's Arctic-embed-m-v2.0.
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
JQL: Judging Quality Across Languages (Space): 🦊 Filter multilingual data for high-quality language models.
Paper: Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models (arXiv 2505.22232)
JQL-AI/JQL-Edu-Heads (model, Text Ranking)
JQL-AI/JQL-LLM-Edu-Annotations (dataset, 11.4M rows)
HPLT2-dedup embeddings computed with Snowflake's Arctic-embed-m-v2.0.