Open Legal Data Collection A collection of our favorite open-source legal datasets on Hugging Face. • 2 items • Updated 19 days ago • 4
Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published Jul 1 • 78
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 Jul 1 • 130
Zeroshot Classifiers Collection These are my current best zero-shot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer and better. • 12 items • Updated Jan 6 • 146
Article Multi-Label Classification Model From Scratch: Step-by-Step Tutorial Jan 8, 2024 • 47
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28, 2024 • 66
Tajik Datasets Collection Datasets that have a Tajik subset or are entirely in Tajik • 13 items • Updated Feb 20 • 4
Open Australian Legal Models Collection A collection of open-source Australian legal language models • 6 items • Updated Jun 15, 2024 • 1
Open Australian Legal Data Collection A collection of open-source Australian legal datasets • 3 items • Updated Jun 15, 2024 • 5