Open-Source AI Meetup

community

Activity Feed Request to join this org

AI & ML interests

Open science and open source

Recent Activity

osanseviero authored a paper about 1 month ago

EmbeddingGemma: Powerful and Lightweight Text Representations

ThankGod authored a paper about 2 months ago

DINOv3-Diffusion Policy: Self-Supervised Large Visual Model for Visuomotor Diffusion Policy Learning

rootacess authored a paper 4 months ago

Robust Learning of Diverse Code Edits

View all activity

himanshubeniwal

authored 6 papers about 1 month ago

Cross-lingual Editing in Multilingual Language Models

Paper • 2401.10521 • Published Jan 19, 2024 • 2

Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models

Paper • 2402.11997 • Published Feb 19, 2024 • 1

Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs

Paper • 2502.16901 • Published Feb 24

COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing

Paper • 2503.21670 • Published Mar 27

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Paper • 2504.04377 • Published Apr 6

COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework

Paper • 2408.03125 • Published Aug 6, 2024

Teja-Gollapudi

authored a paper about 1 month ago

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30 • 54

osanseviero

authored a paper about 1 month ago

EmbeddingGemma: Powerful and Lightweight Text Representations

Paper • 2509.20354 • Published Sep 24 • 39

Teja-Gollapudi

authored 2 papers about 2 months ago

The Unreasonable Effectiveness of Eccentric Automatic Prompts

Paper • 2402.10949 • Published Feb 9, 2024 • 5

PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning

Paper • 2507.18857 • Published Jul 25 • 1

BrigitteTousi

posted an update 3 months ago

Post

768

On Wednesday, August 13 at 11am EDT, join @clem for a no bullshit AMA on Discord. Prep all your HF questions and meet us there! 🤗☄️⚡️

https://discord.com/invite/6r5TEXyk?event=1404451892179763311

BrigitteTousi

posted an update 3 months ago

Post

604

New interactive viz from AI World showing OpenAI's new open model gpt-oss-120b breaking into the top 50 most liked models of all time on the Hub in under a day! ☄️☄️☄️

BrigitteTousi

posted an update 4 months ago

Post

642

This is what Hugging Face is all about. We want everyone, hobbyists, researchers and industry alike, to be able to contribute to AI because everyone is affected by it. Kudos to HF's @irenesolaiman for spreading the word!🔥🤗

BrigitteTousi

posted an update 7 months ago

Post

3327

AI agents are transforming how we interact with technology, but how sustainable are they? 🌍

Design choices — like model size and structure — can massively impact energy use and cost. ⚡💰 The key takeaway: smaller, task-specific models can be far more efficient than large, general-purpose ones.

🔑 Open-source models offer greater transparency, allowing us to track energy consumption and make more informed decisions on deployment. 🌱 Open-source = more efficient, eco-friendly, and accountable AI.

Read our latest, led by @sasha with assists from myself + @yjernite 🤗
https://huggingface.co/blog/sasha/ai-agent-sustainability

1 reply

osanseviero

authored a paper 7 months ago

Gemma 3 Technical Report

Paper • 2503.19786 • Published Mar 25 • 54

BrigitteTousi

posted an update 8 months ago

Post

3454

LeRobot goes to driving school! 🚗🚗🚗

Hugging Face just announced a new collab with Yaak to bring the largest open-source self-driving dataset to LeRobot!

Major kudos to HF's @cadene , as well as @sandhawalia , @Shnissen and the Yaak team!

Check out the blog post here: https://huggingface.co/blog/lerobot-goes-to-driving-school

1 reply

BrigitteTousi

posted an update 8 months ago

Post

3753

Regardless of X being down or not, so glad I can rely on HF Posts for AI news ❤️🤗

1 reply

BrigitteTousi

posted an update 10 months ago

Post

1350

Community fine-tuned models are more carbon efficient than the models they are derived from! 🥳🌿

@alozowski @clefourrier @SaylorTwift @albertvillanova evaluated CO₂ emissions associated with model inference for over 3000 models on the Open LLM Leaderboard. Interesting trends and new insights emerged...👀

Blog Post: https://huggingface.co/blog/leaderboard-emissions-analysis

Leaderboard: open-llm-leaderboard/open_llm_leaderboard

BrigitteTousi

posted an update 12 months ago

Post

1153

I'm biased but I think HF Posts is the #1 social platform for the AI community! 🤗 That being said, most of us are already on X and now also joining Bluesky.

Looking for us on Bsky? We started a team list here: https://bsky.app/starter-pack/did:plc:yyfrnpcutxghwc6eac4xplwp/3lbem54cnxp26

mrm8488

posted an update over 1 year ago

Post

7642

🚨Exciting news for the Multilingual Synthetic Data Community!🚨

I’ve taken inspiration from the MAGPIE paper on Llama-3-8B-instruct and extended its capabilities. Here’s what’s new!

🗞 The MAGPIE paper showcased that if you use the instruction-tuned version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset, you can improve even the it-tuned version

🤔 While reading a script by Sebastian Raschka, PhD, I wondered: Could these advancements be replicated in other languages? Specifically, could they benefit non-English datasets?

🎉 And the answer is YES! At least for Spanish. I've successfully adapted the techniques for Spanish, proving the model's flexibility and multilingual capabilities.

👩‍💻 To make this accessible, I created a basic script (heavily inspired by the Sebastian Raschka one) that allows you to generate similar datasets using ollama models (initially phi and llama3) automatically and upload it to the Hugging Face Hub!
[Script](https://gist.github.com/mrm8488/4650a5e3cc45523798a527a3446eb312)

🔍 Explore the datasets 📚 generated using our new script!

- [Llama-3-8B](https://huggingface.co/datasets/mrm8488/dataset_llama3_5000_samples_es_4231_filtered)
- [Phi-3-medium](https://huggingface.co/datasets/mrm8488/dataset_phi3-medium_5000_samples_es_3906_filtered)
- [Phi-3-mini](https://huggingface.co/datasets/mrm8488/dataset_phi3_5000_samples_es_3282_filtered)

Note: These datasets have basic filtering. Apply additional quality filters before using them to fine-tune large language models.

Inspiration and base script:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/05_dataset-generation/llama3-ollama.ipynb
https://www.linkedin.com/feed/update/urn:li:activity:7210982019751661568/

7 replies

AI & ML interests

Recent Activity

Team members 587

SFEvent's activity