AI & ML interests

https://github.com/huggingface/cookbook

sergiopaniego
posted an update 17 days ago
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently.

Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub.

Blog: https://huggingface.co/blog/openenv
Hub for Environments: openenv
OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Try it out using TRL: https://huggingface.co/docs/trl/main/en/openenv
merve
posted an update 20 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⬇️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-tokens-to-performance ratio
> covers 100 languages
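The concatenation point can be sketched with a toy NumPy example. This is an illustration of the general idea only; the shapes and variable names below are invented and are not DeepSeek-OCR's real dimensions:

```python
import numpy as np

# Toy sketch: fuse two vision encoders' outputs per vision token.
# Shapes are made up for illustration, not the real model's.
n_tokens, d = 64, 256
sam_like = np.ones((n_tokens, d))   # local, window-level features (SAM-style)
clip_like = np.ones((n_tokens, d))  # global, semantic features (CLIP-style)

# Concatenate along the feature dimension so each vision token
# carries both the local and the global view.
fused = np.concatenate([sam_like, clip_like], axis=-1)
print(fused.shape)  # (64, 512)
```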
sergiopaniego
posted an update 23 days ago
New drop! 💥 The VLM Object Understanding Comparison Space now runs with Qwen3-VL-4B and moondream3.

You can compare how models reason about images 🧠

Bonus: thanks to @ariG23498, you now get auto-suggested prompts to explore faster.

Let's gooo

sergiopaniego/vlm_object_understanding
sergiopaniego
posted an update 25 days ago
@Qwen released their new small and dense VLMs (Qwen3-VL).

They're incredibly capable and one of my all-time favourite VLMs.

🤗 We've prepared some resources to help you get started.

> Fine-tune Qwen3-VL-4B with SFT or GRPO (free Colab notebooks):
> SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb
> GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_qwen3_vl.ipynb

> Compare object detection vs. Moondream3:
sergiopaniego/vlm_object_understanding

> Fine-tune from the CLI using TRL:
https://github.com/kashif/Qwen3-VL/blob/trl-sft/qwen-vl-finetune/README.md#trl-based-training-single-gpu
sergiopaniego
posted an update 30 days ago
Super nice intro to fine-tuning with TRL, just dropped by @google (runs free on Colab)!

They use SFT + QLoRA to fine-tune the tiny Gemma 3 270M model for emoji generation.

Here's what the fine-tuned model generates for the prompt "I'm learning to tweet" → 🐦🗣💻

Colab: https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Demos/Emoji-Gemma-on-Web/resources/Fine_tune_Gemma_3_270M_for_emoji_generation.ipynb
Try it out: google/emoji-gemma
Learn more: https://developers.googleblog.com/en/own-your-ai-fine-tune-gemma-3-270m-for-on-device/
sergiopaniego
posted an update about 1 month ago
Online training methods (e.g., GRPO) require real-time generation, a compute- and memory-heavy bottleneck.

TRL has built-in vLLM support, and in this new recipe we show how to leverage it for efficient online training. Run it on Colab ⚡, then scale to multi-GPU/multi-node!

🧑‍🍳 recipe: https://huggingface.co/learn/cookbook/grpo_vllm_online_training
sergiopaniego
posted an update about 1 month ago
A few days ago, Thinking Machines Lab released "LoRA Without Regret", showing that LoRA can match full fine-tuning performance when configured correctly.

Naturally, we decided to reproduce the results with TRL and release a guide!

https://huggingface.co/docs/trl/main/en/lora_without_regret
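For intuition, the LoRA update amounts to W_eff = W + (α/r)·BA with B zero-initialized, so the adapted layer starts identical to the base model. A minimal NumPy sketch with toy shapes (these are illustrative sizes, not the guide's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 32, 4, 8   # toy sizes; rank r << min(d_out, d_in)

W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in))         # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

# Effective weight with the LoRA update applied
W_eff = W + (alpha / r) * (B @ A)

# With B zero-initialized, the adapted layer matches the base exactly at init
assert np.allclose(W_eff, W)
```

Only A and B train: here that is r·(d_in + d_out) = 192 parameters versus d_in·d_out = 512 for the full matrix, and the gap widens fast at real model sizes.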
sergiopaniego
posted an update about 1 month ago
You need to try this tool! 🫡

My colleague @Molbap built an interactive HF Space to explore the modular support of open models in transformers over time.

👀 You'll spot things like 🦙 llama defining many models, or which ones could be modular next.

Try it: Molbap/transformers-modular-refactor
sergiopaniego
posted an update about 1 month ago
How fast can you spin up a Hugging Face Inference Endpoint with vLLM to deploy a state-of-the-art OCR model?

Let's break it down step by step.

1️⃣ Create your endpoint
Go to Hugging Face Endpoints → + NEW
Select Deploy from Hub → rednote-hilab/dots.ocr → Configure 🛠️

2️⃣ Configure hardware & container
Pick hardware: AWS/GPU/L4 ⚡
Set container: vLLM 🐇
Click Create ✅

3️⃣ Update endpoint settings
Container URI: vllm/vllm-openai:nightly → Update
Advanced: add the flag --trust-remote-code → Update ⚠️

4️⃣ Run inference
Download the script 📝: ariG23498/useful-scripts
Set your HF_TOKEN and update base_url in the script.
Run it. ✅

Your OCR model is now live via HF Inference Endpoints!
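Step 4 boils down to sending an OpenAI-compatible chat request to the endpoint, since the vLLM container exposes that API. A stdlib-only sketch of what such a request could look like; the endpoint URL, image URL, and prompt are placeholders (use the linked script for the real thing), and the network call itself is left commented out:

```python
import json
import os
import urllib.request

# Placeholder: replace with your endpoint's base_url from step 4
BASE_URL = "https://my-endpoint.endpoints.huggingface.cloud/v1"

def build_ocr_request(image_url: str) -> dict:
    """Build an OpenAI-compatible chat payload for the vLLM server."""
    return {
        "model": "rednote-hilab/dots.ocr",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": "Extract the text from this image."},
            ],
        }],
    }

payload = build_ocr_request("https://example.com/sample-doc.png")

# Uncomment to actually call the endpoint (requires HF_TOKEN to be set):
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={
#         "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
#         "Content-Type": "application/json",
#     },
# )
# resp = json.load(urllib.request.urlopen(req))
# print(resp["choices"][0]["message"]["content"])
```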
sergiopaniego
posted an update about 2 months ago
💥 Tons of new material just landed in the smol-course! 🧑‍💻

> evaluation
> alignment
> VLMs
> quizzes
> assignments!
> certificates! 👩‍🎓

go learn! 👉 https://huggingface.co/learn/smol-course/unit0/1
merve
posted an update about 2 months ago
large AI labs open-sourced a ton of models last week 🔥
here's a few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer-use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan LongCat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
sergiopaniego
posted an update about 2 months ago
This summer TRL leveled up for multimodal alignment 🌞

✅ New VLM alignment methods (MPO, GRPO, GSPO)
✅ Extended RLOO & Online DPO for VLMs
✅ Native SFT support
✅ Ready-to-use training scripts

🔗 https://huggingface.co/blog/trl-vlm-alignment
merve
posted an update about 2 months ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released under the Apache 2.0 license 👏 use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗