Recent Activity

lunarflu
posted an update about 9 hours ago
The new King 👑 has arrived!

Moonshot AI now has the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
lunarflu
posted an update about 9 hours ago
💸🤑 You don’t need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗:
HuggingFaceTB/smol-training-playbook
AdinaY
posted an update 2 days ago
Kimi K2 Thinking is now live on the hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license
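
To make the INT4 point concrete, here is a minimal sketch of symmetric round-to-nearest 4-bit quantization for one group of weights. It is a generic illustration, not Moonshot AI's quantization recipe: the function names, the group of sample weights, and the choice of a single per-group scale are all assumptions. The idea is that each weight is stored as an integer in [-8, 7] plus one shared scale, so the weights take a quarter of the space of 16-bit values.

```python
def quantize_int4(weights):
    """Symmetric round-to-nearest quantization of one weight group to 4-bit codes."""
    # Signed INT4 covers [-8, 7]; dividing by 7 keeps the grid symmetric around zero.
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.30, 0.07, 0.88, -0.51, 1.12]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)

print(codes)
print([round(r, 3) for r in restored])
```

With round-to-nearest, each restored weight differs from the original by at most half the scale, which is the accuracy cost paid for the smaller memory footprint.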
AdinaY
posted an update 3 days ago
Chinese open source AI in October wasn’t about bigger models; it was about real-world impact 🔥

https://huggingface.co/collections/zh-ai-community/october-2025-china-open-source-highlights

✨ Vision-Language & OCR wave 🌊
- DeepSeek-OCR: 3B
- PaddleOCR-VL: 0.9B
- Qwen3-VL: 2B / 4B / 8B / 32B / 30B-A3B
- Open-Bee: Bee-8B-RL
- Z.ai Glyph: 10B

OCR is industrializing; the real game now is understanding the (long-context) document, not just reading it.

✨ Text generation: scale or innovation?
- MiniMax-M2: 229B
- Antgroup Ling-1T & Ring-1T
- Moonshot Kimi-Linear: linear-attention challenger
- Kwaipilot KAT-Dev

Efficiency is the key.

✨ Any-to-Any & World-Model: one step forward to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D

Aligning with the “world model” globally

✨ Audio & Speech + Video & Visual: released from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by Meituan delivery platform
- xiabs DreamOmni 2

Looking forward to what's next 🚀
sergiopaniego
posted an update 4 days ago
sergiopaniego
posted an update 6 days ago
AdinaY
posted an update 8 days ago
Kimi Linear 🚀 Hybrid linear attention model from Moonshot AI

https://huggingface.co/collections/moonshotai/kimi-linear-a3b

✨ 48B total / 3B active - MIT license
✨ Up to 1M context
✨ 84.3 on RULER (128k) with 3.98× speedup
✨ Hybrid KDA + MLA architecture for peak throughput & quality
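
For readers new to linear attention, the sketch below shows the generic kernelized form of the idea: instead of comparing every query with every past key, the model keeps a fixed-size running state and updates it once per token, so cost grows linearly with sequence length. This is not Kimi Linear's KDA layer. The `elu(x) + 1` feature map, the function names, and the toy inputs are assumptions chosen for illustration, and a quadratic reference is included only to check that both paths agree.

```python
import math


def feature_map(x):
    # elu(x) + 1: stays strictly positive, so the normaliser below is never zero.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Causal linear attention using a running state (one update per token)."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) * v^T
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = feature_map(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = feature_map(q)
        denom = sum(fq[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(fq[i] * state[i][j] for i in range(d_k)) / denom for j in range(d_v)]
        )
    return outputs


def quadratic_reference(queries, keys, values):
    """Same computation done pairwise over all past tokens, for checking."""
    outputs = []
    for t, q in enumerate(queries):
        fq = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(fq, feature_map(keys[s]))) for s in range(t + 1)
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * values[s][j] for s, w in enumerate(weights)) / total
             for j in range(len(values[0]))]
        )
    return outputs


# Toy inputs: 4 tokens, key/query size 2, value size 3.
Q = [[0.1, -0.4], [0.9, 0.2], [-0.3, 0.5], [0.0, 1.1]]
K = [[0.7, 0.1], [-0.2, 0.3], [0.4, -0.6], [1.0, 0.2]]
V = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0], [0.0, 2.0, 0.3], [-1.0, 0.4, 0.8]]

fast = linear_attention(Q, K, V)
slow = quadratic_reference(Q, K, V)
print(all(abs(a - b) < 1e-9 for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

The state has a fixed size (here 2 × 3) regardless of how many tokens have been processed, which is what makes very long contexts tractable for this family of models.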
sergiopaniego
posted an update 9 days ago
sergiopaniego
posted an update 10 days ago
AdinaY
posted an update 12 days ago
Ming-flash-omni Preview 🚀 Multimodal foundation model from AntGroup

inclusionAI/Ming-flash-omni-Preview

✨ Built on Ling-Flash-2.0: 100B total / 6B active
✨ Generative segmentation-as-editing
✨ SOTA contextual & dialect ASR
✨ High-fidelity image generation
AdinaY
posted an update 12 days ago
Glyph 🔥 a framework that scales context length by compressing text into images and processing them with vision–language models, released by Z.ai.

Paper: https://huggingface.co/papers/2510.17800
Model: https://huggingface.co/zai-org/Glyph

✨ Compresses long sequences visually to bypass token limits
✨ Reduces computational and memory costs
✨ Preserves meaning through multimodal encoding
✨ Built on GLM-4.1V-9B-Base
sergiopaniego
posted an update 16 days ago
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently.

Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub.

Blog: https://huggingface.co/blog/openenv
Hub for Environments: openenv
OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Try it out using TRL: https://huggingface.co/docs/trl/main/en/openenv
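
If the idea of an agent environment is unfamiliar, the sketch below shows the general reset/step loop that such environments tend to expose. It is a toy written for illustration and does not use the OpenEnv package or its API: `GuessNumberEnv`, `StepResult`, and `binary_search_agent` are hypothetical names. See the blog and docs linked above for the real interfaces.

```python
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool


class GuessNumberEnv:
    """Hypothetical toy environment; not part of OpenEnv."""

    def __init__(self, low=1, high=10, seed=None):
        self.low, self.high = low, high
        self._rng = random.Random(seed)
        self._target = None

    def reset(self) -> StepResult:
        # Start a new episode by drawing a fresh secret number.
        self._target = self._rng.randint(self.low, self.high)
        return StepResult(f"Guess a number between {self.low} and {self.high}.", 0.0, False)

    def step(self, action: int) -> StepResult:
        # Reward 1.0 only for the correct guess; otherwise return a hint.
        if action == self._target:
            return StepResult("correct", 1.0, True)
        hint = "higher" if action < self._target else "lower"
        return StepResult(hint, 0.0, False)


def binary_search_agent(env) -> int:
    """A scripted 'agent' that halves the search range after every hint."""
    lo, hi = env.low, env.high
    result = env.reset()
    steps = 0
    while not result.done:
        guess = (lo + hi) // 2
        result = env.step(guess)
        steps += 1
        if result.observation == "higher":
            lo = guess + 1
        elif result.observation == "lower":
            hi = guess - 1
    return steps


print(binary_search_agent(GuessNumberEnv(seed=0)))
```

Keeping the environment behind a small, uniform interface like this is what lets the same agent code be tested against many environments, and a trained policy can replace the scripted agent without changing the loop.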
AdinaY
posted an update 17 days ago
HunyuanWorld Mirror 🔥 a versatile feed-forward model for universal 3D world reconstruction by Tencent

tencent/HunyuanWorld-Mirror

✨ Any prior in → 3D world out
✨ Mix camera, intrinsics, depth as priors
✨ Predict point clouds, normals, Gaussians & more in one pass
✨ Unified architecture for all 3D tasks
andito
posted an update 18 days ago
Finally, our new paper is out! "FineVision: Open Data Is All You Need"! 🥳
FineVision: Open Data Is All You Need (2510.17269)

If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible.
We wanted to change that.

FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.
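
A quick back-of-the-envelope check puts those figures in proportion. The inputs are the rounded numbers quoted above, so the results are approximate.

```python
# Rounded figures quoted in the post.
samples = 24_000_000
images = 17_300_000
answer_tokens = 9_500_000_000

tokens_per_sample = answer_tokens / samples
samples_per_image = samples / images

print(f"{tokens_per_sample:.0f} answer tokens per sample")  # 396
print(f"{samples_per_image:.2f} samples per image")         # 1.39
```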

In the paper, we share how we built it:
🔍 finding and cleaning data at scale
🧹 removing excessive duplicates across sources
🤗 decontaminating against 66 public benchmarks
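
As a rough illustration of the last two steps, the sketch below drops samples whose image was already seen or also appears in a benchmark set. It uses exact SHA-256 hashes of the image bytes as a simple stand-in; it is not the pipeline described in the paper, which may rely on similarity-based matching. The function names and the sample layout (`{"image": bytes, ...}`) are assumptions.

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    """Exact-match fingerprint; only catches byte-identical images."""
    return hashlib.sha256(image_bytes).hexdigest()


def dedup_and_decontaminate(samples, benchmark_images):
    """Keep the first copy of each image and drop anything found in a benchmark."""
    blocked = {fingerprint(b) for b in benchmark_images}
    seen = set()
    kept = []
    for sample in samples:
        key = fingerprint(sample["image"])
        if key in blocked or key in seen:
            continue
        seen.add(key)
        kept.append(sample)
    return kept


# Hypothetical data: byte strings stand in for real image files.
samples = [
    {"image": b"cat", "source": "A"},
    {"image": b"dog", "source": "A"},
    {"image": b"cat", "source": "B"},    # duplicate across sources
    {"image": b"chart", "source": "B"},  # also present in a benchmark
]
benchmark_images = [b"chart"]

cleaned = dedup_and_decontaminate(samples, benchmark_images)
print([s["image"] for s in cleaned])  # [b'cat', b'dog']
```

Exact hashing misses resized or re-encoded copies, so a production pipeline would typically compare perceptual hashes or image embeddings instead.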

My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets.
NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and our results confirm it: models trained on FineVision consistently outperform those trained on any other open dataset on 11 benchmarks!

🎉 To celebrate the paper, I’m also releasing a concatenated and shuffled version of the full dataset! 👉 HuggingFaceM4/FineVision_full_shuffled

It’s ready to stream, so you can start training your own models right away:

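from datasets import load_dataset

# streaming=True fetches samples lazily instead of downloading the full dataset first
dataset = load_dataset("HuggingFaceM4/FineVision_full_shuffled", split="train", streaming=True)
print(next(iter(dataset)))  # inspect the first sample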

A big shoutout to the first authors: Luis Wiedmann and Orr Zohar. They are rockstars!
merve
posted an update 19 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient: strong performance for the number of vision tokens it uses
> covers 100 languages