Kernels Tests

community

AI & ML interests

None defined yet.

Recent Activity

danieldk 
posted an update about 1 month ago
danieldk 
published a model 2 months ago
sayakpaul 
posted an update 4 months ago
view post
Post
1772
Fast LoRA inference for Flux with Diffusers and PEFT 🚨

There are great materials that demonstrate how to optimize inference for popular image generation models, such as Flux. However, very few cover how to serve LoRAs fast, despite LoRAs being an inseparable part of their adoption.

In our latest post, @BenjaminB and I show different techniques to optimize LoRA inference for the Flux family of models for image generation. Our recipe includes the use of:

1. torch.compile
2. Flash Attention 3 (when compatible)
3. Dynamic FP8 weight quantization (when compatible)
4. Hotswapping for avoiding recompilation during swapping new LoRAs 🤯

We have tested our recipe with Flux.1-Dev on both H100 and RTX 4090. We achieve at least a *2x speedup* in either of the GPUs. We believe our recipe is grounded in the reality of how LoRA-based use cases are generally served. So, we hope this will be beneficial to the community 🤗

Even though our recipe was tested primarily with NVIDIA GPUs, it should also work with AMD GPUs.

Learn the details and the full code here:
https://huggingface.co/blog/lora-fast
danieldk 
published a model 4 months ago
danieldk 
posted an update 4 months ago
view post
Post
2052
kernels 0.8.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.8.0

This release refines kernel selection in the kernelize function:

• You can now register kernels for certain CUDA capability ranges.
• Rather than doing exact mating of modes, fall back to other compatible modes. If you are kernelizing for inference, but you only registered a training + torch.compile kernel, it will use that kernel since it is compatible with inference as well.
  • 1 reply
·
danieldk 
posted an update 4 months ago
danieldk 
posted an update 4 months ago
view post
Post
377
Kernels 0.7.0 is out: https://github.com/huggingface/kernels/releases/tag/v0.7.0 🚀

This release makes it possible to register multiple kernels for a layer. Do you have a super-fast kernel for inference and another kernel for training? Register them both and kernelize will pick the kernel depending on whether you are going to do training or inference.
danieldk 
posted an update 6 months ago
view post
Post
1903
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀

We plan to give kernels a more proper introduction soon. But for those who have been following along, we are happy to announce a new release:

- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal 🤘 Kernels.
- Generate wheels from Hub kernels for legacy deployments.

Full release notes here: https://github.com/huggingface/kernels/releases/tag/v0.6.0
  • 2 replies
·