Gated DeltaNet + learnable token eviction (0.4B params, 10B tokens)
Various; see the paper for details.
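# Requires hybrid-linear-sparse-attention (https://github.com/idiap/hybrid-linear-sparse-attention); please ensure that fla from our modified implementation can be imported.
import fla  # imported for its side effects; not referenced directly below
from transformers import AutoModelForCausalLM, AutoTokenizer
# path_to_model: set this to the local directory or Hub ID of the checkpoint
model = AutoModelForCausalLM.from_pretrained(path_to_model).cuda()
# Tokenizers are not torch modules and have no .cuda(); only the tensors are moved
tokenizer = AutoTokenizer.from_pretrained(path_to_model)
input_ids = tokenizer("All human beings are", return_tensors="pt").input_ids.to(model.device)
# max_length counts the prompt tokens plus the newly generated ones
outputs = model.generate(input_ids, max_length=15)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))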
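Because the snippet above depends on the modified `fla` package rather than an upstream install, it can help to check which copy Python would pick up before loading the model. The following is a minimal sketch using only the standard library; the helper name `locate_module` is hypothetical and not part of this repository, and the check only reports a file path, so you still need to confirm yourself that the path points into your checkout of hybrid-linear-sparse-attention.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the location a top-level module would be imported from, or None.

    Hypothetical helper for illustration; it does not import the module itself.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # The module is not visible on sys.path at all.
        return None
    if spec.origin:
        # Regular modules and packages report the file they load from.
        return spec.origin
    # Namespace packages have no origin; report their first search location.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


if __name__ == "__main__":
    path = locate_module("fla")
    if path is None:
        print("fla is not importable; install the modified implementation first")
    else:
        print(f"fla would be imported from: {path}")
```

If the printed path lies in `site-packages` of an unrelated install instead of your clone of the repository, the model classes used by this checkpoint may not be the ones that get loaded.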
Gated DeltaNet is released under the MIT License.
If you find our work useful, please cite the following publication:
@misc{he_alleviating_2025,
title = {Alleviating {Forgetfulness} of {Linear} {Attention} by {Hybrid} {Sparse} {Attention} and {Contextualized} {Learnable} {Token} {Eviction}},
url = {http://arxiv.org/abs/2510.20787},
doi = {10.48550/arXiv.2510.20787},
publisher = {arXiv},
author = {He, Mutian and Garner, Philip N.},
month = oct,
year = {2025},
note = {arXiv:2510.20787 [cs]},
}