Sparse Query Attention (SQA) Research by Reactive AI

Updated Oct 9

Experimental models with Sparse Query Attention layers, reducing training time/cost by ~3-10% compared to GQA and MQA while maintaining the same level of performance.
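
SQA's saving comes from shrinking the number of *query* heads: the attention score matrix (and the FLOPs spent on it) scales with the query head count, whereas MQA/GQA only shrink the key/value heads. Below is a minimal illustrative sketch in PyTorch, assuming GQA-style key/value grouping; the class name, head counts, and projection layout are assumptions for illustration, not the exact ReactiveAI implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseQueryAttention(nn.Module):
    """Illustrative SQA layer: fewer query heads than the full head count.

    Unlike MQA/GQA, which reduce key/value heads, SQA reduces query heads,
    so the Q @ K^T score computation shrinks proportionally.
    """

    def __init__(self, dim: int, num_heads: int,
                 num_query_heads: int, num_kv_heads: int):
        super().__init__()
        assert num_query_heads % num_kv_heads == 0
        self.head_dim = dim // num_heads  # per-head width of the full model
        self.h_q = num_query_heads        # reduced query heads (the SQA idea)
        self.h_kv = num_kv_heads          # key/value heads, grouped as in GQA
        self.q_proj = nn.Linear(dim, self.h_q * self.head_dim)
        self.k_proj = nn.Linear(dim, self.h_kv * self.head_dim)
        self.v_proj = nn.Linear(dim, self.h_kv * self.head_dim)
        self.o_proj = nn.Linear(self.h_q * self.head_dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.h_q, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.h_kv, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.h_kv, self.head_dim).transpose(1, 2)
        # Repeat KV heads so each query head has a matching KV group (as in GQA).
        k = k.repeat_interleave(self.h_q // self.h_kv, dim=1)
        v = v.repeat_interleave(self.h_q // self.h_kv, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(b, t, self.h_q * self.head_dim)
        return self.o_proj(out)
```

For example, `SparseQueryAttention(dim=256, num_heads=16, num_query_heads=8, num_kv_heads=4)` computes scores over 8 query heads instead of 16, roughly halving the FLOPs of the score/softmax/weighted-sum stages while keeping the model width fixed.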


  • Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction

Paper • arXiv:2510.01817 • Published Oct 2

  • ReactiveAI/sSQAT-mm

    Text Generation • 8.62M • Updated Oct 3

  • ReactiveAI/SQAT-mm

    Text Generation • 8.57M • Updated Oct 3

  • ReactiveAI/xSQAT-mm

    Text Generation • 8.52M • Updated Oct 3

  • ReactiveAI/GQA-Ref-Micro

    Text Generation • 8.67M • Updated Oct 3

  • ReactiveAI/MQA-Ref-Micro

    Text Generation • 8.64M • Updated Oct 3

  • ReactiveAI/SQAT-m

    Text Generation • 10.7M • Updated Oct 3

  • ReactiveAI/xSQAT-m

    Text Generation • 10.4M • Updated Oct 3

  • ReactiveAI/sSQAT-m

    Text Generation • 10.9M • Updated Oct 3

  • ReactiveAI/xSMQAT-m

    Text Generation • 10.2M • Updated Oct 3