VRAM requirement for maximum token length?

#21
by Donhuay

Hi Qwen Team,

First, thank you for all the hard work you've put into developing Qwen; it's an impressive model, and in many respects it feels on par with Gemini and ChatGPT.
I'm writing to ask for your estimates, or any real-world measurements, of VRAM consumption when running Qwen with a 260,000-token context. Specifically:

How much GPU memory would be required (e.g., 80 GB, 100 GB, or more?) for the 260k-token scenario above?
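
For context, here is my own rough back-of-the-envelope estimate of the KV-cache footprint at that length. This is a minimal sketch in Python; the layer count, KV-head count, and head dimension are illustrative placeholders, not Qwen's actual configuration:

```python
# Rough KV-cache size for a long-context run. All model numbers here are
# illustrative placeholders, NOT Qwen's published configuration.
def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2: keys and values are stored separately per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 1024**3

# Hypothetical config: 64 layers, 8 KV heads (GQA), head_dim 128,
# FP16/BF16 cache (2 bytes per element), 260k-token context.
print(f"KV cache: {kv_cache_gib(260_000, 64, 8, 128):.1f} GiB")  # ~63.5 GiB
```

Whatever the exact configuration, the model weights, activations, and framework overhead come on top of the KV cache, which is why I would appreciate concrete figures from your side.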

Having these details will greatly help us make our deployment decisions.
Thank you; I look forward to your insights.

Best regards,
