VRAM requirement for maximum token length?

#21
by Donhuay

Hi Qwen Team,

First, thank you for all the hard work you've put into developing Qwen; it's an impressive model, and in many respects it feels on par with Gemini and ChatGPT.
I'm writing to ask for your estimates, or any real-world measurements, of VRAM consumption when running Qwen with a 260,000-token context. Specifically:

How much GPU memory would be required (e.g., 80 GB, 100 GB, or more?) for the 260k-token scenario above?
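
For context, here is my own rough back-of-the-envelope estimate of the KV-cache footprint at that length. This is a minimal sketch in Python; the layer count, KV-head count, and head dimension are illustrative placeholders, not Qwen's actual configuration:

```python
# Rough KV-cache size for a long-context run. All model numbers here are
# illustrative placeholders, NOT Qwen's published configuration.
def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    # Factor of 2: keys and values are stored separately per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 1024**3

# Hypothetical config: 64 layers, 8 KV heads (GQA), head_dim 128,
# FP16/BF16 cache (2 bytes per element), 260k-token context.
print(f"KV cache: {kv_cache_gib(260_000, 64, 8, 128):.1f} GiB")  # ~63.5 GiB
```

Whatever the exact configuration, the model weights, activations, and framework overhead come on top of the KV cache, which is why I would appreciate concrete figures from your side.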

Having these details will greatly help us make our deployment decisions.
Thank you; I look forward to your insights.

Best regards,
