How to run Qwen3-Coder-480B-A35B-Instruct-FP8 using vLLM?

by Shoham39

I am using vLLM v0.11.0 (I also tried v0.10.2) and I am trying to deploy Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 on 8 GPUs spread over 4 servers (2x H100 80GB per server), using Ray.

Can someone share a recipe for deploying this model with vLLM? I can't make it work.
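
For reference, here is roughly what I have been trying, pieced together from the vLLM distributed-serving docs. The head-node address and the TP/PP split are my own guesses, so treat this as a sketch rather than a known-good recipe:

```python
# On the head node (server 1):   ray start --head --port=6379
# On each of the 3 other nodes:  ray start --address=<HEAD_NODE_IP>:6379
# Then run this script on the head node.

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
    tensor_parallel_size=2,              # TP within each node (2x H100 per server)
    pipeline_parallel_size=4,            # PP across the 4 servers -> 2 * 4 = 8 GPUs
    distributed_executor_backend="ray",  # run workers on the existing Ray cluster
    # no quantization= argument: the weights are already FP8 (see next question)
)

out = llm.generate(
    ["Write a Python function that reverses a string."],
    SamplingParams(max_tokens=128),
)
print(out[0].outputs[0].text)
```

If I understand the docs correctly, the equivalent online-serving command would be `vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --tensor-parallel-size 2 --pipeline-parallel-size 4 --distributed-executor-backend ray` on the head node, but I have not gotten either variant to work yet.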

Do I need to pass `--quantization fp8`, or is that unnecessary because the model weights are already FP8?
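
My understanding (which I'd like someone to confirm) is that vLLM reads the `quantization_config` section of the checkpoint's config.json and detects FP8 on its own, so no flag should be needed. A quick way to check that the field is actually there:

```python
import json
from huggingface_hub import hf_hub_download

# Download just config.json and print its quantization_config section.
# If it is present, vLLM should auto-detect FP8 without --quantization.
cfg_path = hf_hub_download(
    repo_id="Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
    filename="config.json",
)
with open(cfg_path) as f:
    config = json.load(f)
print(config.get("quantization_config"))
```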

Another question: how much RAM do I need in each server? (Would 520GB per server be enough?)
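
My own back-of-envelope math, in case it helps (assuming 1 byte per parameter for FP8 weights and ignoring KV cache, which lives on the GPUs anyway):

```python
# Rough memory estimate for Qwen3-Coder-480B-A35B-Instruct-FP8.
params_total = 480e9   # total parameters
bytes_per_param = 1    # FP8 = 1 byte per weight
weights_gb = params_total * bytes_per_param / 1e9
print(f"total weights:    ~{weights_gb:.0f} GB")

num_gpus = 8
print(f"weights per GPU:  ~{weights_gb / num_gpus:.0f} GB of 80 GB (H100)")

num_nodes = 4
print(f"weights per node: ~{weights_gb / num_nodes:.0f} GB "
      f"(2 GPUs x {weights_gb / num_gpus:.0f} GB)")
```

If that math is right, each node only ever holds about 120 GB of weights, and safetensors loading is streamed rather than buffered all at once, so 520GB of host RAM per server sounds like plenty. I'd still appreciate confirmation from someone who has actually run this model.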
