How to run Qwen3-Coder-480B-A35B-Instruct-FP8 using vLLM?
#9
by Shoham39
I am using vLLM v0.11.0 (I also tried v0.10.2) and I am trying to deploy Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 on 8 GPUs spread over 4 servers (2x H100 80GB per server), using Ray.
Can someone share a recipe for deploying this model with vLLM? I can't make it work. A rough sketch of what I am attempting is below.
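For concreteness, this is roughly the kind of launch I have in mind, assuming a Ray cluster is already started across all four nodes (the TP=2 / PP=4 split below is just my guess at a sensible mapping to 2 GPUs per node over 4 nodes, not a confirmed recipe):

```python
from vllm import LLM, SamplingParams

# Sketch only, assuming `ray start --head` was run on one node and
# `ray start --address=<head-ip>:6379` on the other three beforehand.
# TP=2 (GPUs within a node) x PP=4 (across nodes) = 8 GPUs total.
llm = LLM(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
    tensor_parallel_size=2,              # 2x H100 per server
    pipeline_parallel_size=4,            # 4 servers
    distributed_executor_backend="ray",  # multi-node execution via Ray
    gpu_memory_utilization=0.90,
)

# Quick smoke test once the engine comes up.
out = llm.generate(["def quicksort(xs):"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

Is this the right way to split the parallelism for this topology, or should it be configured differently?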
Do I need to set `--quantization fp8`, or is that unnecessary because the model weights are already FP8?
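My understanding (which may be wrong) is that vLLM reads the quantization method from the checkpoint's `config.json`, so an explicit flag shouldn't be needed for a pre-quantized checkpoint. A quick way to check what the checkpoint itself declares:

```python
import json
from huggingface_hub import hf_hub_download

# Inspect the quantization_config the checkpoint ships with; vLLM should
# pick this up automatically on load (my assumption).
cfg_path = hf_hub_download(
    "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8", "config.json"
)
with open(cfg_path) as f:
    cfg = json.load(f)
print(cfg.get("quantization_config"))
```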
Another question: how much RAM do I need in each server? (Will 520 GB of RAM be enough?)
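For reference, my own back-of-envelope estimate, under rough assumptions only (FP8 weights at ~1 byte per parameter, weights split evenly across the 4 servers):

```python
# Rough back-of-envelope, not a measured number.
total_params = 480e9   # 480B parameters
bytes_per_param = 1.0  # FP8 ~= 1 byte per weight
weights_gb = total_params * bytes_per_param / 1e9  # ~480 GB of weights total
per_node_gb = weights_gb / 4                       # ~120 GB of weights per server
print(f"~{weights_gb:.0f} GB total, ~{per_node_gb:.0f} GB per server")
```

If loading stages roughly that much through CPU RAM per node, 520 GB sounds like plenty, but I'd appreciate confirmation from someone who has actually run it.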