How to run Qwen3-Coder-480B-A35B-Instruct-FP8 using vLLM?
#9
by Shoham39
I am using vLLM v0.11.0 (I also tried v0.10.2) and I am trying to deploy Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 on 8 GPUs spread over 4 servers (2x H100 80GB per server), using Ray.
Can someone share a recipe for deploying this model with vLLM? I can't make it work. A rough sketch of what I am attempting is below.
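For concreteness, this is roughly the kind of launch I have in mind, assuming a Ray cluster is already started across all four nodes (the TP=2 / PP=4 split below is just my guess at a sensible mapping to 2 GPUs per node over 4 nodes, not a confirmed recipe):

```python
from vllm import LLM, SamplingParams

# Sketch only, assuming `ray start --head` was run on one node and
# `ray start --address=<head-ip>:6379` on the other three beforehand.
# TP=2 (GPUs within a node) x PP=4 (across nodes) = 8 GPUs total.
llm = LLM(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
    tensor_parallel_size=2,              # 2x H100 per server
    pipeline_parallel_size=4,            # 4 servers
    distributed_executor_backend="ray",  # multi-node execution via Ray
    gpu_memory_utilization=0.90,
)

# Quick smoke test once the engine comes up.
out = llm.generate(["def quicksort(xs):"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

Is this the right way to split the parallelism for this topology, or should it be configured differently?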
Do I need to set `--quantization fp8`, or is that unnecessary because the model weights are already FP8?
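My understanding (which may be wrong) is that vLLM reads the quantization method from the checkpoint's `config.json`, so an explicit flag shouldn't be needed for a pre-quantized checkpoint. A quick way to check what the checkpoint itself declares:

```python
import json
from huggingface_hub import hf_hub_download

# Inspect the quantization_config the checkpoint ships with; vLLM should
# pick this up automatically on load (my assumption).
cfg_path = hf_hub_download(
    "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8", "config.json"
)
with open(cfg_path) as f:
    cfg = json.load(f)
print(cfg.get("quantization_config"))
```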
Another question: how much RAM do I need in each server? (Will 520 GB of RAM be enough?)
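For reference, my own back-of-envelope estimate, under rough assumptions only (FP8 weights at ~1 byte per parameter, weights split evenly across the 4 servers):

```python
# Rough back-of-envelope, not a measured number.
total_params = 480e9   # 480B parameters
bytes_per_param = 1.0  # FP8 ~= 1 byte per weight
weights_gb = total_params * bytes_per_param / 1e9  # ~480 GB of weights total
per_node_gb = weights_gb / 4                       # ~120 GB of weights per server
print(f"~{weights_gb:.0f} GB total, ~{per_node_gb:.0f} GB per server")
```

If loading stages roughly that much through CPU RAM per node, 520 GB sounds like plenty, but I'd appreciate confirmation from someone who has actually run it.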