Model Overview

  • Model Architecture: Qwen3-30B-A3B-Thinking-2507
    • Input: Text
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0
  • Operating System(s): Linux
  • Inference Engine: vLLM
  • Model Optimizer: AMD-Quark
    • Weight quantization: Per-channel, FP8 E4M3, Static
    • Activation quantization: Per-token, FP8 E4M3, Dynamic
  • Calibration Dataset: Pile

This model was built from the Qwen3-30B-A3B-Thinking-2507 model by applying AMD-Quark for PTPC (per-token, per-channel) FP8 quantization.

Model Quantization

The model was quantized from Qwen/Qwen3-30B-A3B-Thinking-2507 using AMD-Quark. Both weights and activations are quantized to FP8 (E4M3): weights with static per-channel scales, activations with dynamic per-token scales.
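As a rough illustration of the scheme (a NumPy sketch, not the AMD-Quark implementation), static per-channel weight scaling and dynamic per-token activation scaling both divide each row by its own scale so that the largest magnitude lands at the FP8 E4M3 maximum:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def weight_scales_per_channel(w):
    """Static per-channel scales: one scale per output channel (row),
    computed once offline from the weight tensor."""
    return np.maximum(np.abs(w).max(axis=1, keepdims=True) / FP8_E4M3_MAX, 1e-12)


def activation_scales_per_token(x):
    """Dynamic per-token scales: one scale per input row (token),
    recomputed at inference time for each batch of activations."""
    return np.maximum(np.abs(x).max(axis=1, keepdims=True) / FP8_E4M3_MAX, 1e-12)


def fake_quantize(t, scales):
    """Scale into the FP8 range and clip; a real kernel would additionally
    cast the scaled values onto the float8_e4m3 grid."""
    return np.clip(t / scales, -FP8_E4M3_MAX, FP8_E4M3_MAX)
```

Dequantization is simply `q * scales`; in this simplified sketch (no cast to the FP8 grid) the round trip is exact, while real FP8 storage adds a small rounding error per element.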

Quantization scripts:

cd Quark/examples/torch/language_modeling/llm_ptq/

python3 internal_scripts/quantize_quark.py --model_dir Qwen/Qwen3-30B-A3B-Thinking-2507 \
                          --quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
                          --exclude_layers "*lm_head" "*mlp.gate" \
                          --num_calib_data 512 \
                          --output_dir amd/Qwen3-30B-A3B-Thinking-2507-ptpc \
                          --model_export hf_format

Accuracy

Benchmark  Qwen3-30B-A3B-Thinking-2507  Qwen3-30B-A3B-Thinking-2507-ptpc (this model)
GSM8K      0.755                        0.720
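From the scores above, the FP8 PTPC model retains roughly 95% of the baseline GSM8K accuracy; the relative change works out as:

```python
baseline = 0.755   # BF16 Qwen3-30B-A3B-Thinking-2507, GSM8K
quantized = 0.720  # FP8 PTPC model (this model), GSM8K

recovery = quantized / baseline * 100          # percent of baseline accuracy retained
drop = (baseline - quantized) / baseline * 100  # relative accuracy drop

print(f"recovery: {recovery:.1f}%  drop: {drop:.1f}%")  # recovery: 95.4%  drop: 4.6%
```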

Reproduction

The GSM8K result was obtained with lm-evaluation-harness using the vLLM backend.

GSM8K

lm_eval --model vllm \
    --model_args pretrained=/model_path/Qwen/Qwen3-30B-A3B-Thinking-2507-ptpc,add_bos_token=true,tensor_parallel_size=2 \
    --tasks gsm8k \
    --num_fewshot 5 \
    --batch_size auto \
    --limit 200

Deployment

Use with vLLM

This model can be deployed efficiently using the vLLM backend.
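A minimal serving sketch (a config fragment under assumed standard vLLM flags, not a tested recipe; adjust the tensor-parallel size to your GPU count):

```shell
# Serve the quantized checkpoint behind vLLM's OpenAI-compatible endpoint.
# --tensor-parallel-size 2 mirrors the evaluation command above; match it
# to the number of MI350/MI355 GPUs available.
vllm serve amd/Qwen3-30B-A3B-Thinking-2507-ptpc \
    --tensor-parallel-size 2
```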

Evaluation

Additional evaluation results and reproduction scripts are being prepared.

License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
