Model Overview
- Model Architecture: Qwen3-30B-A3B-Thinking-2507
- Input: Text
- Output: Text
- Supported Hardware Microarchitecture: AMD MI350/MI355
- ROCm: 7.0
- Operating System(s): Linux
- Inference Engine: vLLM
- Model Optimizer: AMD-Quark
- Weight quantization: Per-channel, FP8 E4M3, static
- Activation quantization: Per-token, FP8 E4M3, dynamic
- Calibration Dataset: Pile
This model was built from the Qwen3-30B-A3B-Thinking-2507 model by applying AMD-Quark for PTPC (per-token activation, per-channel weight) FP8 quantization.
Model Quantization
The model was quantized from Qwen/Qwen3-30B-A3B-Thinking-2507 using AMD-Quark. The weights are quantized per-channel to FP8 with static scales, and the activations are quantized per-token to FP8 with dynamic scales.
Quantization scripts:
```
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 internal_scripts/quantize_quark.py --model_dir Qwen/Qwen3-30B-A3B-Thinking-2507 \
    --quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
    --exclude_layers "*lm_head" "*mlp.gate" \
    --num_calib_data 512 \
    --output_dir amd/Qwen3-30B-A3B-Thinking-2507-ptpc \
    --model_export hf_format
```
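For intuition, here is a minimal PyTorch sketch of the scale computation implied by the w_fp8_per_channel_static_a_fp8_per_token_dynamic scheme. This is an illustrative assumption about the numerics, not the AMD-Quark implementation:
```python
import torch

FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3 (e4m3fn)

def per_channel_weight_scales(weight: torch.Tensor) -> torch.Tensor:
    # Static: one scale per output channel, computed once from the weights.
    absmax = weight.abs().amax(dim=1, keepdim=True)  # [out_channels, 1]
    return absmax.clamp(min=1e-12) / FP8_E4M3_MAX

def per_token_activation_scales(activation: torch.Tensor) -> torch.Tensor:
    # Dynamic: one scale per token, recomputed from each activation at runtime.
    absmax = activation.abs().amax(dim=-1, keepdim=True)  # [..., tokens, 1]
    return absmax.clamp(min=1e-12) / FP8_E4M3_MAX

def fake_quant_fp8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Quantize to FP8 E4M3 and dequantize back, to mimic the numeric effect.
    # Clamp before casting since e4m3fn has no representable infinity.
    q = (x / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)
    return q.to(x.dtype) * scale
```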
Accuracy
| Benchmark | Qwen3-30B-A3B-Thinking-2507 | Qwen3-30B-A3B-Thinking-2507-ptpc (this model) |
|-----------|-----------------------------|-----------------------------------------------|
| GSM8K     | 0.755                       | 0.720                                         |
Reproduction
The GSM8K result was obtained with lm-evaluation-harness using the vLLM backend.
GSM8K
```
lm_eval --model vllm \
    --model_args pretrained=/model_path/Qwen/Qwen3-30B-A3B-Thinking-2507-ptpc,add_bos_token=true,tensor_parallel_size=2 \
    --tasks gsm8k \
    --num_fewshot 5 \
    --batch_size auto \
    --limit 200
```
Deployment
Use with vLLM
This model can be deployed efficiently using the vLLM backend.
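As a starting point, here is a minimal offline-inference sketch using vLLM's Python API. The sampling settings and tensor_parallel_size=2 (mirroring the evaluation command above) are illustrative assumptions, not requirements:
```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint; adjust tensor_parallel_size to your GPU count.
llm = LLM(model="amd/Qwen3-30B-A3B-Thinking-2507-ptpc", tensor_parallel_size=2)

sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)

prompts = ["Explain the difference between static and dynamic quantization."]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```
The model can also be served over an OpenAI-compatible API with `vllm serve amd/Qwen3-30B-A3B-Thinking-2507-ptpc`.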
Evaluation
Additional evaluation results and reproduction scripts are being prepared.
License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.