RuntimeError: expected scalar type Float but found BFloat16 during activation capture
#153 opened by ojas03
I’m encountering a dtype mismatch error when extracting activations during generation with `model.generate()`. The failure occurs while collecting per-layer activations for later analysis with NumPy.
First I get a warning:

```
MXFP4 quantization requires Triton and kernels installed: CUDA requires Triton >= 3.4.0, XPU requires Triton >= 3.5.0, we will default to dequantizing the model to bf16
```

followed by the error:

```
RuntimeError: expected scalar type Float but found BFloat16
```
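Since the fallback dequantizes the model to bf16, my guess is that a float32 tensor is meeting a bf16 tensor inside some op. Mixing dtypes like this raises a similar dtype-mismatch RuntimeError (the exact wording varies by op and torch version):

```python
import torch

# Mixing float32 and bfloat16 operands in a single op raises a
# dtype-mismatch RuntimeError similar to the one above.
a = torch.randn(4, 4, dtype=torch.float32)
b = torch.randn(4, 4, dtype=torch.bfloat16)
a @ b
```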
In my actual code, the error surfaces while running the following logic:
```python
for layer_name, act in activations.items():
    act_float = act.detach().cpu().to(torch.float32).numpy()
    layer_sums[layer_name].append(act_float)
```
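For context, `activations` and `layer_sums` are populated with standard forward hooks, roughly like this (a simplified sketch; the exact layer path depends on the model class):

```python
from collections import defaultdict
import torch

activations = {}
layer_sums = defaultdict(list)

def make_hook(name):
    def hook(module, inputs, output):
        # Once the model is dequantized to bf16, these outputs are torch.bfloat16.
        out = output[0] if isinstance(output, tuple) else output
        activations[name] = out.detach()
    return hook

# Illustrative layer path; adjust to the actual module hierarchy.
for i, layer in enumerate(model.model.layers):
    layer.register_forward_hook(make_hook(f"layer_{i}"))
```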
To reproduce the issue, here’s a minimal snippet:
```python
messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
This is followed by the activation capture:
```python
# Single-step generation so the hooks fire once.
_ = model.generate(**inputs, max_new_tokens=1)

for layer_name, act in activations.items():
    act_float = act.detach().cpu().to(torch.float32).numpy()
    layer_sums[layer_name].append(act_float)
```
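Casting to float32 inside the hook itself, before the tensor is used anywhere else, looks like a possible workaround (a sketch based on the hook setup above; I haven’t verified it is the right fix):

```python
def make_hook(name):
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        # Cast at capture time: NumPy has no bfloat16 dtype, and keeping
        # everything float32 downstream avoids mixed-dtype ops.
        activations[name] = out.detach().to(torch.float32)
    return hook
```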
Environment:

- PyTorch: torch==2.9.0
- CUDA: 12.0 (cuda_12.0.r12.0/compiler.32267302_0)
- Model: gpt-oss-safeguard-20b
- GPU: NVIDIA RTX A6000