RuntimeError: expected scalar type Float but found BFloat16 during activation capture

#153 · opened by ojas03

I’m encountering a dtype mismatch error when extracting activations from a model during generation with model.generate(). The error is raised while converting the captured layer activations to NumPy arrays for later analysis.

When loading the model I get this warning:

MXFP4 quantization requires Triton and kernels installed: CUDA requires Triton >= 3.4.0, XPU requires Triton >= 3.5.0, we will default to dequantizing the model to bf16

and the run then fails with:

RuntimeError: expected scalar type Float but found BFloat16
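The warning suggests the weights were dequantized to bfloat16. A quick check (a sketch, assuming the model is already loaded) should be consistent with that:

# Inspect the weight dtype after the MXFP4 fallback.
print(next(model.parameters()).dtype)  # expected to print torch.bfloat16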

This happens while running the following logic:

import torch

for layer_name, act in activations.items():
    # Cast to float32 before .numpy(), since NumPy has no bfloat16 dtype.
    act_float = act.detach().cpu().to(torch.float32).numpy()
    layer_sums[layer_name].append(act_float)

To reproduce the issue, here’s a minimal snippet:
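(For context, the model and tokenizer are loaded roughly as below; the Hub ID and loading flags are assumptions on my side, not my verbatim setup.)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-safeguard-20b"  # assumed Hub ID; adjust if loading locally

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # without MXFP4 kernels this dequantizes to bf16
    device_map="auto",
)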

messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Followed by the activation capture itself (the hook setup that populates activations is sketched after this snippet):

# A single forward step so the hooks populate `activations`.
_ = model.generate(**inputs, max_new_tokens=1)
for layer_name, act in activations.items():
    act_float = act.detach().cpu().to(torch.float32).numpy()
    layer_sums[layer_name].append(act_float)
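For completeness, activations is populated by forward hooks registered roughly as follows (a sketch; the exact layer selection and module path are assumptions):

from collections import defaultdict

activations = {}
layer_sums = defaultdict(list)

def make_hook(name):
    # Store the layer output as-is; with the bf16 fallback these tensors
    # are bfloat16, which NumPy cannot represent directly.
    def hook(module, args, output):
        out = output[0] if isinstance(output, tuple) else output
        activations[name] = out.detach()
    return hook

# Assumed module path for the decoder stack; adjust to the actual architecture.
for i, layer in enumerate(model.model.layers):
    layer.register_forward_hook(make_hook(f"layer_{i}"))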

PyTorch version: torch==2.9.0
CUDA version: 12.0 (nvcc build cuda_12.0.r12.0/compiler.32267302_0)
Model: gpt-oss-safeguard-20b
GPU: NVIDIA RTX A6000
