Uninitialized weights

#1
by andito - opened

Running the first example I get:

In [1]: import torch
   ...: from transformers import AutoModelForCausalLM, AutoTokenizer
   ...: 
   ...: device = "cuda"
   ...: model_path = "ibm-granite/granite-4.0-350M"
   ...: tokenizer = AutoTokenizer.from_pretrained(model_path)
   ...: # drop device_map if running on CPU
   ...: model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
   ...: model.eval()
Some weights of GraniteMoeHybridForCausalLM were not initialized from the model checkpoint at ibm-granite/granite-4.0-350M and are newly initialized: ['model.layers.{0...27}.block_sparse_moe.input_linear.weight', 'model.layers.{0...27}.block_sparse_moe.output_linear.weight', 'model.layers.{0...27}.block_sparse_moe.router.layer.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Out[1]: 
GraniteMoeHybridForCausalLM(
  (model): GraniteMoeHybridModel(
    (embed_tokens): Embedding(100352, 1024, padding_idx=100256)
    (layers): ModuleList(
      (0-27): 28 x GraniteMoeHybridDecoderLayer(
        (block_sparse_moe): GraniteMoeHybridMoE(
          (activation): SiLUActivation()
          (input_linear): GraniteMoeHybridParallelExperts()
          (output_linear): GraniteMoeHybridParallelExperts()
          (router): GraniteMoeHybridTopKGating(
            (layer): Linear(in_features=1024, out_features=0, bias=False)
          )
        )
        (input_layernorm): GraniteMoeHybridRMSNorm((1024,), eps=1e-05)
        (post_attention_layernorm): GraniteMoeHybridRMSNorm((1024,), eps=1e-05)
        (shared_mlp): GraniteMoeHybridMLP(
          (activation): SiLUActivation()
          (input_linear): Linear(in_features=1024, out_features=4096, bias=False)
          (output_linear): Linear(in_features=2048, out_features=1024, bias=False)
        )
        (self_attn): GraniteMoeHybridAttention(
          (q_proj): Linear(in_features=1024, out_features=1024, bias=False)
          (k_proj): Linear(in_features=1024, out_features=256, bias=False)
          (v_proj): Linear(in_features=1024, out_features=256, bias=False)
          (o_proj): Linear(in_features=1024, out_features=1024, bias=False)
        )
      )
    )
    (norm): GraniteMoeHybridRMSNorm((1024,), eps=1e-05)
    (rotary_emb): GraniteMoeHybridRotaryEmbedding()
  )
  (lm_head): Linear(in_features=1024, out_features=100352, bias=False)
)
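
Note, too, that the printed tree shows the router as Linear(in_features=1024, out_features=0) in every layer, so it looks like zero experts are being wired up at load time. A minimal check, assuming the module path from the printout above:

router = model.model.layers[0].block_sparse_moe.router.layer
# nn.Linear stores weight as (out_features, in_features); a leading 0 means no experts
print(router.weight.shape)  # torch.Size([0, 1024])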

In [2]: chat = [
   ...:     { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
   ...: ]
   ...: chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
   ...: # tokenize the text
   ...: input_tokens = tokenizer(chat, return_tensors="pt").to(device)
   ...: # generate output tokens
   ...: output = model.generate(**input_tokens,
   ...:                         max_new_tokens=100)
   ...: # decode output tokens into text
   ...: output = tokenizer.batch_decode(output)
   ...: # print output
   ...: print(output[0])
<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>Assistant. I am a list the information. I am given a list of IBM Watson Watson. I am IBM Watson. I am IBM Watson. I am IBM Watson. I am IBM's Watson. I am IBM's Watson. I am IBM's Watson. I am IBM's Watson. I am IBM's AI assistant. I am IBM's AI assistant. I am IBM's AI assistant. I am IBM's AI assistant. I am IBM's AI assistant. I am IBM AI assistant. I

Which is wrong.
I'm using the latest transformers:

>>> transformers.__version__
'5.0.0.dev0'
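
For what it's worth, reloading with output_loading_info=True shows exactly which checkpoint keys fail to map (a minimal sketch, same model path as above):

from transformers import AutoModelForCausalLM

# loading_info reports what did and did not load from the checkpoint
model, loading_info = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-4.0-350M", output_loading_info=True
)
print(loading_info["missing_keys"])     # model-side keys absent from the checkpoint
print(loading_info["unexpected_keys"])  # checkpoint keys the model did not consume

If a v5 rename is the culprit, I'd expect the MoE tensors to show up under unexpected_keys with different names.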

Hi there Andito.

I tried this on my side with the device set to CPU, running transformers 4.57.1 (the latest release from HF), and it seems to work fine.

<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>Please list one IBM Research laboratory located in the United States. You should only output its name and location.<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>IBM Research Laboratory: Cambridge Research Laboratory<|end_of_text|>

Perhaps your findings are specific to the dev release you are using. Could you check whether the error still occurs on one of the supported releases?
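
In the meantime you could pin to the stable release with pip install "transformers==4.57.1" and guard against accidentally picking up a dev build before loading the model (a minimal sketch, using the packaging library that ships alongside transformers):

import transformers
from packaging import version

# Stay on a supported release until the v5 regression is resolved
assert version.parse(transformers.__version__) < version.parse("5.0.0.dev0"), \
    f"dev build detected: {transformers.__version__}"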

IBM Granite org

@andito It looks like some changes made in preparation for v5 broke this model on 5.0.0.dev0. We're raising this with the transformers team and will get it fixed ASAP. Thanks for identifying it!
