More Knowledge, But Hard To Extract
Just letting you (Alibaba) know that while this updated version of Qwen3 has notably broader knowledge than the last one, that additional knowledge isn't accessible in most use cases, such as long-form outputs like stories and cast lists.
For example, when asked about the main cast of Corner Gas, the most-watched Canadian TV show in every year of its run, it produces a flood of hallucinations. But when the request is broken down into simple Q&A chunks it does much better. This is the rule and not an isolated example.
Prompt:
What are the 6 main characters, and the actors who portrayed them, on the TV show Corner Gas? Don't add details, just list them. And what year did the show first air?
Response:
Brent Leroy – Brent Butt
Hank Yarbo – Paul Gross
Wanda Dollard – Emily Hampshire
Davis Quinton – Fred Ewanuick
Lacey Burrows – Tara Spencer-Nairn
Oscar Leroy – Eric Peterson
The show first aired in 2004.
But when broken down into pieces...
Correct: Who portrayed Hank Yarbo in the Canadian TV show Corner Gas? Fred Ewanuick
Correct: Who portrayed Wanda Dollard in the Canadian TV show Corner Gas? Nancy Robertson
Wrong, Then Correct: Who portrayed Lacey Burrows in the Canadian TV show Corner Gas? Tara Spencer-Nairn
But when I responded with "Tara portrayed Karen Pelly, a cop. Who portrayed Lacey Burrows?" it correctly said Gabrielle Miller.
Wrong Twice: Who portrayed Davis Quinton in the Canadian TV show Corner Gas? "There is no character named Davis Quinton in Corner Gas." He was a main character (the town's cop) that Qwen3 itself listed above.
And when I attempted a correction it still guessed wrong (it said Kerry Sandomirsky; the correct answer is Lorne Cardinal).
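For anyone who wants to repeat this comparison on other shows, here is a rough Python sketch of how the list-form response above can be scored against the real cast. The names `REFERENCE`, `LIST_RESPONSE`, `parse_cast`, and `score` are my own placeholders, not part of any benchmark or Qwen tooling, and the reference cast is just the six characters from my prompt with their actual actors.

```python
# Reference cast for the six characters named in the prompt (hand-entered).
REFERENCE = {
    "Brent Leroy": "Brent Butt",
    "Hank Yarbo": "Fred Ewanuick",
    "Wanda Dollard": "Nancy Robertson",
    "Davis Quinton": "Lorne Cardinal",
    "Lacey Burrows": "Gabrielle Miller",
    "Oscar Leroy": "Eric Peterson",
}

# The model's list-form response, copied from this post.
LIST_RESPONSE = """\
Brent Leroy – Brent Butt
Hank Yarbo – Paul Gross
Wanda Dollard – Emily Hampshire
Davis Quinton – Fred Ewanuick
Lacey Burrows – Tara Spencer-Nairn
Oscar Leroy – Eric Peterson
"""


def parse_cast(text):
    """Parse 'Character – Actor' lines into a dict (assumes one pair per line)."""
    pairs = {}
    for line in text.splitlines():
        # Accept an en dash or a plain hyphen, each surrounded by spaces.
        for sep in (" – ", " - "):
            if sep in line:
                character, actor = line.split(sep, 1)
                pairs[character.strip()] = actor.strip()
                break
    return pairs


def score(answers, reference):
    """Return (number correct, sorted list of characters answered wrongly)."""
    wrong = sorted(c for c, actor in reference.items() if answers.get(c) != actor)
    return len(reference) - len(wrong), wrong


correct, wrong = score(parse_cast(LIST_RESPONSE), REFERENCE)
print(f"list form: {correct}/{len(REFERENCE)} correct; wrong: {', '.join(wrong)}")
```

The same `score` function can be fed a dict built from the one-question-at-a-time answers, which makes the gap between the two prompting styles easy to count. It uses exact string matching, so it would need loosening for responses that add middle names or extra detail.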
Anyways, the added knowledge in this updated version of Qwen3 is weakly held and only retrievable in very short and direct Q&A form. Perhaps you know how to rectify this.
Also, despite the increase in broad knowledge, this Qwen3 2507 version doesn't have the broad knowledge needed to achieve a SimpleQA score of 20, let alone the reported 54. This isn't just a matter of blindly training on Q&A pairs from sources like wiki pages, because if that were the case it would have done much better on the far simpler (less esoteric) questions I tried from the same domains that SimpleQA covers. It's clear that the actual SimpleQA questions made their way into the training data.