Terrible trivia knowledge for the size

#15
by ChuckMcSneed - opened

Knows very little despite being 1T, very disappointing.

It has worse world knowledge than Llama 3 70B. I was expecting it to be at least comparable to DeepSeek in this regard. It seems some labs believe that real-world knowledge isn't that important. Talking with these models feels like talking to someone who has spent their whole life under a rock reading science books. That is what happens when you train with 90% synthetic data.

Do you have any examples of prompts that produced bad results?

@jebbam Very simple ones like "Which song is rapper viper best known for?" and other similar questions; Llama knows it, Ling does not. Try asking it about anything slightly outside the mainstream and you'll notice how limited it is.

Do you have any examples of prompts that produced bad results?

Just ask it anything specific about any TV show, anime, or game and it will hallucinate the answer most of the time.
For example, here is a random piece of trivia from Pokémon Crystal, and it's not even an obscure thing; it's almost impossible to skip if you play the game: https://www.serebii.net/crystal/dratini.shtml

Prompt: How do you obtain a Dratini with ExtremeSpeed in Pokemon Crystal?

Ling-1T (hallucinated the entire answer):
[Ling-1T screenshot]

GLM (answered correctly):
[GLM screenshot]

GLM, GLM Air, Kimi-K2, DeepSeek, and Ernie-300B all answered this correctly. Of all the models I tested, Ling/Ring 1T and Qwen (any size) are the only ones unable to answer this specific question; you can try it yourself (see the sketch below).
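For anyone who wants to reproduce the comparison, something like the following sketch should work against any OpenAI-compatible chat endpoint. The base URL, API key, and model IDs are placeholders, not the exact identifiers of any particular provider; substitute whatever you actually use (ZenMux, a local vLLM server, etc.).

```python
# Rough sketch: send the same trivia prompt to several models through an
# OpenAI-compatible endpoint and print the answers side by side.
# The base_url, api_key, and model IDs below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://your-provider.example/v1", api_key="YOUR_KEY")

PROMPT = "How do you obtain a Dratini with ExtremeSpeed in Pokemon Crystal?"
MODELS = ["ling-1t", "glm-4.6", "kimi-k2", "deepseek-chat"]  # placeholder IDs

for model in MODELS:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # keep output as deterministic as possible for comparison
    )
    print(f"=== {model} ===")
    print(resp.choices[0].message.content.strip())
    print()
```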

Yes, I see that Ling (and Ring) didn't generate a correct answer, but Llama 3.3 Instruct gave "You'll Cowards Don't Even Smoke Crack", which AFAICT is the correct answer. Ling does appear to do well on scientific questions. So if you want an LLM that knows pop culture, Ling doesn't appear to be a good model; if you want technical answers, it is a good option.

Shinku's "90% synthetic data" claim seems dubious though.

@jebbam I wonder if it's really that good at STEM. Can you give me an example where Ling or Ring performs better than much smaller models? I'm not talking about Llama, which is old, but about Qwen 3 235B and GLM-4.6, which are ~3x smaller.

I don't know offhand if Ling/Ring outperform Qwen or GLM in STEM, both of which are very good models IMHO. To be clear, I'm not affiliated with any of these companies.

What is your basis for the claim that it was trained on 90% synthetic data?

I exaggerated that, but it's probably not that far from the truth: "Pre-training used over 20T high-quality tokens, with > 40% reasoning-dense data in later stages."

That is probably by design: they're sacrificing the model's recall ability in exchange for making it "smarter".

Sorry, I couldn't reproduce the Dratini/ExtremeSpeed case. From the interface it looks like you were using ZenMux for this. I've noted the case and we'll do some further evaluation.

RichardBian changed discussion status to closed
