Doesn't Work
#12
opened by qpqpqpqpqpqp
Hi @qpqpqpqpqpqp, thanks for sharing your experience. This kind of hallucination is often the result of incorrect image preprocessing producing garbage image tokens in the LLM. Can you share some details about what you tried?
- What inference engine you ran (pure transformers, llama.cpp, ollama, vLLM, etc.)
- How you invoked the model (the screenshot looks like some app; if it's open source, can you share it?)
- (if possible) The exact message sequence you used so we can try to repro
@gabegoodhart the backend is KoboldCPP, the frontend is SillyTavern. I tried various models; your instruct models with 2B params work, but the vision ones don't.
Great, those details help a lot. I know that these models have very specific requirements about how the image gets tiled and sliced, so my money is on those not being implemented or configured correctly in KoboldCPP. I'll dig a little.
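To illustrate what I mean by tiling requirements, here's a rough sketch of the kind of "any-resolution" grid selection many vision LLMs use: the image is resized to the best-fitting rows x cols grid of fixed-size tiles before being encoded. This is a hypothetical example, not KoboldCPP's actual code, and the tile size and tile budget below are assumptions; if the engine picks a different grid (or skips tiling entirely), the vision encoder gets tokens the LLM was never trained on.

```python
# Hedged sketch of any-resolution tile-grid selection for a vision LLM.
# tile=336 and max_tiles=9 are assumed values, not this model's real config.

def tile_grid(width, height, tile=336, max_tiles=9):
    """Pick the rows x cols grid of square tiles whose aspect ratio
    best matches the input image, subject to a tile budget.

    Returns (cols, rows, resized_w, resized_h): the image would be
    resized to cols*tile x rows*tile, then sliced into cols*rows tiles.
    """
    target = width / height
    best, best_err = (1, 1), float("inf")
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles + 1):
            if rows * cols > max_tiles:
                continue  # over the tile budget for this grid shape
            err = abs((cols / rows) - target)
            if err < best_err:
                best_err, best = err, (cols, rows)
    cols, rows = best
    return cols, rows, cols * tile, rows * tile

# A 1024x768 (4:3) landscape image gets a 3x2 grid under a 9-tile budget.
print(tile_grid(1024, 768))  # → (3, 2, 1008, 672)
```

If the frontend or backend feeds the model a single squashed image where the model expects this tiled layout (plus, typically, a low-res thumbnail), the resulting image tokens are effectively noise, which matches the hallucination you're seeing.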

