---
license: mit
---

These models were converted from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B with llama.cpp. They retain the bias tensors in both the q_proj and k_proj layers.

The bf16 and f16 models were converted with llama.cpp at release b4514, because the latest main branch of llama.cpp failed to convert the HF model into GGUF format (as of 2025/7/28).
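For reference, the conversion can be reproduced roughly as follows. This is a sketch, not the exact commands used here: the local checkpoint path and output filenames are placeholders, and it assumes the standard `convert_hf_to_gguf.py` script present in llama.cpp at release b4514.

```shell
# Check out the b4514 release of llama.cpp (the revision that converted successfully).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b4514
pip install -r requirements.txt

# Convert the HF checkpoint to GGUF, once per target type.
# /path/to/DeepSeek-R1-Distill-Qwen-1.5B is a placeholder for the local HF snapshot.
python convert_hf_to_gguf.py /path/to/DeepSeek-R1-Distill-Qwen-1.5B \
  --outtype bf16 --outfile DeepSeek-R1-Distill-Qwen-1.5B-bf16.gguf
python convert_hf_to_gguf.py /path/to/DeepSeek-R1-Distill-Qwen-1.5B \
  --outtype f16 --outfile DeepSeek-R1-Distill-Qwen-1.5B-f16.gguf
```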

Both of them can be evaluated with the latest llama.cpp via ./build/bin/llama-perplexity. However, when we evaluate the bf16 or f16 model with lighteval on vLLM, it fails to load the model correctly (as of 2025/8/5).
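A minimal sketch of the perplexity evaluation with llama.cpp; the evaluation corpus path is an assumption (wikitext-2 raw test set is a common choice), and the GGUF filename is a placeholder.

```shell
# Build llama.cpp, then run the perplexity tool on one of the converted models.
cmake -B build
cmake --build build --config Release

# -m: GGUF model file; -f: plain-text evaluation corpus (path is illustrative).
./build/bin/llama-perplexity \
  -m DeepSeek-R1-Distill-Qwen-1.5B-bf16.gguf \
  -f wikitext-2-raw/wiki.test.raw
```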