Not compatible with newest LM Studio (e.g. llama.cpp 1.52)
#20
opened by kalle07
Not compatible with the newest LM Studio and llama.cpp 1.52.
Downloaded today:
bartowski/swiss-ai_Apertus-8B-Instruct-2509-GGUF
A bit of a shame; why did you have to create a new kind of architecture, "apertus"?
2025-10-05 11:23:17 [DEBUG]
[LM Studio] GPU Configuration:
Strategy: evenly
Priority: []
Disabled GPUs: []
Limit weight offload to dedicated GPU Memory: ON
Offload KV Cache to GPU: ON
2025-10-05 11:23:17 [DEBUG]
[LM Studio] Live GPU memory info (source 'LMS Core'):
GPU 0: NVIDIA GeForce RTX 4060 Ti (Used: 774.70 MB, Total: 17.18 GB, Free: 16.40 GB)
2025-10-05 11:23:17 [DEBUG]
[LM Studio] Model load size estimate with raw num offload layers 'max' and context length '8096':
Model: 8.82 GB
Context: 1.67 GB
Total: 10.49 GB
2025-10-05 11:23:17 [DEBUG]
[LM Studio] Resolved GPU config options:
Num Offload Layers: max
Num CPU Expert Layers: 0
Main GPU: 0
Tensor Split: [0]
Disabled GPUs: []
2025-10-05 11:23:17 [DEBUG]
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
2025-10-05 11:23:17 [DEBUG]
CUDA : ARCHS = 750,800,890,900,1000,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
2025-10-05 11:23:17 [DEBUG]
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) (0000:01:00.0) - 15225 MiB free
2025-10-05 11:23:17 [DEBUG]
llama_model_loader: loaded meta data with 48 key-value pairs and 324 tensors from F:\...\swiss-ai_Apertus-8B-Instruct-2509-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = apertus
2025-10-05 11:23:17 [DEBUG]
llama_model_loader: - kv 1: xielu.alpha_n arr[f32,32] = [40.750000, 31.625000, 22.875000, 16....
llama_model_loader: - kv 2: xielu.alpha_p arr[f32,32] = [166.000000, 174.000000, 128.000000, ...
llama_model_loader: - kv 3: xielu.beta arr[f32,32] = [0.500000, 0.500000, 0.500000, 0.5000...
llama_model_loader: - kv 4: xielu.eps arr[f32,32] = [-0.000001, -0.000001, -0.000001, -0....
llama_model_loader: - kv 5: general.type str = model
llama_model_loader: - kv 6: general.name str = Apertus 8B Instruct 2509
llama_model_loader: - kv 7: general.version str = 2509
llama_model_loader: - kv 8: general.finetune str = Instruct
llama_model_loader: - kv 9: general.basename str = Apertus
llama_model_loader: - kv 10: general.size_label str = 8B
llama_model_loader: - kv 11: general.license str = apache-2.0
llama_model_loader: - kv 12: general.base_model.count u32 = 1
llama_model_loader: - kv 13: general.base_model.0.name str = Apertus 8B 2509
llama_model_loader: - kv 14: general.base_model.0.version str = 2509
llama_model_loader: - kv 15: general.base_model.0.organization str = Swiss Ai
llama_model_loader: - kv 16: general.base_model.0.repo_url str = https://huggingface.co/swiss-ai/Apert...
llama_model_loader: - kv 17: general.tags arr[str,5] = ["multilingual", "compliant", "swiss-...
llama_model_loader: - kv 18: apertus.block_count u32 = 32
llama_model_loader: - kv 19: apertus.context_length u32 = 65536
llama_model_loader: - kv 20: apertus.embedding_length u32 = 4096
llama_model_loader: - kv 21: apertus.feed_forward_length u32 = 21504
llama_model_loader: - kv 22: apertus.attention.head_count u32 = 32
llama_model_loader: - kv 23: apertus.attention.head_count_kv u32 = 8
llama_model_loader: - kv 24: apertus.rope.freq_base f32 = 12000000.000000
llama_model_loader: - kv 25: apertus.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 26: apertus.vocab_size u32 = 131072
llama_model_loader: - kv 27: apertus.rope.dimension_count u32 = 128
llama_model_loader: - kv 28: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 29: tokenizer.ggml.pre str = tekken
2025-10-05 11:23:17 [DEBUG]
llama_model_loader: - kv 30: tokenizer.ggml.tokens arr[str,131072] = ["<unk>", "<s>", "</s>", "<pad>", "[/...
2025-10-05 11:23:17 [DEBUG]
llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,131072] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
2025-10-05 11:23:17 [DEBUG]
llama_model_loader: - kv 32: tokenizer.ggml.merges arr[str,269443] = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ �...
llama_model_loader: - kv 33: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 34: tokenizer.ggml.eos_token_id u32 = 68
llama_model_loader: - kv 35: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 36: tokenizer.ggml.padding_token_id u32 = 3
llama_model_loader: - kv 37: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 38: tokenizer.ggml.add_sep_token bool = false
llama_model_loader: - kv 39: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 40: tokenizer.chat_template str = {%- macro render_typescript_type(para...
llama_model_loader: - kv 41: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 42: general.quantization_version u32 = 2
llama_model_loader: - kv 43: general.file_type u32 = 7
llama_model_loader: - kv 44: quantize.imatrix.file str = /models_out/Apertus-8B-Instruct-2509-...
llama_model_loader: - kv 45: quantize.imatrix.dataset str = /training_dir/calibration_datav5.txt
llama_model_loader: - kv 46: quantize.imatrix.entries_count u32 = 192
llama_model_loader: - kv 47: quantize.imatrix.chunks_count u32 = 822
llama_model_loader: - type f32: 130 tensors
llama_model_loader: - type q8_0: 194 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 7.97 GiB (8.50 BPW)
2025-10-05 11:23:17 [DEBUG]
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'apertus'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'F:\...\swiss-ai_Apertus-8B-Instruct-2509-Q8_0.gguf', try reducing --n-gpu-layers if you're running out of VRAM
2025-10-05 11:23:17 [DEBUG]
lmstudio-llama-cpp: failed to load model. Error: error loading model: error loading model architecture: unknown model architecture: 'apertus'
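The error above means the llama.cpp build bundled with LM Studio predates Apertus support, so it does not recognize the `general.architecture = apertus` key in the GGUF header. For anyone who wants to check what architecture a downloaded GGUF declares before trying to load it, here is a minimal sketch of a header reader. It assumes the GGUF v3 little-endian layout and that `general.architecture` is the first key-value pair (which it is for files produced by llama.cpp's converters, as in the log above); the function name is illustrative, not part of any library.

```python
import struct

def read_gguf_architecture(data: bytes) -> str:
    """Return the general.architecture string from the start of a GGUF file.

    Minimal sketch: assumes GGUF v2/v3 little-endian layout and that
    'general.architecture' is the first key-value pair.
    """
    # Header: 4-byte magic "GGUF", u32 version
    magic, _version = struct.unpack_from("<4sI", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # u64 tensor count, u64 key-value count
    _tensors, _kv_count = struct.unpack_from("<QQ", data, 8)
    off = 24
    # First KV pair: u64 key length, key bytes, u32 value type
    key_len, = struct.unpack_from("<Q", data, off); off += 8
    key = data[off:off + key_len].decode(); off += key_len
    vtype, = struct.unpack_from("<I", data, off); off += 4
    if key != "general.architecture" or vtype != 8:  # 8 = GGUF string type
        raise ValueError("first KV is not general.architecture")
    # String value: u64 length, then UTF-8 bytes
    val_len, = struct.unpack_from("<Q", data, off); off += 8
    return data[off:off + val_len].decode()
```

If the function returns `apertus`, the file is fine and the fix is on the runtime side: update LM Studio's llama.cpp runtime to a build that includes Apertus support rather than re-downloading the model. The `gguf` Python package from the llama.cpp repo also ships a dump utility that prints the full metadata, if a complete view is needed.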
+1. Bartowski's and other GGUF editions work fine for me in LM Studio v0.3.30. See the screenshot and a bit more info in my blog post.