wangkanai committed
Commit 04f367f · verified · 1 Parent(s): 2b73dab

Upload folder using huggingface_hub

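The commit message notes that the folder was pushed with the `huggingface_hub` Python client. A minimal sketch of that kind of upload (the repo id, local folder path, and token handling below are illustrative assumptions, not details recorded in this commit):

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` or HF_TOKEN

# Upload every file in a local folder as a single commit; files matched by the
# LFS rules in .gitattributes (such as the GGUF weights added below) are stored via Git LFS.
api.upload_folder(
    repo_id="wangkanai/qwen3-vl-8b-instruct",            # assumed repo id
    repo_type="model",
    folder_path="E:/huggingface/qwen3-vl-8b-instruct",   # assumed local path
    commit_message="Upload folder using huggingface_hub",
)
```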
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ qwen3-vl-8b-instruct-abliterated-f16.gguf filter=lfs diff=lfs merge=lfs -text
+ qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf filter=lfs diff=lfs merge=lfs -text
+ qwen3-vl-8b-instruct-abliterated-q8-0.gguf filter=lfs diff=lfs merge=lfs -text
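
Files matched by these rules are stored as Git LFS objects, so a clone made without `git lfs` installed ends up with small text pointer stubs instead of the multi-gigabyte GGUF files. A quick way to check what is actually on disk (the path is illustrative):

```python
from pathlib import Path

def is_lfs_pointer(path: str) -> bool:
    """Return True if the file is a Git LFS pointer stub rather than the real payload."""
    with Path(path).open("rb") as f:
        head = f.read(64)
    return head.startswith(b"version https://git-lfs.github.com/spec/v1")

print(is_lfs_pointer("qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf"))
```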
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
  - image-text-to-text
  ---

- <!-- README Version: v1.1 -->
+ <!-- README Version: v1.2 -->

  # Qwen3-VL-8B-Instruct (Abliterated)

@@ -40,37 +40,87 @@ This is an **abliterated** (uncensored) version of the Qwen3-VL-8B-Instruct mult

  ```
  qwen3-vl-8b-instruct/
- ├── qwen3-vl-8b-instruct-abliterated.safetensors # Complete model weights (16.33 GB)
- └── README.md # This file
+ ├── qwen3-vl-8b-instruct-abliterated.safetensors # Complete model weights (17 GB)
+ ├── qwen3-vl-8b-instruct-abliterated-f16.gguf # FP16 GGUF format (16 GB)
+ ├── qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf # Q4_K_M quantized (4.7 GB)
+ ├── qwen3-vl-8b-instruct-abliterated-q8-0.gguf # Q8_0 quantized (8.2 GB)
+ └── README.md # This file
  ```

- **Total Repository Size**: 16.33 GB (FP16 precision, single-file format)
+ **Total Repository Size**: ~46 GB (multiple formats for different use cases)

  **File Details**:
+
  - **qwen3-vl-8b-instruct-abliterated.safetensors**: Complete merged model in safetensors format
- - Size: 16.33 GB
+ - Size: 17 GB
  - Precision: FP16 (half precision)
  - Format: Single-file merged weights (not sharded)
- - Contains: Full vision encoder + language model + abliteration modifications
+ - Use with: Transformers library, standard PyTorch inference
+ - Best for: GPU inference with 20GB+ VRAM
+
+ - **qwen3-vl-8b-instruct-abliterated-f16.gguf**: FP16 GGUF format
+ - Size: 16 GB
+ - Precision: FP16 (half precision)
+ - Format: GGUF (GPT-Generated Unified Format)
+ - Use with: llama.cpp, Ollama, LM Studio
+ - Best for: CPU/GPU inference with llama.cpp ecosystem
+
+ - **qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf**: Q4_K_M quantized GGUF
+ - Size: 4.7 GB
+ - Precision: 4-bit K-quant (medium quality)
+ - Format: GGUF quantized
+ - Use with: llama.cpp, Ollama, LM Studio
+ - Best for: Lower VRAM systems (8-12 GB), good quality/size balance
+
+ - **qwen3-vl-8b-instruct-abliterated-q8-0.gguf**: Q8_0 quantized GGUF
+ - Size: 8.2 GB
+ - Precision: 8-bit quantization
+ - Format: GGUF quantized
+ - Use with: llama.cpp, Ollama, LM Studio
+ - Best for: 12-16 GB VRAM, minimal quality loss from FP16
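
Only one of these files is needed for a given setup, so it is usually easier to fetch that single file than to clone the whole ~46 GB repository. A sketch using `huggingface_hub` (the repo id below is an assumption for illustration):

```python
from huggingface_hub import hf_hub_download

# Download just the Q4_K_M GGUF; the returned path points into the local HF cache.
gguf_path = hf_hub_download(
    repo_id="wangkanai/qwen3-vl-8b-instruct",                    # assumed repo id
    filename="qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf",
)
print(gguf_path)
```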
 
  ## Hardware Requirements

- ### Minimum Requirements
+ ### SafeTensors Format (FP16)
+ **Minimum Requirements**:
  - **VRAM**: 20 GB (FP16 inference)
  - **RAM**: 32 GB system memory
  - **Disk Space**: 20 GB free space
  - **GPU**: NVIDIA GPU with Compute Capability 7.0+ (V100, RTX 20/30/40 series, A100, etc.)

- ### Recommended Requirements
+ **Recommended Requirements**:
  - **VRAM**: 24 GB+ (RTX 4090, A6000, A100 for longer sequences)
  - **RAM**: 64 GB system memory
  - **Disk Space**: 30 GB+ (for model caching and optimization)
  - **GPU**: NVIDIA RTX 4090, A100, or H100 for optimal performance

- ### Optimization Options
- - **INT8 Quantization**: ~10 GB VRAM (with minor quality loss)
- - **INT4 Quantization**: ~6 GB VRAM (with moderate quality loss)
- - **CPU Inference**: Possible but very slow (not recommended)
+ ### GGUF Formats (Multiple Options)
+
+ **F16 GGUF** (qwen3-vl-8b-instruct-abliterated-f16.gguf):
+ - **VRAM**: 18-20 GB GPU VRAM recommended
+ - **RAM**: 32 GB for GPU offloading, 64 GB for CPU inference
+ - **Disk Space**: 20 GB
+ - **Use Case**: GPU inference with llama.cpp ecosystem
+
+ **Q8_0 GGUF** (qwen3-vl-8b-instruct-abliterated-q8-0.gguf):
+ - **VRAM**: 12-16 GB GPU VRAM
+ - **RAM**: 16 GB for GPU offloading, 32 GB for CPU inference
+ - **Disk Space**: 10 GB
+ - **Quality**: Minimal quality loss from FP16, excellent balance
+ - **Use Case**: Mid-range GPUs (RTX 3060 12GB, RTX 4060 Ti 16GB, etc.)
+
+ **Q4_K_M GGUF** (qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf):
+ - **VRAM**: 8-12 GB GPU VRAM
+ - **RAM**: 8 GB for GPU offloading, 16 GB for CPU inference
+ - **Disk Space**: 6 GB
+ - **Quality**: Good quality/size balance, suitable for most tasks
+ - **Use Case**: Consumer GPUs (RTX 3060, RTX 4060, etc.)
+
+ ### CPU-Only Inference (GGUF formats)
+ - **RAM**: 32-64 GB system memory
+ - **CPU**: Modern CPU with AVX2 support (Intel Core i5/i7/i9, AMD Ryzen)
+ - **Performance**: Much slower than GPU, but functional
+ - **Recommended**: Q4_K_M format for best performance/quality balance
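
The VRAM figures above track the file sizes closely: resident weights cost roughly the file size, and the KV cache, vision-encoder activations, and runtime buffers add a few extra gigabytes that grow with context length and image resolution. A rough sanity check using the sizes published in this repository (the 2 GB overhead figure is an assumed ballpark, not a measurement):

```python
# Back-of-envelope VRAM estimate: weights (file size) + assumed runtime overhead.
file_size_gb = {
    "f16":    16.4,  # 16,388,044,928 bytes
    "q8_0":    8.7,  #  8,709,519,488 bytes
    "q4_k_m":  5.0,  #  5,027,784,832 bytes
}
overhead_gb = 2.0  # assumed headroom for KV cache, vision tower, and buffers

for name, weights_gb in file_size_gb.items():
    print(f"{name:>7}: ~{weights_gb + overhead_gb:.1f} GB VRAM "
          f"({weights_gb:.1f} GB weights + ~{overhead_gb:.0f} GB overhead)")
```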
 
  ## Usage Examples
 
 
@@ -221,6 +271,148 @@ print("Model layers:", list(weights.keys())[:10]) # First 10 keys
  print(f"Total parameters: {sum(w.numel() for w in weights.values()):,}")
  ```

+ ## GGUF Format Usage
+
+ The GGUF formats are designed for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines. These formats are optimized for flexible deployment across CPU and GPU systems.
+
+ ### Using with llama.cpp
+
+ **Installation**:
+ ```bash
+ git clone https://github.com/ggerganov/llama.cpp
+ cd llama.cpp
+ make
+ ```
+
+ **Basic Inference** (Q4_K_M for best balance):
+ ```bash
+ ./main -m "E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf" \
+ --image "path/to/image.jpg" \
+ -p "Describe this image in detail." \
+ -n 512 \
+ --temp 0.7 \
+ --top-p 0.9
+ ```
+
+ **GPU Acceleration** (offload layers to GPU):
+ ```bash
+ ./main -m "E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated-q8-0.gguf" \
+ --image "path/to/image.jpg" \
+ -p "What objects are in this image?" \
+ -n 512 \
+ --n-gpu-layers 35 \
+ --temp 0.7
+ ```
+
+ **CPU-Only Inference** (optimized threading):
+ ```bash
+ ./main -m "E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf" \
+ --image "path/to/image.jpg" \
+ -p "Analyze this chart." \
+ -n 512 \
+ --threads 8
+ ```
+
+ ### Using with Ollama
+
+ **Create Modelfile**:
+ ```dockerfile
+ FROM E:\huggingface\qwen3-vl-8b-instruct\qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf
+
+ PARAMETER temperature 0.7
+ PARAMETER top_p 0.9
+ PARAMETER top_k 40
+
+ SYSTEM """You are an uncensored vision-language AI assistant capable of analyzing images and answering questions without content filtering."""
+ ```
+
+ **Create and run model**:
+ ```bash
+ ollama create qwen3-vl-abliterated -f ./Modelfile
+ ollama run qwen3-vl-abliterated
+ ```
+
+ **Interactive use**:
+ ```bash
+ >>> What's in this image? /path/to/image.jpg
+ ```
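
A model created this way can also be called programmatically through Ollama's local HTTP API rather than the interactive prompt; a minimal sketch (assuming the default port 11434, the model name created above, and an Ollama build that accepts image inputs for this model):

```python
import base64
import json
import urllib.request

# Encode one image and send it with a prompt to the locally running Ollama server.
with open("path/to/image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "qwen3-vl-abliterated",   # name used with `ollama create` above
    "prompt": "What's in this image?",
    "images": [image_b64],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```
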
+
+ ### Using with LM Studio
+
+ 1. Open LM Studio
+ 2. Go to "Local Models" → "Import Model"
+ 3. Select one of the GGUF files:
+ - Use Q4_K_M for best performance on consumer hardware
+ - Use Q8_0 for better quality with more VRAM
+ - Use F16 for maximum quality
+ 4. Load the model and configure:
+ - Context Length: 32768
+ - GPU Offload: Adjust based on your VRAM
+ - Temperature: 0.7 (adjust for your use case)
+ 5. Use the image upload feature to analyze images
+
+ ### Python with llama-cpp-python
+
+ **Installation**:
+ ```bash
+ pip install llama-cpp-python
+ ```
+
+ **Basic Usage**:
+ ```python
+ from llama_cpp import Llama
+ from llama_cpp.llama_chat_format import Llava15ChatHandler
+
+ # Initialize chat handler for vision model
+ chat_handler = Llava15ChatHandler(clip_model_path="path/to/clip/model")
+
+ # Load model
+ llm = Llama(
+     model_path="E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf",
+     chat_handler=chat_handler,
+     n_ctx=32768,
+     n_gpu_layers=35,  # Adjust based on VRAM
+     verbose=False
+ )
+
+ # Analyze image
+ response = llm.create_chat_completion(
+     messages=[
+         {
+             "role": "user",
+             "content": [
+                 {"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
+                 {"type": "text", "text": "What is in this image?"}
+             ]
+         }
+     ],
+     temperature=0.7,
+     max_tokens=512
+ )
+
+ print(response["choices"][0]["message"]["content"])
+ ```
+
+ ### Format Selection Guide
+
+ **Choose Q4_K_M** if:
+ - You have 8-12 GB VRAM
+ - You want fast inference with good quality
+ - Storage space is a concern
+ - Most consumer hardware scenarios
+
+ **Choose Q8_0** if:
+ - You have 12-16 GB VRAM
+ - You want minimal quality loss from FP16
+ - You can spare the extra storage
+ - Professional or high-quality output needs
+
+ **Choose F16 GGUF** if:
+ - You have 20+ GB VRAM
+ - You want maximum quality
+ - You prefer GGUF ecosystem over PyTorch
+ - You need llama.cpp compatibility with full precision
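
The guide above boils down to a VRAM threshold rule; a small helper that encodes those thresholds (the function name is illustrative, the cut-offs are taken from the bullets above):

```python
def pick_gguf_file(vram_gb: float) -> str:
    """Map available GPU VRAM to the GGUF file suggested by the selection guide."""
    if vram_gb >= 20:
        return "qwen3-vl-8b-instruct-abliterated-f16.gguf"    # maximum quality
    if vram_gb >= 12:
        return "qwen3-vl-8b-instruct-abliterated-q8-0.gguf"   # minimal quality loss
    return "qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf"     # 8-12 GB VRAM or CPU

print(pick_gguf_file(16))  # -> qwen3-vl-8b-instruct-abliterated-q8-0.gguf
```
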
+
  ## Model Specifications

  ### Architecture Details
 
@@ -518,7 +710,16 @@ processor = Qwen2VLProcessor(image_processor=image_processor, tokenizer=tokenize

  ## Changelog

- **v1.1** (Current)
+ **v1.2** (Current - November 2025)
+ - Added GGUF format files (F16, Q8_0, Q4_K_M)
+ - Comprehensive GGUF usage documentation (llama.cpp, Ollama, LM Studio)
+ - Detailed hardware requirements for each format
+ - Format selection guide for different use cases
+ - Updated total repository size to ~46 GB
+ - Added Python llama-cpp-python examples
+ - Enhanced deployment flexibility across CPU/GPU systems
+
+ **v1.1**
  - Updated README with accurate file information
  - Added abliteration details and safety warnings
  - Documented single-file merged format
qwen3-vl-8b-instruct-abliterated-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:198b11e5bf72366e17c26fd2ef7acffb7b521e0520b5d888adb3caf7ba1df5ae
+ size 16388044928

qwen3-vl-8b-instruct-abliterated-q4-k-m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a12b15fc5631cd42bd67739d8bbb5ac2e18b958c879d16e6bcd3d86879f0117d
+ size 5027784832

qwen3-vl-8b-instruct-abliterated-q8-0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39579e349e291e70a6391eebfb8a93046d5798e018df0a029cc6408c35ecbb80
+ size 8709519488
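
Each pointer records the byte size and SHA-256 of the object it stands in for, so a downloaded file can be checked against the values above; a small verification sketch (the local filename is assumed to match the repository filename):

```python
import hashlib
from pathlib import Path

# Expected values copied from the Q8_0 pointer above.
EXPECTED_SIZE = 8709519488
EXPECTED_SHA256 = "39579e349e291e70a6391eebfb8a93046d5798e018df0a029cc6408c35ecbb80"

path = Path("qwen3-vl-8b-instruct-abliterated-q8-0.gguf")
assert path.stat().st_size == EXPECTED_SIZE, "size mismatch"

h = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
assert h.hexdigest() == EXPECTED_SHA256, "hash mismatch"
print("OK: file matches its LFS pointer")
```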