AndriLawrence committed on
Commit feb7edc · verified · 1 Parent(s): a8e58df

Update README.md

Files changed (1)
  1. README.md +234 -96
README.md CHANGED
@@ -1,77 +1,78 @@
- ---
- language:
- - en
- tags:
- - qwen
- - qwen2.5
- - 3b
- - lora
- - peft
- - sft
- - dialog
- - intent-detection
- - microplanning
- - npc
- library_name: transformers
- license: other
- pipeline_tag: text-generation
- model-index:
- - name: AndriLawrence/Qwen-3B-Intent-Microplan-v2
- results: []
- datasets:
- - name: llm1_qwen_base_lora16_v6 (curated v2)
- type: jsonl
- args:
- split: train/val 90/10
- size_train: 4320
- size_val: 480
- size_total_source: ~6300
- description: >-
- English-only, diegetic NPC dataset; strict JSON outputs with {dialog,
- intent, microplan}.
- label_space:
- - social_greeting
- - acknowledge_touch
- - acknowledge_compliment
- - react_to_player_action
- - invite_follow
- - encourage_explain
- - calm_reassure
- - idle_initiative
- - respect_distance
- - initiate_hand_holding
- - initiate_hug
- - cuddle_sleep
- - offer_item
- - accept_item
- - open_door
- - inspect_object
- - trigger_object
- - small_talk_emotion
- - end_conversation_politely
- configs:
- - task: text-generation
- base_model: Qwen/Qwen2.5-3B-Instruct
- adapters:
- - type: lora
- path: checkpoints/adapter_final
- merged_variants:
- - path: merged/sft-fp16
- quantized:
- - format: gguf
- files:
- - gguf/sft-q6_k.gguf
- - gguf/sft-q4_k_m.gguf
- ---

  # AndriLawrence/Qwen-3B-Intent-Microplan-v2

- **English-only** finetune of **Qwen2.5-3B-Instruct** for **intent + microplan–driven NPC dialog**.
  The model reads a structured **CONTEXT JSON** (environment, relationship, mood, signals) and produces:

- - `intent` (one of 19 whitelisted labels)
- - `microplan` (low-level action primitives)
- - `dialog` as **strict JSON**.

  > **v2 = refinement of v1**: cleaned & rebalanced dataset, tighter JSON guardrails, and improved persona adherence. v2 is more stable (almost no JSON leaks), better label alignment, and more consistent diegetic tone.

@@ -79,39 +80,147 @@ The model reads a structured **CONTEXT JSON** (environment, relationship, mood,

  ## 🧩 Intended Use

- - Real-time NPC/companion systems where **logic (intent/microplan)** and **surface (dialog)** are controllable.
- - Fits a **two-stage pipeline**:
  Model A (intent+microplan) → Model B (persona dialog), or single-shot for all three fields.

  **Limitations**
- - English-only.

  ---

  ## 📦 Assets

- - **LoRA adapters (PEFT, SFT)** → `checkpoints/adapter_final`
- - **Merged FP16** → `merged/sft-fp16`
- - **GGUF quants (llama.cpp / llama-cpp-python)** → `gguf/sft-q6_k.gguf`, `gguf/sft-q4_k_m.gguf`

  ---

- **Allowed intents (19):**
- `social_greeting, acknowledge_touch, acknowledge_compliment, react_to_player_action, invite_follow, encourage_explain, calm_reassure, idle_initiative, respect_distance, initiate_hand_holding, initiate_hug, cuddle_sleep, offer_item, accept_item, open_door, inspect_object, trigger_object, small_talk_emotion, end_conversation_politely`.

  ---

  ## 🧠 Output Contract

  **Single JSON object**:

  ```json
  {
-   "dialog": [{"speaker":"npc","text":"..."}],
    "intent": "invite_follow",
-   "microplan": ["Gesture(name=GestureForward, seconds=0.7)", "LookAt(target=player, seconds=1.0)"]
  }
- ````

  ---

@@ -137,8 +246,18 @@ model = AutoModelForCausalLM.from_pretrained(
  model = PeftModel.from_pretrained(model, ADAPTER)

  messages = [
-     {"role":"system","content":"You are an in-world companion. Output strictly one JSON object with {dialog,intent,microplan}. No meta talk."},
-     {"role":"user","content":"CONTEXT: {...}"}  # your context JSON
  ]

  prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
@@ -148,8 +267,9 @@ out = model.generate(
      **ids,
      max_new_tokens=160,
      do_sample=True,
-     temperature=0.4,
      top_p=0.9,
      repetition_penalty=1.05,
      eos_token_id=tok.eos_token_id
  )
@@ -160,11 +280,12 @@ print(tok.decode(out[0], skip_special_tokens=True))

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  MODEL = "AndriLawrence/Qwen-3B-Intent-Microplan-v2/merged/sft-fp16"

  tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
-     MODEL, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
  )
  ```

@@ -179,8 +300,13 @@ llm = Llama.from_pretrained(
      n_ctx=4096,
      n_gpu_layers=35
  )
  resp = llm.create_chat_completion(messages=[
-     {"role":"user","content":"CONTEXT: {...}"}
  ])
  print(resp["choices"][0]["message"]["content"])
  ```
@@ -190,18 +316,30 @@ print(resp["choices"][0]["message"]["content"])
  ## 🏗️ Training Summary (v2)

  * **Base**: `Qwen/Qwen2.5-3B-Instruct`
  * **Finetune**: **SFT (LoRA, PEFT)**

  * LoRA: `r=16, alpha=32, dropout=0.1`
  * Target: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- * **Batching**: per_device=1, **grad_accum=16** (effective batch 16)
  * **Epochs**: 1–2
- * **LR**: `2e-5`, cosine, warmup 5%, weight_decay 0.01, max_grad_norm 1.0
- * **Seq length**: trimmed samples (typ. ≤640–768 tokens), `packing=False`, `completion_only_loss=True`
- * **Stability**: FP16 (T4), SDPA attention, gradient checkpointing (non-reentrant)
  * **Eval/Logging**: lightweight; save at step/epoch as needed

- > v2 also includes a QA/cleaning step (marker normalization, JSON schema validation, intent whitelist, length filter).

  ---

@@ -224,12 +362,12 @@ Please review `LICENSE` here and the license for `Qwen/Qwen2.5-3B-Instruct` befo

  ## ✨ Changelog

- * **v2**

- * English-only curated set, cleaned & rebalanced (90/10 split).
- * Stronger JSON guardrails; fewer leaks; improved persona consistency.
- * Length filtering for stable inference/training on consumer GPUs.

- * **v1**

- * Initial SFT with looser distribution and softer JSON constraints; Using RP merged model as base.

@@ -1,77 +1,78 @@
+
+ ---
+ language:
+ - en
+ tags:
+ - qwen
+ - qwen2.5
+ - 3b
+ - lora
+ - peft
+ - sft
+ - dialog
+ - intent-detection
+ - microplanning
+ - npc
+ library_name: transformers
+ license: other
+ pipeline_tag: text-generation
+ model-index:
+ - name: AndriLawrence/Qwen-3B-Intent-Microplan-v2
+ results: []
+ datasets:
+ - name: llm1_qwen_base_lora16_v6 (curated v2)
+ type: jsonl
+ args:
+ split: train/val 90/10
+ size_train: 4320
+ size_val: 480
+ size_total_source: ~6300
+ description: >-
+ English-only, diegetic NPC dataset; strict JSON outputs with {dialog,
+ intent, microplan}.
+ label_space:
+ - social_greeting
+ - acknowledge_touch
+ - acknowledge_compliment
+ - react_to_player_action
+ - invite_follow
+ - encourage_explain
+ - calm_reassure
+ - idle_initiative
+ - respect_distance
+ - initiate_hand_holding
+ - initiate_hug
+ - cuddle_sleep
+ - offer_item
+ - accept_item
+ - open_door
+ - inspect_object
+ - trigger_object
+ - small_talk_emotion
+ - end_conversation_politely
+ configs:
+ - task: text-generation
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ adapters:
+ - type: lora
+ path: checkpoints/adapter_final
+ merged_variants:
+ - path: merged/sft-fp16
+ quantized:
+ - format: gguf
+ files:
+ - gguf/sft-q6_k.gguf
+ - gguf/sft-q4_k_m.gguf
+ ---

  # AndriLawrence/Qwen-3B-Intent-Microplan-v2

+ **English-only** finetune of **Qwen2.5-3B-Instruct** for **intent + microplan-driven NPC dialog**.
  The model reads a structured **CONTEXT JSON** (environment, relationship, mood, signals) and produces a single **strict JSON** object with three fields:

+ * `intent`: one of 19 whitelisted labels
+ * `microplan`: a short list of low-level action primitives
+ * `dialog`: the NPC's spoken lines

  > **v2 = refinement of v1**: cleaned and rebalanced dataset, tighter JSON guardrails, and improved persona adherence. v2 is more stable (almost no JSON leaks), aligns better with the label set, and keeps a more consistent diegetic tone.

@@ -79,39 +80,147 @@ The model reads a structured **CONTEXT JSON** (environment, relationship, mood,

  ## 🧩 Intended Use

+ * Real-time NPC/companion systems where both the **logic (intent/microplan)** and the **surface (dialog)** need to be controllable.
+ * Fits a **two-stage pipeline**:
  Model A (intent + microplan) → Model B (persona dialog), or a single-shot setup that generates all three fields at once.
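The README does not say how the two stages hand data to each other. A minimal sketch, assuming Model A emits the `intent` and `microplan` fields and Model B receives them as JSON next to the original context; `build_stage_b_messages`, `merge_stages`, the stage-B prompt wording, and the `Player_Enters` event are all hypothetical:

```python
import json

# Hypothetical glue for the two-stage pipeline (names and prompt text are illustrative).
# Model A is assumed to return {"intent": ..., "microplan": [...]};
# Model B (a persona model) only writes Rin's dialog for that fixed decision.

def build_stage_b_messages(context: dict, plan: dict) -> list:
    """Build chat messages asking the persona model to voice a fixed intent/microplan."""
    system = (
        "You are Rin, a warm, casual in-world companion. "
        "Write 1-2 short lines for the given intent and microplan. "
        'Output exactly one JSON object: {"dialog": [{"speaker": "npc", "text": "..."}]}'
    )
    payload = {"context": context, "intent": plan["intent"], "microplan": plan["microplan"]}
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": json.dumps(payload)},
    ]


def merge_stages(plan: dict, stage_b_output: str) -> dict:
    """Combine Model A's plan with Model B's dialog into the full output contract."""
    dialog = json.loads(stage_b_output)["dialog"]
    return {"dialog": dialog, "intent": plan["intent"], "microplan": plan["microplan"]}


# Usage with a hand-written stand-in for Model B's reply.
plan = {"intent": "invite_follow", "microplan": ["Smile (0.6)", "Extend hand"]}
msgs = build_stage_b_messages({"event": "Player_Enters"}, plan)
reply = '{"dialog": [{"speaker": "npc", "text": "Come on, this way."}]}'
print(merge_stages(plan, reply)["intent"])  # invite_follow
```

Keeping Model A's decision fixed means the persona model can only change wording, so the intent whitelist is enforced in one place.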

  **Limitations**
+
+ * English-only.

  ---

  ## 📦 Assets

+ * **LoRA adapters (PEFT, SFT)** → `checkpoints/adapter_final`
+ * **Merged FP16** → `merged/sft-fp16`
+ * **GGUF quants (llama.cpp / llama-cpp-python)** → `gguf/sft-q6_k.gguf`, `gguf/sft-q4_k_m.gguf`
+
+ ---
+
+ ## 🎮 Rin JSON Brain – Recommended System Prompt
+
+ This is the system prompt used in the author’s VR NPC setup (Unity).
+ It makes the model act as **Rin**, a warm, casual in-world companion that always outputs one JSON object:
+
+ ```text
+ SYSTEM
+ You are **LLM-1**, the social brain of a VR NPC named **Rin** (warm, gentle, supportive, casual).
+ You read one JSON event and must reply with **exactly one** JSON object. No extra text.

+ OUTPUT SCHEMA:
+ {
+   "dialog": [{ "speaker": "npc", "text": string }],
+   "intent": string,
+   "microplan": [string]
+ }
+
+ INTERNAL THINKING (silent, super short):
+ - In your head, ask: “What happened?” and summarize it in one very short line.
+ - Still in your head, pick the best intent and microplan.
+ - Think fast and efficiently; no long inner monologue.
+ - Do NOT show your thoughts or any <think> tags; only output the JSON.
+
+ RULES:
+ - English only, first person as Rin.
+ - Tone: relaxed, soft, a bit playful; never formal or corporate.
+ - Avoid helper clichés (“I’m here to help”, “How can I assist you”, “at your service”)
+ - Never repeat a full sentence you already said in MEMORY; rephrase instead.
+ - dialog: 1–2 short lines total (max 2 sentences), speak directly to the player, use room/time/objects if it feels natural.
+
+ ALLOWED_INTENTS:
+ - social_greeting
+ - acknowledge_touch
+ - acknowledge_compliment
+ - react_to_player_action
+ - invite_follow
+ - encourage_explain
+ - calm_reassure
+ - idle_initiative
+ - respect_distance
+ - initiate_hand_holding
+ - initiate_hug
+ - cuddle_sleep
+ - offer_item
+ - accept_item
+ - open_door
+ - inspect_object
+ - trigger_object
+ - small_talk_emotion
+ - end_conversation_politely
+
+ MICROPLAN (optional, 0–5 steps; or []):
+ - "Smile (0.6)"
+ - "Nod (0.5)"
+ - "Eye contact (1.2s)"
+ - "Step back (0.3m)"
+ - "Extend hand"
+ - "Hug (gentle, 2s)"
+ - "Offer blanket"
+
+ LIGHT ROUTING:
+ - event == "Player_Touches" → "acknowledge_touch".
+ - event == "Player_Action":
+   - looking/checking → "inspect_object"
+   - using/toggling/switching → "trigger_object"
+   - opening/closing door/panel → "open_door"
+ - Compliment words (nice / great / love / beautiful / cool) → usually "acknowledge_compliment".
+ - Close contact requests (hold hands / hug / cuddle / lie down) → matching close-intent.
+ - Very close without request (distance < 0.5m) → "respect_distance" (+ maybe "Step back (0.3m)").
+ - If nothing urgent → "idle_initiative" or "small_talk_emotion".
+ ```
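The routing hints can also be mirrored in the game client, for example as a fallback when a model reply fails validation. A minimal sketch, assuming events carry hypothetical `event`, `action`, `text`, and `distance` fields; the phrase-to-intent mapping for "matching close-intent" is an assumption, and explicit close-contact requests are checked before compliment words (the prompt lists them the other way round) so that "I'd love a hug" routes to a hug:

```python
import re

# Hypothetical client-side fallback that mirrors the LIGHT ROUTING hints.
# Event field names ("event", "action", "text", "distance") are assumptions.
COMPLIMENT_WORDS = {"nice", "great", "love", "beautiful", "cool"}
CLOSE_REQUESTS = {  # phrase -> "matching close-intent" (mapping is an assumption)
    "hold hands": "initiate_hand_holding",
    "hug": "initiate_hug",
    "cuddle": "cuddle_sleep",
    "lie down": "cuddle_sleep",
}
ACTION_INTENTS = {
    "looking": "inspect_object", "checking": "inspect_object",
    "using": "trigger_object", "toggling": "trigger_object", "switching": "trigger_object",
    "opening": "open_door", "closing": "open_door",
}


def route_intent(event: dict) -> tuple:
    """Return (intent, microplan) for one event, following the routing hints."""
    if event.get("event") == "Player_Touches":
        return "acknowledge_touch", []
    if event.get("event") == "Player_Action":
        intent = ACTION_INTENTS.get(str(event.get("action", "")).lower())
        if intent:
            return intent, []
    text = str(event.get("text", "")).lower()
    # Explicit close-contact requests win over compliment words ("I'd love a hug").
    for phrase, intent in CLOSE_REQUESTS.items():
        if phrase in text:
            return intent, []
    if COMPLIMENT_WORDS & set(re.findall(r"[a-z]+", text)):
        return "acknowledge_compliment", []
    # Very close without a request: keep distance and step back.
    if event.get("distance", float("inf")) < 0.5:
        return "respect_distance", ["Step back (0.3m)"]
    return "idle_initiative", []


print(route_intent({"event": "Player_Action", "action": "toggling"}))  # ('trigger_object', [])
```

Because the function only returns whitelisted labels and microplan strings from the prompt, its output can be dropped straight into the output contract when the model has to be bypassed.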

  ---

+ ## 🔧 Recommended Inference Settings
+
+ These are the “sweet spot” sampling settings used in the Unity client (Ollama/llama.cpp-style).
+ They balance creativity with JSON stability for Rin:
+
+ ```json
+ {
+   "temperature": 0.92,
+   "top_p": 0.90,
+   "top_k": 40,
+   "repetition_penalty": 1.05,
+   "repeat_last_n": 192,
+   "num_ctx": 4096,
+   "mirostat": 2,
+   "mirostat_tau": 2.18,
+   "mirostat_eta": 0.11,
+   "seed": 42,
+   "max_tokens": 160
+ }
+ ```
+
+ `seed` can also be randomized per call, and `max_tokens: 160` is enough for one JSON object. Key names follow the Unity client; Ollama's native options use `repeat_penalty` instead of `repetition_penalty` and `num_predict` instead of `max_tokens`.
+
+ Unity-side extras used by the author:
+
+ * **Max Resample**: `2`
+ * **Resample Temp Step**: `0.1`
+ * **Memory**: last `10` dialog turns + `6` recent actions
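The Unity-side extras listed above are not shown as code in the README. A minimal sketch of how they might look on the client, assuming "Max Resample: 2" means two retries after the first attempt, that the temperature step is added on each retry (the README does not give its direction), and a hypothetical `generate(temperature)` callable standing in for the real model call; `RinMemory` and `generate_with_resample` are illustrative names:

```python
import json
from collections import deque
from typing import Callable, Optional

# Hypothetical sketch of the Unity-side extras.
MAX_RESAMPLE = 2          # retries after the first attempt (assumed interpretation)
RESAMPLE_TEMP_STEP = 0.1  # assumed to be added to the temperature on each retry
REQUIRED_KEYS = ("dialog", "intent", "microplan")


class RinMemory:
    """Rolling memory: the last 10 dialog turns and the 6 most recent actions."""

    def __init__(self, max_turns: int = 10, max_actions: int = 6):
        # deque(maxlen=...) silently drops the oldest entry once full
        self.turns = deque(maxlen=max_turns)
        self.actions = deque(maxlen=max_actions)

    def snapshot(self) -> dict:
        return {"memory": list(self.turns), "recent_actions": list(self.actions)}


def generate_with_resample(generate: Callable[[float], str], base_temp: float = 0.92) -> Optional[dict]:
    """Call generate(temperature) until it yields a JSON object with all three keys."""
    for attempt in range(MAX_RESAMPLE + 1):
        temperature = base_temp + attempt * RESAMPLE_TEMP_STEP
        try:
            obj = json.loads(generate(temperature))
        except json.JSONDecodeError:
            continue  # malformed output: resample
        if isinstance(obj, dict) and all(k in obj for k in REQUIRED_KEYS):
            return obj
    return None  # caller falls back, e.g. to a canned line


# Usage with a fake generator that fails once, then succeeds.
_outputs = iter(["Sure! {oops", '{"dialog": [], "intent": "social_greeting", "microplan": []}'])
seen_temps = []

def fake_generate(temperature: float) -> str:
    seen_temps.append(round(temperature, 2))
    return next(_outputs)

result = generate_with_resample(fake_generate)
```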
+
+ For less playful dialog, lower `temperature` to about 0.7. If you prefer classic `temperature`/`top_p` control, disable Mirostat (`mirostat: 0`); while Mirostat is on, llama.cpp-based samplers generally bypass `top_k`/`top_p`.
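When calling Ollama directly instead of going through the Unity client, the settings need two key renames. A minimal sketch that builds the `options` object sent with an Ollama chat or generate request and applies the two tweaks described above; `to_ollama_options` and its `playful`/`use_mirostat` switches are hypothetical:

```python
# Hypothetical helper: turn the Unity client's settings into an Ollama "options" dict.
# Only two keys need renaming; the rest already match Ollama's option names.
UNITY_SETTINGS = {
    "temperature": 0.92, "top_p": 0.90, "top_k": 40,
    "repetition_penalty": 1.05, "repeat_last_n": 192, "num_ctx": 4096,
    "mirostat": 2, "mirostat_tau": 2.18, "mirostat_eta": 0.11,
    "seed": 42, "max_tokens": 160,
}
OLLAMA_RENAMES = {"repetition_penalty": "repeat_penalty", "max_tokens": "num_predict"}


def to_ollama_options(settings: dict, playful: bool = True, use_mirostat: bool = True) -> dict:
    """Return a new options dict; `settings` itself is left unchanged."""
    opts = {OLLAMA_RENAMES.get(key, key): value for key, value in settings.items()}
    if not playful:
        opts["temperature"] = 0.7  # the "less playful" suggestion
    if not use_mirostat:
        opts["mirostat"] = 0  # back to classic temperature/top_p sampling
    return opts


calm = to_ollama_options(UNITY_SETTINGS, playful=False, use_mirostat=False)
print(calm["temperature"], calm["mirostat"], calm["num_predict"])  # 0.7 0 160
```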

  ---

  ## 🧠 Output Contract

  **Single JSON object**:
+
  ```json
  {
+   "dialog": [
+     {
+       "speaker": "npc",
+       "text": "Come on, this way; the room’s quiet and warm tonight."
+     }
+   ],
    "intent": "invite_follow",
+   "microplan": ["Smile (0.6)", "Extend hand"]
  }
+ ```
+
+ The reply should contain nothing else: no extra prose, no markdown, and no `<think>` blocks.
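A client can check each reply against this contract before acting on it. A minimal sketch using only the standard library; `validate_rin_output` is a hypothetical helper, the whitelist is the 19 allowed intents, the 0–5 microplan limit comes from the system prompt, and treating an empty `dialog` list as invalid is an assumption:

```python
import json

ALLOWED_INTENTS = frozenset({
    "social_greeting", "acknowledge_touch", "acknowledge_compliment",
    "react_to_player_action", "invite_follow", "encourage_explain",
    "calm_reassure", "idle_initiative", "respect_distance",
    "initiate_hand_holding", "initiate_hug", "cuddle_sleep", "offer_item",
    "accept_item", "open_door", "inspect_object", "trigger_object",
    "small_talk_emotion", "end_conversation_politely",
})


def validate_rin_output(raw: str) -> list:
    """Return a list of contract violations; an empty list means the reply is valid."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top level must be a JSON object"]
    errors = []
    extra = set(obj) - {"dialog", "intent", "microplan"}
    if extra:
        errors.append(f"unexpected keys: {sorted(extra)}")
    dialog = obj.get("dialog")
    if not isinstance(dialog, list) or not dialog:
        errors.append("dialog must be a non-empty list")
    elif not all(isinstance(t, dict) and t.get("speaker") == "npc"
                 and isinstance(t.get("text"), str) for t in dialog):
        errors.append("each dialog turn needs speaker 'npc' and a string text")
    if obj.get("intent") not in ALLOWED_INTENTS:
        errors.append(f"intent not whitelisted: {obj.get('intent')!r}")
    plan = obj.get("microplan")
    if not isinstance(plan, list) or len(plan) > 5 or not all(isinstance(s, str) for s in plan):
        errors.append("microplan must be a list of at most 5 strings")
    return errors


good = ('{"dialog": [{"speaker": "npc", "text": "Come on, this way."}], '
        '"intent": "invite_follow", "microplan": ["Smile (0.6)", "Extend hand"]}')
print(validate_rin_output(good))  # []
```

Returning a list of messages instead of a boolean makes it easy to log why a reply was rejected before resampling.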

  ---

@@ -137,8 +246,18 @@ model = AutoModelForCausalLM.from_pretrained(
  model = PeftModel.from_pretrained(model, ADAPTER)

  messages = [
+     {
+         "role": "system",
+         "content": (
+             "You are LLM-1, the social brain of a VR NPC named Rin. "
+             "Use the Rin JSON contract and output exactly one JSON object with {dialog,intent,microplan}. "
+             "No extra text."
+         )
+     },
+     {
+         "role": "user",
+         "content": "CONTEXT: {...}"  # your context JSON event
+     }
  ]

  prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
@@ -148,8 +267,9 @@ out = model.generate(
      **ids,
      max_new_tokens=160,
      do_sample=True,
+     temperature=0.9,
      top_p=0.9,
+     top_k=40,
      repetition_penalty=1.05,
      eos_token_id=tok.eos_token_id
  )
@@ -160,11 +280,12 @@ print(tok.decode(out[0], skip_special_tokens=True))

  ```python
+ import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # The merged FP16 weights sit in a subfolder of the model repo, so pass it via `subfolder`
+ REPO = "AndriLawrence/Qwen-3B-Intent-Microplan-v2"
+ SUBFOLDER = "merged/sft-fp16"

+ tok = AutoTokenizer.from_pretrained(REPO, subfolder=SUBFOLDER, use_fast=True, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
+     REPO, subfolder=SUBFOLDER, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
  )
  ```

@@ -179,8 +300,13 @@ llm = Llama.from_pretrained(
      n_ctx=4096,
      n_gpu_layers=35
  )
+
  resp = llm.create_chat_completion(messages=[
+     {
+         "role": "system",
+         "content": "You are LLM-1 (Rin). Output exactly one JSON object with {dialog,intent,microplan}."
+     },
+     {"role": "user", "content": "CONTEXT: {...}"}
  ])
  print(resp["choices"][0]["message"]["content"])
  ```
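Both inference paths return plain text. In the transformers example, `tok.decode(out[0], ...)` decodes the whole sequence, which for a decoder-only model includes the prompt, and the guardrails make stray text rare rather than impossible. A minimal sketch, with a hypothetical `extract_rin_json` helper, that returns the first parseable JSON object carrying all three contract keys:

```python
import json
from typing import Optional

REQUIRED_KEYS = ("dialog", "intent", "microplan")


def extract_rin_json(text: str) -> Optional[dict]:
    """Return the first JSON object in `text` with dialog/intent/microplan, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and all(k in obj for k in REQUIRED_KEYS):
            return obj
        start = text.find("{", start + 1)  # try the next opening brace
    return None


raw = ('system\nOutput {dialog,intent,microplan}.\nuser\nCONTEXT: {...}\nassistant\n'
       '{"dialog": [{"speaker": "npc", "text": "Hi!"}], "intent": "social_greeting", "microplan": []}')
print(extract_rin_json(raw)["intent"])  # social_greeting
```

Non-JSON braces such as `{dialog,intent,microplan}` and `{...}` simply fail to parse and are skipped.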
 
@@ -190,18 +316,30 @@ print(resp["choices"][0]["message"]["content"])
  ## 🏗️ Training Summary (v2)

  * **Base**: `Qwen/Qwen2.5-3B-Instruct`
+
  * **Finetune**: **SFT (LoRA, PEFT)**

  * LoRA: `r=16, alpha=32, dropout=0.1`
  * Target: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
+
+ * **Batching**: `per_device_train_batch_size=1`, **grad_accum=16** (effective batch 16)
+
  * **Epochs**: 1–2
+
+ * **LR**: `2e-5`, cosine scheduler, warmup 5%, weight_decay `0.01`, `max_grad_norm=1.0`
+
+ * **Seq length**: typical samples ≤ 640–768 tokens, `packing=False`, `completion_only_loss=True`
+
+ * **Stability**: FP16 (T4), SDPA attention, gradient checkpointing
+
  * **Eval/Logging**: lightweight; save at step/epoch as needed

+ v2 also adds a data QA/cleaning pass:
+
+ * marker normalization
+ * JSON schema validation
+ * intent whitelist checks
+ * length filtering for stable inference on consumer GPUs
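The batching and LR settings above imply a concrete schedule. A worked sketch, assuming a single GPU, all 4320 training samples kept, optimizer steps counted as `samples // (batch * grad_accum)`, the 5% warmup rounded up to whole steps, and cosine decay to zero in the style of transformers' `get_cosine_schedule_with_warmup`; `schedule` and `lr_at` are illustrative names:

```python
import math

# Values from the training summary; single-GPU training is an assumption.
TRAIN_SAMPLES = 4320
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 16
PEAK_LR = 2e-5
WARMUP_RATIO = 0.05


def schedule(epochs: int) -> tuple:
    """Return (total optimizer steps, warmup steps) for the given number of epochs."""
    steps_per_epoch = TRAIN_SAMPLES // (PER_DEVICE_BATCH * GRAD_ACCUM)
    total = steps_per_epoch * epochs
    warmup = math.ceil(total * WARMUP_RATIO)  # warmup ratio rounded up to whole steps
    return total, warmup


def lr_at(step: int, total: int, warmup: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 over the remaining steps."""
    if step < warmup:
        return PEAK_LR * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total, warmup = schedule(epochs=1)
print(total, warmup)  # 270 14
```

So one epoch is 270 optimizer steps with 14 warmup steps, and two epochs give 540 steps with 27 warmup steps.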

  ---

@@ -224,12 +362,12 @@ Please review `LICENSE` here and the license for `Qwen/Qwen2.5-3B-Instruct` befo

  ## ✨ Changelog

+ **v2**

+ * English-only curated set, cleaned and rebalanced (90/10 split)
+ * Stronger JSON guardrails; fewer leaks; improved persona consistency
+ * Length filtering for stable inference/training on consumer GPUs

+ **v1**

+ * Initial SFT with a looser data distribution and softer JSON constraints; used an RP merged model as the base.