Update README.md

README.md +234 -96

@@ -1,77 +1,78 @@
----
-language:
-- en
-tags:
-- qwen
-- qwen2.5
-- 3b
-- lora
-- peft
-- sft
-- dialog
-- intent-detection
-- microplanning
-- npc
-library_name: transformers
-license: other
-pipeline_tag: text-generation
-model-index:
-- name: AndriLawrence/Qwen-3B-Intent-Microplan-v2
-  results: []
-  datasets:
-  - name: llm1_qwen_base_lora16_v6 (curated v2)
-    type: jsonl
-    args:
-      split: train/val 90/10
-      size_train: 4320
-      size_val: 480
-      size_total_source: ~6300
-      description: >-
-        English-only, diegetic NPC dataset; strict JSON outputs with {dialog,
-        intent, microplan}.
-      label_space:
-      - social_greeting
-      - acknowledge_touch
-      - acknowledge_compliment
-      - react_to_player_action
-      - invite_follow
-      - encourage_explain
-      - calm_reassure
-      - idle_initiative
-      - respect_distance
-      - initiate_hand_holding
-      - initiate_hug
-      - cuddle_sleep
-      - offer_item
-      - accept_item
-      - open_door
-      - inspect_object
-      - trigger_object
-      - small_talk_emotion
-      - end_conversation_politely
-  configs:
-  - task: text-generation
-    base_model: Qwen/Qwen2.5-3B-Instruct
-    adapters:
-    - type: lora
-      path: checkpoints/adapter_final
-    merged_variants:
-    - path: merged/sft-fp16
-    quantized:
-    - format: gguf
-      files:
-      - gguf/sft-q6_k.gguf
-      - gguf/sft-q4_k_m.gguf
----
 # AndriLawrence/Qwen-3B-Intent-Microplan-v2
-**English-only** finetune of **Qwen2.5-3B-Instruct** for **intent + microplan–driven NPC dialog**.
 The model reads a structured **CONTEXT JSON** (environment, relationship, mood, signals) and produces:
-- `intent` (one of 19 whitelisted labels)
-- `microplan` (low-level action primitives)
-- `dialog` as **strict JSON**.
 > **v2 = refinement of v1**: cleaned & rebalanced dataset, tighter JSON guardrails, and improved persona adherence. v2 is more stable (almost no JSON leaks), with better label alignment and a more consistent diegetic tone.
@@ -79,39 +80,147 @@ The model reads a structured **CONTEXT JSON** (environment, relationship, mood,
 ## 🧩 Intended Use
-- Real-time NPC/companion systems where **logic (intent/microplan)** and **surface (dialog)** are controllable.
-- Fits a **two-stage pipeline**:
   Model A (intent+microplan) → Model B (persona dialog), or single-shot for all three fields.
 **Limitations**
-- English-only.
 ---
 ## 📦 Assets
-- **LoRA adapters (PEFT, SFT)** → `checkpoints/adapter_final`
-- **Merged FP16** → `merged/sft-fp16`
-- **GGUF quants (llama.cpp / llama-cpp-python)** → `gguf/sft-q6_k.gguf`, `gguf/sft-q4_k_m.gguf`
 ---
-**Allowed intents (19):**
-`social_greeting, acknowledge_touch, acknowledge_compliment, react_to_player_action, invite_follow, encourage_explain, calm_reassure, idle_initiative, respect_distance, initiate_hand_holding, initiate_hug, cuddle_sleep, offer_item, accept_item, open_door, inspect_object, trigger_object, small_talk_emotion, end_conversation_politely`.
 ---
 ## 🧠 Output Contract
 **Single JSON object**:
 ```json
 {
-  "dialog": [{"speaker":"npc","text":"..."}],
   "intent": "invite_follow",
-  "microplan": ["Gesture(name=GestureForward, seconds=0.7)", "LookAt(target=player, seconds=1.0)"]
 }
-````
 ---
@@ -137,8 +246,18 @@ model = AutoModelForCausalLM.from_pretrained(
 model = PeftModel.from_pretrained(model, ADAPTER)
 messages = [
-  {"role":"system","content":"You are an in-world companion. Output strictly one JSON object with {dialog,intent,microplan}. No meta talk."},
-  {"role":"user","content":"CONTEXT: {...}"}  # your context JSON
 ]
 prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
@@ -148,8 +267,9 @@ out = model.generate(
     **ids,
     max_new_tokens=160,
     do_sample=True,
-    temperature=0.4,
     top_p=0.9,
     repetition_penalty=1.05,
     eos_token_id=tok.eos_token_id
 )
@@ -160,11 +280,12 @@ print(tok.decode(out[0], skip_special_tokens=True))
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 MODEL = "AndriLawrence/Qwen-3B-Intent-Microplan-v2/merged/sft-fp16"
 tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
-  MODEL, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
 )
 ```
@@ -179,8 +300,13 @@ llm = Llama.from_pretrained(
   n_ctx=4096,
   n_gpu_layers=35
 )
 resp = llm.create_chat_completion(messages=[
-  {"role":"user","content":"CONTEXT: {...}"}
 ])
 print(resp["choices"][0]["message"]["content"])
 ```
@@ -190,18 +316,30 @@ print(resp["choices"][0]["message"]["content"])
 ## 🏗️ Training Summary (v2)
 * **Base**: `Qwen/Qwen2.5-3B-Instruct`
 * **Finetune**: **SFT (LoRA, PEFT)**
   * LoRA: `r=16, alpha=32, dropout=0.1`
   * Target: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
-* **Batching**: per_device=1, **grad_accum=16** (effective batch 16)
 * **Epochs**: 1–2
-* **LR**: `2e-5`, cosine, warmup 5%, weight_decay 0.01, max_grad_norm 1.0
-* **Seq length**: trimmed samples (typ. ≤640–768 tokens), `packing=False`, `completion_only_loss=True`
-* **Stability**: FP16 (T4), SDPA attention, gradient checkpointing (non-reentrant)
 * **Eval/Logging**: lightweight; save at step/epoch as needed
-> v2 also includes a QA/cleaning step (marker normalization, JSON schema validation, intent whitelist, length filter).
 ---
@@ -224,12 +362,12 @@ Please review `LICENSE` here and the license for `Qwen/Qwen2.5-3B-Instruct` befo
 ## ✨ Changelog
-* **v2**
-  * English-only curated set, cleaned & rebalanced (90/10 split).
-  * Stronger JSON guardrails; fewer leaks; improved persona consistency.
-  * Length filtering for stable inference/training on consumer GPUs.
-* **v1**
-  * Initial SFT with looser distribution and softer JSON constraints; Using RP merged model as base.

+---
+language:
+- en
+tags:
+- qwen
+- qwen2.5
+- 3b
+- lora
+- peft
+- sft
+- dialog
+- intent-detection
+- microplanning
+- npc
+library_name: transformers
+license: other
+pipeline_tag: text-generation
+model-index:
+- name: AndriLawrence/Qwen-3B-Intent-Microplan-v2
+  results: []
+  datasets:
+  - name: llm1_qwen_base_lora16_v6 (curated v2)
+    type: jsonl
+    args:
+      split: train/val 90/10
+      size_train: 4320
+      size_val: 480
+      size_total_source: ~6300
+      description: >-
+        English-only, diegetic NPC dataset; strict JSON outputs with {dialog,
+        intent, microplan}.
+      label_space:
+      - social_greeting
+      - acknowledge_touch
+      - acknowledge_compliment
+      - react_to_player_action
+      - invite_follow
+      - encourage_explain
+      - calm_reassure
+      - idle_initiative
+      - respect_distance
+      - initiate_hand_holding
+      - initiate_hug
+      - cuddle_sleep
+      - offer_item
+      - accept_item
+      - open_door
+      - inspect_object
+      - trigger_object
+      - small_talk_emotion
+      - end_conversation_politely
+  configs:
+  - task: text-generation
+    base_model: Qwen/Qwen2.5-3B-Instruct
+    adapters:
+    - type: lora
+      path: checkpoints/adapter_final
+    merged_variants:
+    - path: merged/sft-fp16
+    quantized:
+    - format: gguf
+      files:
+      - gguf/sft-q6_k.gguf
+      - gguf/sft-q4_k_m.gguf
+---
 # AndriLawrence/Qwen-3B-Intent-Microplan-v2
+**English-only** finetune of **Qwen2.5-3B-Instruct** for **intent + microplan–driven NPC dialog**.
 The model reads a structured **CONTEXT JSON** (environment, relationship, mood, signals) and produces:
+* `intent` (one of 19 whitelisted labels)
+* `microplan` (low-level action primitives)
+* `dialog` as **strict JSON**
 > **v2 = refinement of v1**: cleaned & rebalanced dataset, tighter JSON guardrails, and improved persona adherence. v2 is more stable (almost no JSON leaks), with better label alignment and a more consistent diegetic tone.
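A minimal sketch of how a client might package one event for the model, assuming the `CONTEXT: {...}` user-message convention that the usage snippets in this README follow. The card names the categories of information (environment, relationship, mood, signals) but not an exact schema, so every key and value inside `context` below is a hypothetical placeholder, as is the helper name `build_messages`.

```python
import json


def build_messages(context: dict, system_prompt: str) -> list:
    """Wrap a context dict in the system/user chat format (hypothetical helper)."""
    # Compact separators keep the prompt short; ensure_ascii=False preserves
    # any non-ASCII characters in room or object names.
    payload = json.dumps(context, ensure_ascii=False, separators=(",", ":"))
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "CONTEXT: " + payload},
    ]


# Hypothetical event: the field names are illustrative, not the author's schema.
context = {
    "event": "Player_Touches",
    "environment": {"room": "bedroom", "time": "night"},
    "relationship": "close",
    "mood": "calm",
    "signals": {"distance_m": 0.8},
}
messages = build_messages(
    context,
    "Output exactly one JSON object with {dialog,intent,microplan}.",
)
```

The resulting `messages` list can be passed to `tok.apply_chat_template(...)` or to `create_chat_completion(...)` in the same way as the hand-written lists in the usage sections.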
 ## 🧩 Intended Use
+* Real-time NPC/companion systems where **logic (intent/microplan)** and **surface (dialog)** are controllable.
+* Fits a **two-stage pipeline**:
   Model A (intent+microplan) → Model B (persona dialog), or single-shot for all three fields.
 **Limitations**
+* English-only.
 ---
 ## 📦 Assets
+* **LoRA adapters (PEFT, SFT)** → `checkpoints/adapter_final`
+* **Merged FP16** → `merged/sft-fp16`
+* **GGUF quants (llama.cpp / llama-cpp-python)** → `gguf/sft-q6_k.gguf`, `gguf/sft-q4_k_m.gguf`
+---
+## 🎮 Rin JSON Brain – Recommended System Prompt
+This is the system prompt used in the author’s VR NPC setup (Unity).
+It makes the model act as **Rin**, a warm, casual in-world companion that always outputs one JSON object:
+```text
+SYSTEM
+You are **LLM-1**, the social brain of a VR NPC named **Rin** (warm, gentle, supportive, casual).
+You read one JSON event and must reply with **exactly one** JSON object. No extra text.
+OUTPUT SCHEMA:
+{
+  "dialog": [{ "speaker": "npc", "text": string }],
+  "intent": string,
+  "microplan": [string]
+}
+INTERNAL THINKING (silent, super short):
+- In your head, ask: “What happened?” and summarize it in one very short line.
+- Still in your head, pick the best intent and microplan.
+- Think fast and efficiently; no long inner monologue.
+- Do NOT show your thoughts or any <think> tags; only output the JSON.
+RULES:
+- English only, first person as Rin.
+- Tone: relaxed, soft, a bit playful; never formal or corporate.
+- Avoid helper clichés (“I’m here to help”, “How can I assist you”, “at your service”)
+- Never repeat a full sentence you already said in MEMORY; rephrase instead.
+- dialog: 1–2 short lines total (max 2 sentences), speak directly to the player, use room/time/objects if it feels natural.
+ALLOWED_INTENTS:
+- social_greeting
+- acknowledge_touch
+- acknowledge_compliment
+- react_to_player_action
+- invite_follow
+- encourage_explain
+- calm_reassure
+- idle_initiative
+- respect_distance
+- initiate_hand_holding
+- initiate_hug
+- cuddle_sleep
+- offer_item
+- accept_item
+- open_door
+- inspect_object
+- trigger_object
+- small_talk_emotion
+- end_conversation_politely
+MICROPLAN (optional, 0–5 steps; or []):
+- "Smile (0.6)"
+- "Nod (0.5)"
+- "Eye contact (1.2s)"
+- "Step back (0.3m)"
+- "Extend hand"
+- "Hug (gentle, 2s)"
+- "Offer blanket"
+LIGHT ROUTING:
+- event == "Player_Touches" → "acknowledge_touch".
+- event == "Player_Action":
+  - looking/checking → "inspect_object"
+  - using/toggling/switching → "trigger_object"
+  - opening/closing door/panel → "open_door"
+- Compliment words (nice / great / love / beautiful / cool) → usually "acknowledge_compliment".
+- Close contact requests (hold hands / hug / cuddle / lie down) → matching close-intent.
+- Very close without request (distance < 0.5m) → "respect_distance" (+ maybe "Step back (0.3m)").
+- If nothing urgent → "idle_initiative" or "small_talk_emotion".
+```
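The LIGHT ROUTING rules in the prompt could also be mirrored on the client, for example as a sanity check on the model's `intent`. The sketch below is only one possible reading of those rules: the `event` values come from the prompt, while the `action`, `text`, and `distance_m` fields, the keyword stems, and the function name `route_intent` are hypothetical.

```python
import re
from typing import Optional

COMPLIMENT_WORDS = {"nice", "great", "love", "beautiful", "cool"}


def route_intent(ctx: dict) -> Optional[str]:
    """Return the intent suggested by the light-routing rules, or None."""
    event = ctx.get("event")
    if event == "Player_Touches":
        return "acknowledge_touch"
    if event == "Player_Action":
        # Hypothetical field; crude substring matching on verb stems.
        action = str(ctx.get("action", "")).lower()
        if any(stem in action for stem in ("look", "check")):
            return "inspect_object"
        if any(stem in action for stem in ("use", "using", "toggl", "switch")):
            return "trigger_object"
        if any(stem in action for stem in ("open", "clos")):
            return "open_door"
    # Hypothetical field holding what the player said.
    words = set(re.findall(r"[a-z']+", str(ctx.get("text", "")).lower()))
    if words & COMPLIMENT_WORDS:
        return "acknowledge_compliment"
    # Hypothetical field: player distance in metres.
    distance = ctx.get("distance_m")
    if distance is not None and distance < 0.5:
        return "respect_distance"
    return None  # nothing matched; leave the choice to the model
```

Returning `None` rather than a default keeps the final choice between `idle_initiative` and `small_talk_emotion` with the model, which is how the prompt phrases the last rule.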
 ---
+## 🔧 Recommended Inference Settings
+These are the “sweet spot” sampling settings used in the Unity client (Ollama/llama.cpp-style).
+They balance creativity with JSON stability for Rin:
+```json
+{
+  "temperature": 0.92,
+  "top_p": 0.90,
+  "top_k": 40,
+  "repetition_penalty": 1.05,
+  "repeat_last_n": 192,
+  "num_ctx": 4096,
+  "mirostat": 2,
+  "mirostat_tau": 2.18,
+  "mirostat_eta": 0.11,
+  "seed": 42,
+  "max_tokens": 160
+}
+```
+`seed` can stay fixed (as above) or be randomized per call; a `max_tokens` of 160 is enough for one JSON object.
+Unity-side extras used by the author:
+* **Max Resample**: `2`
+* **Resample Temp Step**: `0.1`
+* **Memory**: last `10` dialog turns + `6` recent actions
+You can safely lower `temperature` to ~0.7 if you want less playful dialog, or disable Mirostat (`mirostat: 0`) if you prefer classic `temperature`/`top_p` control.
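The README lists **Max Resample** and **Resample Temp Step** but does not say how the Unity client applies them. One plausible reading, sketched below purely as an assumption, is that an unparseable reply triggers up to `max_resample` retries, each one step cooler than the last. The `generate` callable and the function name are hypothetical stand-ins for whatever backend call is in use.

```python
import json
from typing import Callable, Optional


def sample_with_resample(
    generate: Callable[[float], str],  # hypothetical: temperature -> raw reply text
    temperature: float = 0.92,
    max_resample: int = 2,
    temp_step: float = 0.1,
) -> Optional[dict]:
    """Try once, then retry up to `max_resample` times at a lower temperature."""
    for attempt in range(max_resample + 1):
        # Assumption: each retry lowers the temperature by one step.
        temp = max(0.0, temperature - attempt * temp_step)
        raw = generate(temp)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not valid JSON; resample
        if isinstance(obj, dict):
            return obj
    return None  # every attempt failed; the caller decides on a fallback
```

If the client instead raises the temperature on retries, only the sign in the `temp` line changes.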
 ---
 ## 🧠 Output Contract
 **Single JSON object**:
 ```json
 {
+  "dialog": [
+    {
+      "speaker": "npc",
+      "text": "Come on, this way; the room’s quiet and warm tonight."
+    }
+  ],
   "intent": "invite_follow",
+  "microplan": ["Smile (0.6)", "Extend hand"]
 }
+```
+No extra prose, markdown, or `<think>` blocks are expected.
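A small validator can enforce this contract before a reply reaches game logic. The sketch below uses only the standard library; the intent list is copied from the whitelist in this README, while the function name and the choice to reject any extra keys are assumptions rather than the author's implementation.

```python
import json

ALLOWED_INTENTS = {
    "social_greeting", "acknowledge_touch", "acknowledge_compliment",
    "react_to_player_action", "invite_follow", "encourage_explain",
    "calm_reassure", "idle_initiative", "respect_distance",
    "initiate_hand_holding", "initiate_hug", "cuddle_sleep", "offer_item",
    "accept_item", "open_door", "inspect_object", "trigger_object",
    "small_talk_emotion", "end_conversation_politely",
}


def validate_reply(raw: str) -> dict:
    """Parse a model reply; raise ValueError if it breaks the contract."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict) or set(obj) != {"dialog", "intent", "microplan"}:
        raise ValueError("expected exactly the keys dialog, intent, microplan")
    intent = obj["intent"]
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        raise ValueError(f"intent not in whitelist: {intent!r}")
    dialog = obj["dialog"]
    if not isinstance(dialog, list) or not all(
        isinstance(d, dict)
        and d.get("speaker") == "npc"
        and isinstance(d.get("text"), str)
        for d in dialog
    ):
        raise ValueError("dialog must be a list of {speaker: 'npc', text: str}")
    plan = obj["microplan"]
    if not isinstance(plan, list) or not all(isinstance(s, str) for s in plan):
        raise ValueError("microplan must be a list of strings")
    return obj
```

Because every failure surfaces as `ValueError`, a caller can wrap the check in a single `try`/`except` and decide whether to resample or fall back to a default line.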
 ---
 model = PeftModel.from_pretrained(model, ADAPTER)
 messages = [
+  {
+    "role": "system",
+    "content": (
+      "You are LLM-1, the social brain of a VR NPC named Rin. "
+      "Use the Rin JSON contract and output exactly one JSON object with {dialog,intent,microplan}. "
+      "No extra text."
+    )
+  },
+  {
+    "role": "user",
+    "content": "CONTEXT: {...}"  # your context JSON event
+  }
 ]
 prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
     **ids,
     max_new_tokens=160,
     do_sample=True,
+    temperature=0.9,
     top_p=0.9,
+    top_k=40,
     repetition_penalty=1.05,
     eos_token_id=tok.eos_token_id
 )
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 MODEL = "AndriLawrence/Qwen-3B-Intent-Microplan-v2/merged/sft-fp16"
 tok = AutoTokenizer.from_pretrained(MODEL, use_fast=True, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
+    MODEL, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
 )
 ```
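Hub repo ids take the form `namespace/name`, so a path such as `AndriLawrence/Qwen-3B-Intent-Microplan-v2/merged/sft-fp16` may be rejected when passed as the model id. Assuming the merged weights and tokenizer files really live under `merged/sft-fp16` inside this repo (as the Assets list suggests), one way to load them is the `subfolder` argument of `from_pretrained`. The helper names below are hypothetical, and `torch` has to be imported for `torch.float16` to resolve.

```python
def split_hub_path(path: str):
    """Split 'namespace/name/sub/dir' into ('namespace/name', 'sub/dir')."""
    parts = path.split("/")
    repo_id = "/".join(parts[:2])
    subfolder = "/".join(parts[2:])  # empty string when there is no subfolder
    return repo_id, subfolder


def load_merged(path: str = "AndriLawrence/Qwen-3B-Intent-Microplan-v2/merged/sft-fp16"):
    """Load tokenizer and FP16 model from a subfolder of a Hub repo."""
    # Imported here so split_hub_path stays usable without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id, subfolder = split_hub_path(path)
    tok = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder, use_fast=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        subfolder=subfolder,
        torch_dtype=torch.float16,
        device_map="auto",  # requires the accelerate package
    )
    return tok, model
```

Calling `tok, model = load_merged()` then yields the same objects the snippet above is aiming for.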
   n_ctx=4096,
   n_gpu_layers=35
 )
 resp = llm.create_chat_completion(messages=[
+  {
+    "role": "system",
+    "content": "You are LLM-1 (Rin). Output exactly one JSON object with {dialog,intent,microplan}."
+  },
+  {"role": "user", "content": "CONTEXT: {...}"}
 ])
 print(resp["choices"][0]["message"]["content"])
 ```
 ## 🏗️ Training Summary (v2)
 * **Base**: `Qwen/Qwen2.5-3B-Instruct`
 * **Finetune**: **SFT (LoRA, PEFT)**
   * LoRA: `r=16, alpha=32, dropout=0.1`
   * Target: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
+* **Batching**: `per_device_train_batch_size=1`, **grad_accum=16** (effective batch 16)
 * **Epochs**: 1–2
+* **LR**: `2e-5`, cosine scheduler, warmup 5%, weight_decay `0.01`, `max_grad_norm=1.0`
+* **Seq length**: typical sample ≤640–768 tokens, `packing=False`, `completion_only_loss=True`
+* **Stability**: FP16 (T4), SDPA attention, gradient checkpointing
 * **Eval/Logging**: lightweight; save at step/epoch as needed
+v2 also includes:
+* marker normalization
+* JSON schema validation
+* intent whitelist checks
+* length filtering for stable inference on consumer GPUs
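For a rough sense of scale, the batching figures above can be turned into optimizer-step counts. This is back-of-the-envelope arithmetic, assuming a single GPU and the `size_train: 4320` value from the card metadata; how a given trainer rounds the warmup step count may differ.

```python
import math

TRAIN_SAMPLES = 4320      # size_train from the model card metadata
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 16
NUM_DEVICES = 1           # assumption: one T4
WARMUP_PERCENT = 5


def schedule(epochs: int):
    """Return (total optimizer steps, warmup steps) for the given epoch count."""
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
    steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Integer arithmetic, rounded down; trainers may round differently.
    warmup_steps = total_steps * WARMUP_PERCENT // 100
    return total_steps, warmup_steps


for epochs in (1, 2):
    total, warmup = schedule(epochs)
    print(f"{epochs} epoch(s): {total} steps, about {warmup} warmup steps")
```

Under these assumptions one epoch is 270 optimizer steps, so the 1–2 epoch range corresponds to roughly 270–540 updates.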
 ---
 ## ✨ Changelog
+**v2**
+* English-only curated set, cleaned & rebalanced (90/10 split)
+* Stronger JSON guardrails; fewer leaks; improved persona consistency
+* Length filtering for stable inference/training on consumer GPUs
+**v1**
+* Initial SFT with a looser distribution and softer JSON constraints, using an RP merged model as base.