---
base_model: google/gemma-3n-E4B-it
library_name: peft
model_name: gemma-3n-E4B-transcribe-zh-tw-1
tags:
- generated_from_trainer
- trl
- sft
licence: license
---

# Model Card for gemma-3n-E4B-transcribe-zh-tw-1

This model is a [PEFT](https://github.com/huggingface/peft) adapter for [google/gemma-3n-E4B-it](https://huggingface.co/google/gemma-3n-E4B-it), fine-tuned for speech transcription in Traditional Chinese (zh-TW).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained("google/gemma-3n-E4B-it")
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3n-E4B-it", torch_dtype="auto"
)
model = PeftModel.from_pretrained(
    base_model, "JacobLinCool/gemma-3n-E4B-transcribe-zh-tw-1"
).to(device)


def transcribe(model, processor, audio):
    messages = [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an assistant that transcribes speech accurately.",
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "audio", "audio": audio},
                {"type": "text", "text": "Transcribe this audio."},
            ],
        },
    ]

    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    inputs = inputs.to(device, dtype=model.dtype)

    model.eval()
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=128)

    prediction = processor.batch_decode(
        outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]
    prediction = prediction.split("\nmodel\n")[-1].strip()
    return prediction


if __name__ == "__main__":
    prediction = transcribe(model, processor, "/workspace/audio.mp3")
    print(prediction)
```
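
To deploy a standalone checkpoint instead of loading the adapter on top of the base model, PEFT can merge the adapter weights into the base weights. The following is a minimal sketch continuing from the snippet above; the output directory name is an illustrative placeholder, and merging assumes a LoRA-style adapter.

```python
# Merge the adapter into the base model and save a standalone checkpoint.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("gemma-3n-E4B-transcribe-zh-tw-1-merged")
processor.save_pretrained("gemma-3n-E4B-transcribe-zh-tw-1-merged")
```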

## Training procedure

 


This model was trained with supervised fine-tuning (SFT).
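
Below is a minimal sketch of what an SFT setup with TRL and PEFT could look like, assuming the framework versions listed in the next section. The dataset path, LoRA hyperparameters, and training arguments are illustrative placeholders, not the actual training recipe.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoProcessor
from trl import SFTConfig, SFTTrainer

model_id = "google/gemma-3n-E4B-it"

# Load the base model and its processor.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Hypothetical conversational dataset with paired audio and transcripts.
dataset = load_dataset("path/to/your-transcription-dataset", split="train")

# Illustrative LoRA configuration; the actual adapter settings may differ.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Illustrative training arguments.
training_args = SFTConfig(
    output_dir="gemma-3n-E4B-transcribe-zh-tw-1",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=processor,
    peft_config=peft_config,
)
trainer.train()
```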

### Framework versions

- PEFT: 0.15.2
- TRL: 0.19.0
- Transformers: 4.53.0
- PyTorch: 2.8.0.dev20250319+cu128
- Datasets: 3.6.0
- Tokenizers: 0.21.2

## Citations



Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```