nimafathi committed on
Commit bfbce11 (verified)
1 Parent(s): 45a1a82

Update README.md

Files changed (1)
  README.md +106 -120
README.md CHANGED
@@ -2,146 +2,132 @@
  language:
  - en
  tags:
  - text-generation
  - diffusion
  - language-model
- license: mit
  ---

- # hdlm-group/hdlm-base-epsilon-0.0
-
- This is a epsilon_hybrid diffusion language model trained on text data.
-
- ## Model Details
-
- - **Model Type**: epsilon_hybrid
- - **Architecture**: Diffusion-based language model
- - **Training Method**: Epsilon-hybrid diffusion training
-
- ## Configuration
-
- ```yaml
- ngpus: 2
- type: aligned
- gradient_accumulation_steps: 2
- model_type: epsilon_hybrid
- pretrain_autoregressive_path: /home/toolkit/research-diffcodegen/exp_local/openwebtext/mdlm-autoregressive/org-DiTAR-absorb-v2/checkpoints-meta/checkpoint.pth
- tokenizer:
- tokens: 50257
- model: gpt2
- training:
- batch_size: 128
- accum: ${gradient_accumulation_steps}
- n_iters: 1000000
- snapshot_freq: 5000
- log_freq: 500
- eval_freq: 5000
- snapshot_freq_for_preemption: 3000
- weight: standard
- snapshot_sampling: true
- ema: 0.9999
- warmup_iter: -1
- loss_type: hybrid
- epsilon: 0.0
- lambda: 0.0
- hdlm:
- stage: 2
- path: /home/toolkit/research-diffcodegen/exp_local/openwebtext/ICLR-SAR-OpenWebText/small-hybrid0.1-block-1024-4096-block_causal-full-match_inference-efficient-hybrid_sigma_embedding-scale_by_sigma-with_transformer_sigma_conditioning/checkpoints-meta/checkpoint.pth
- data:
- train: openwebtext-train
- valid: wikitext103
- cache_dir: /home/toolkit/research-diffcodegen/data
- debug: false
- graph:
- type: absorb
- alpha: 1.0
- file: /home/toolkit/research-diffcodegen/data
- report_all: false
- expanded_sigma: true
- noise:
- type: loglinear
- sigma_min: 0.0001
- sigma_max: 2.0
- ar_diffusion: false
- expanded_sigma: ${graph.expanded_sigma}
- sampling:
- predictor: analytic
- steps_per_level: 1
- noise_removal: true
- strategy: direct
- strategy_param: 0.9
- annealing:
- type: none
- efficient: false
- width: 512
- tau: 2048
- eval_tau: 512
- steps_per_level: ${sampling.steps_per_level}
- sampling_method: sdlm
- diffusion_loss_weight: 1.0
- ce_loss_weight: 1.0
- sampling_eps: 0.0001
- attention:
- context_type: causal
- block_type: full
- match_inference: false
- eval:
- batch_size: 8
- perplexity: true
- perplexity_batch_size: 4
- optim:
- weight_decay: 0.1
- optimizer: AdamW
- lr: 0.0004
- beta1: 0.9
- beta2: 0.95
- eps: 1.0e-08
- warmup: 10000
- grad_clip: 1.0
- scheduler: cosine
- experiment:
- name: base_epsilon_0.0
- wandb_project: Hybrid-SDLM-ALIGNED
- model:
- name: sdlm-AR
- type: ddit
- hidden_size: 768
- cond_dim: 128
- length: 1024
- n_blocks: 12
- n_heads: 12
- dropout: 0.1
- scale_by_sigma: false
- transformer_sigma_conditioning: false
- hybrid_sigma_embedding: false
- post_process_logits: false
- use_timestep_embedding: false

- ```

  ## Usage

  ```python
- from our.hf_utils import smart_model_loader

- # Load the model
- model, config, device, accelerator, metaschedule = smart_model_loader(
-     "hdlm-group/hdlm-base-epsilon-0.0",
-     model_type="epsilon_hybrid"
  )

- # Use the model for text generation
- # (Add specific usage examples based on your model's capabilities)
  ```

  ## Training Details

- This model was trained using the research-diffcodegen framework.

  ## Citation

- If you use this model in your research, please cite the original paper and this implementation.

  ## License

- This model is released under the MIT License.

  language:
  - en
  tags:
+ - dllm
+ - diffusion-language-model
  - text-generation
  - diffusion
  - language-model
+ license: apache-2.0
  ---

+ # HDLM-Epsilon: Hybrid Diffusion Language Model

+ [![Paper](https://img.shields.io/badge/Paper-arXiv-red)](https://arxiv.org/abs/2504.06416)
+ [![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/ServiceNow/hdlm)
+
+ This model card is for the **hdlm-base model with epsilon = 0.0**.
+
+ ## Model Description
+
+ HDLM-Epsilon is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through epsilon-hybrid noising. This model interpolates evolution operators between absorbing and uniform processes, making it conceptually closer to MDLM (Sahoo et al., 2024) while maintaining the benefits of both paradigms.
+
+ The epsilon parameter (ε) controls the blend between absorbing and uniform processes during training: smaller values emphasize the absorbing process, while larger values incorporate more uniform noise.
+
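+ As a concrete illustration of that blend, the sketch below shows one way an epsilon-hybrid corruption step could be implemented. It is not the released training code: the function name, mask-token id, and noise schedule are assumptions made only for this example.
+
+ ```python
+ import torch
+
+ def epsilon_hybrid_corrupt(tokens, noise_level, epsilon, mask_id=50257, vocab_size=50257):
+     """Illustrative epsilon-hybrid corruption (assumed, not the official code).
+
+     Each position is corrupted with probability `noise_level`; a corrupted token
+     becomes the absorbing (mask) token with probability 1 - epsilon and a
+     uniformly random token with probability epsilon.
+     """
+     corrupt = torch.rand(tokens.shape, device=tokens.device) < noise_level
+     use_uniform = torch.rand(tokens.shape, device=tokens.device) < epsilon
+     random_tokens = torch.randint_like(tokens, vocab_size)
+     absorbed = torch.full_like(tokens, mask_id)
+     noised = torch.where(use_uniform, random_tokens, absorbed)
+     return torch.where(corrupt, noised, tokens)
+
+ # With epsilon = 0.0 (this checkpoint) the uniform branch is never taken,
+ # so corruption reduces to a purely absorbing, MDLM-style masking process.
+ x = torch.randint(0, 50257, (1, 16))
+ x_noisy = epsilon_hybrid_corrupt(x, noise_level=0.5, epsilon=0.0)
+ ```
+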
+ ## Model Architecture
+
+ - **Base Model**: Transformer architecture with custom conditioning layers
+ - **Vocabulary Size**: 50,258 tokens (GPT-2 vocabulary plus one absorbing token; see the sketch after this list)
+ - **Context Length**: 1024 tokens
+ - **Training**: Hybrid loss combining token masking with random token corruption
+ - **Inference**: Supports multiple sampling algorithms, including ACS (Adaptive Correction Sampler)
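+
+ The absorbing token sits outside the standard GPT-2 vocabulary. The snippet below sketches the assumed id layout; the exact id used by the released checkpoints is an assumption for illustration, not something stated in this card.
+
+ ```python
+ from transformers import GPT2TokenizerFast
+
+ tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
+ print(len(tokenizer))  # 50257 GPT-2 tokens (ids 0..50256)
+
+ # Assumption for illustration: the absorbing/mask token takes the next id,
+ # giving the 50,258-entry vocabulary listed above.
+ MASK_ID = len(tokenizer)  # 50257
+
+ # The GPT-2 tokenizer does not know this id, so drop any remaining
+ # absorbing tokens before decoding a generated sequence.
+ ids = [i for i in [464, 2003, MASK_ID, 286] if i != MASK_ID]
+ print(tokenizer.decode(ids))
+ ```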
 
  ## Usage

+ ### Quick Start
+
  ```python
+ from hdlm.hf_utils import smart_model_loader
+ from hdlm.epsilon_hybrid.sample import full_diff
+ from transformers import GPT2TokenizerFast
+ import torch
+
+ # Load model using smart loader (automatically detects model type)
+ model, cfg, device, accelerator, metaschedule = smart_model_loader(
+     model_path="hdlm-group/hdlm-base-epsilon-0.0",
+     model_type="auto",  # automatically detects epsilon_hybrid
+     device="cuda"
+ )

+ # Load tokenizer
+ tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
+
+ # Generate text
+ prompt = "The future of artificial intelligence"
+ prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
+
+ # Full diffusion sampling
+ generated = full_diff(
+     model=model,
+     prompt=prompt_ids,
+     batch_size=1,
+     alg='acs',  # or 'original', 'remask', 'remdm'
+     steps=512,
+     temperature=1.0,
+     context_length=1024,
+     device=device
  )

+ # Decode generated text
+ generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
+
+ ### Evaluation
+
+ ```bash
+ # Text generation evaluation
+ python hdlm/eval_generation.py \
+     --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
+     --sampling_method full_diff \
+     --algorithm acs \
+     --save_samples
+
+ # Perplexity evaluation
+ python hdlm/eval_modeling.py \
+     --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
+     --work_dir "./logs/eval_modeling_epsilon" \
+     --dataset ptb
  ```

  ## Training Details

+ - **Dataset**: OpenWebText
+ - **Batch Size**: 512
+ - **Learning Rate**: 3e-4 with cosine scheduling
+ - **Epsilon (ε)**: 0.0 (controls the hybrid noising blend)
+ - **Lambda (λ)**: 1.0 (weighting factor for unmasked tokens)
+ - **Loss Type**: Hybrid loss combining masking and random token corruption
+ - **Training Steps**: 1M iterations
+ - **Warmup**: 50K steps
+
+ ## Sampling Algorithms
+
+ The model supports several sampling algorithms (a usage sketch follows the list):
+
+ - **`original`**: Standard diffusion sampling
+ - **`acs`**: Adaptive Correction Sampler with error correction
+ - **`remask`**: Remasking strategy for improved quality
+ - **`remdm`**: ReMDM-style sampling with probability mixing
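+
+ To compare them on the same prompt, the sketch below simply loops over the supported `alg` values, reusing `model`, `prompt_ids`, and `device` from the Quick Start above. It is an illustrative example, not an official benchmark script.
+
+ ```python
+ from hdlm.epsilon_hybrid.sample import full_diff
+ from transformers import GPT2TokenizerFast
+
+ tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
+
+ # Assumes `model`, `prompt_ids`, and `device` are already set up as in Quick Start.
+ for alg in ['original', 'acs', 'remask', 'remdm']:
+     sample = full_diff(
+         model=model,
+         prompt=prompt_ids,
+         batch_size=1,
+         alg=alg,
+         steps=512,
+         temperature=1.0,
+         context_length=1024,
+         device=device,
+     )
+     print(f"--- {alg} ---")
+     print(tokenizer.decode(sample[0], skip_special_tokens=True))
+ ```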
+
+ ## Model Variants
+
+ Available epsilon values and their characteristics:
+
+ - **ε = 0.01**: Minimal uniform noise, closest to a pure absorbing process
+ - **ε = 0.1**: Moderate hybrid behavior
+ - **ε = 0.5**: Balanced absorbing-uniform blend

  ## Citation

+ ```bibtex
+ @article{fathi2025unifying,
+   title={Unifying autoregressive and diffusion-based sequence generation},
+   author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
+   journal={arXiv preprint arXiv:2504.06416},
+   year={2025}
+ }
+ ```

  ## License

+ This model is released under the same license as the original HDLM codebase. Please refer to the [GitHub repository](https://github.com/ServiceNow/hdlm) for license details.