hdlm-group/hdlm-base-epsilon-0.0

@@ -2,146 +2,132 @@
 language:
 - en
 tags:
 - text-generation
 - diffusion
 - language-model
-license: mit
 ---
-# hdlm-group/hdlm-base-epsilon-0.0
-This is an epsilon_hybrid diffusion language model trained on text data.
-## Model Details
-- **Model Type**: epsilon_hybrid
-- **Architecture**: Diffusion-based language model
-- **Training Method**: Epsilon-hybrid diffusion training
-## Configuration
-```yaml
-ngpus: 2
-type: aligned
-gradient_accumulation_steps: 2
-model_type: epsilon_hybrid
-pretrain_autoregressive_path: /home/toolkit/research-diffcodegen/exp_local/openwebtext/mdlm-autoregressive/org-DiTAR-absorb-v2/checkpoints-meta/checkpoint.pth
-tokenizer:
-  tokens: 50257
-  model: gpt2
-training:
-  batch_size: 128
-  accum: ${gradient_accumulation_steps}
-  n_iters: 1000000
-  snapshot_freq: 5000
-  log_freq: 500
-  eval_freq: 5000
-  snapshot_freq_for_preemption: 3000
-  weight: standard
-  snapshot_sampling: true
-  ema: 0.9999
-  warmup_iter: -1
-  loss_type: hybrid
-  epsilon: 0.0
-  lambda: 0.0
-  hdlm:
-    stage: 2
-    path: /home/toolkit/research-diffcodegen/exp_local/openwebtext/ICLR-SAR-OpenWebText/small-hybrid0.1-block-1024-4096-block_causal-full-match_inference-efficient-hybrid_sigma_embedding-scale_by_sigma-with_transformer_sigma_conditioning/checkpoints-meta/checkpoint.pth
-data:
-  train: openwebtext-train
-  valid: wikitext103
-  cache_dir: /home/toolkit/research-diffcodegen/data
-  debug: false
-graph:
-  type: absorb
-  alpha: 1.0
-  file: /home/toolkit/research-diffcodegen/data
-  report_all: false
-  expanded_sigma: true
-noise:
-  type: loglinear
-  sigma_min: 0.0001
-  sigma_max: 2.0
-  ar_diffusion: false
-  expanded_sigma: ${graph.expanded_sigma}
-sampling:
-  predictor: analytic
-  steps_per_level: 1
-  noise_removal: true
-  strategy: direct
-  strategy_param: 0.9
-annealing:
-  type: none
-  efficient: false
-  width: 512
-  tau: 2048
-  eval_tau: 512
-  steps_per_level: ${sampling.steps_per_level}
-  sampling_method: sdlm
-  diffusion_loss_weight: 1.0
-  ce_loss_weight: 1.0
-  sampling_eps: 0.0001
-  attention:
-    context_type: causal
-    block_type: full
-  match_inference: false
-eval:
-  batch_size: 8
-  perplexity: true
-  perplexity_batch_size: 4
-optim:
-  weight_decay: 0.1
-  optimizer: AdamW
-  lr: 0.0004
-  beta1: 0.9
-  beta2: 0.95
-  eps: 1.0e-08
-  warmup: 10000
-  grad_clip: 1.0
-  scheduler: cosine
-experiment:
-  name: base_epsilon_0.0
-  wandb_project: Hybrid-SDLM-ALIGNED
-model:
-  name: sdlm-AR
-  type: ddit
-  hidden_size: 768
-  cond_dim: 128
-  length: 1024
-  n_blocks: 12
-  n_heads: 12
-  dropout: 0.1
-  scale_by_sigma: false
-  transformer_sigma_conditioning: false
-  hybrid_sigma_embedding: false
-  post_process_logits: false
-  use_timestep_embedding: false
-```
 ## Usage
 ```python
-from our.hf_utils import smart_model_loader
-# Load the model
-model, config, device, accelerator, metaschedule = smart_model_loader(
-    "hdlm-group/hdlm-base-epsilon-0.0",
-    model_type="epsilon_hybrid"
 )
-# Use the model for text generation
-# (Add specific usage examples based on your model's capabilities)
 ```
 ## Training Details
-This model was trained using the research-diffcodegen framework.
 ## Citation
-If you use this model in your research, please cite the original paper and this implementation.
 ## License
-This model is released under the MIT License.

 language:
 - en
 tags:
+- dllm
+- diffusion-language-model
 - text-generation
 - diffusion
 - language-model
+license: apache-2.0
 ---
+# HDLM-Epsilon: Hybrid Diffusion Language Model
+[![Paper](https://img.shields.io/badge/Paper-arXiv-red)](https://arxiv.org/abs/2504.06416)
+[![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/ServiceNow/hdlm)
+This model card is for the **hdlm-base model with epsilon=0.0**.
+## Model Description
+HDLM-Epsilon is a hybrid diffusion language model that unifies autoregressive and diffusion-based sequence generation through epsilon-hybrid noising. This model interpolates evolution operators between absorbing and uniform processes, making it conceptually closer to MDLM (Sahoo et al. 2024) while maintaining the benefits of both paradigms.
+The epsilon parameter (ε) controls the blend between absorbing and uniform processes during training, where smaller values emphasize the absorbing process and larger values incorporate more uniform noise.
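The role of ε can be illustrated with a small, self-contained sketch. It is not the HDLM implementation: it assumes a simplified forward process in which each token is corrupted independently, and a corrupted token becomes a uniformly random vocabulary token with probability ε and the absorbing (mask) token otherwise. The function name `epsilon_hybrid_corrupt`, the `corrupt_prob` parameter, and the choice of id 50257 for the absorbing token are all hypothetical.

```python
import random

VOCAB_SIZE = 50257  # GPT-2 vocabulary size, excluding the absorbing token
MASK_ID = 50257     # assumed id of the absorbing token (one past the GPT-2 ids)


def epsilon_hybrid_corrupt(tokens, corrupt_prob, epsilon, rng):
    """Corrupt each token independently with probability corrupt_prob.

    Illustrative assumption: a corrupted token is replaced by a uniformly
    random vocabulary token with probability epsilon, and by the absorbing
    (mask) token with probability 1 - epsilon.
    """
    noisy = []
    for tok in tokens:
        if rng.random() < corrupt_prob:
            if rng.random() < epsilon:
                noisy.append(rng.randrange(VOCAB_SIZE))  # uniform branch
            else:
                noisy.append(MASK_ID)  # absorbing branch
        else:
            noisy.append(tok)  # token left untouched
    return noisy


rng = random.Random(0)
tokens = [464, 2003, 286, 11666, 4430]  # arbitrary example token ids
# With epsilon=0.0 the uniform branch is never taken, so corruption can only mask.
print(epsilon_hybrid_corrupt(tokens, corrupt_prob=1.0, epsilon=0.0, rng=rng))
```

Under this simplified view, ε = 0.0 reduces to a purely absorbing (masking) process, while larger ε values let some corrupted positions carry random tokens instead of the mask.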
+## Model Architecture
+- **Base Model**: Transformer architecture with custom conditioning layers
+- **Vocabulary Size**: 50,258 tokens (GPT-2 vocabulary + absorbing token)
+- **Context Length**: 1024 tokens
+- **Training**: Hybrid loss combining token masking with random token corruption
+- **Inference**: Supports multiple sampling algorithms including ACS (Adaptive Correction Sampler)
 ## Usage
+### Quick Start
 ```python
+from hdlm.hf_utils import smart_model_loader
+from hdlm.epsilon_hybrid.sample import full_diff
+from transformers import GPT2TokenizerFast
+import torch
+# Load model using smart loader (automatically detects model type)
+model, cfg, device, accelerator, metaschedule = smart_model_loader(
+    model_path="hdlm-group/hdlm-base-epsilon-0.0",
+    model_type="auto",  # automatically detects epsilon_hybrid
+    device="cuda"
+)
+# Load tokenizer
+tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
+# Generate text
+prompt = "The future of artificial intelligence"
+prompt_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
+# Full diffusion sampling
+generated = full_diff(
+    model=model,
+    prompt=prompt_ids,
+    batch_size=1,
+    alg='acs',  # or 'original', 'remask', 'remdm'
+    steps=512,
+    temperature=1.0,
+    context_length=1024,
+    device=device
 )
+# Decode generated text
+generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
+print(generated_text)
+```
+### Evaluation
+```bash
+# Text generation evaluation
+python hdlm/eval_generation.py \
+    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
+    --sampling_method full_diff \
+    --algorithm acs \
+    --save_samples
+# Perplexity evaluation
+python hdlm/eval_modeling.py \
+    --checkpoint_path hdlm-group/hdlm-base-epsilon-0.0 \
+    --work_dir "./logs/eval_modeling_epsilon" \
+    --dataset ptb
 ```
 ## Training Details
+- **Dataset**: OpenWebText
+- **Batch Size**: 512
+- **Learning Rate**: 3e-4 with cosine scheduling
+- **Epsilon (ε)**: 0.0 (controls hybrid noising blend)
+- **Lambda (λ)**: 1.0 (weighting factor for unmasked tokens)
+- **Loss Type**: Hybrid loss combining masking and random token corruption
+- **Training Steps**: 1M iterations
+- **Warmup**: 50K steps
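To make the learning-rate settings listed above concrete, here is a minimal sketch of a warmup-plus-cosine schedule. It assumes linear warmup from zero and cosine decay to zero at the final step; the card only states "cosine scheduling", so the exact shape used in training may differ. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 3e-4           # learning rate from the list above
WARMUP_STEPS = 50_000    # warmup length from the list above
TOTAL_STEPS = 1_000_000  # training iterations from the list above


def lr_at(step):
    """Learning rate at a given step (assumed linear warmup, cosine decay to 0)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to PEAK_LR over the warmup period.
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


for step in (0, 25_000, 50_000, 525_000, 1_000_000):
    print(step, lr_at(step))
```

The schedule peaks at the end of warmup and falls to half the peak rate midway through the decay phase (step 525,000 under these assumptions).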
+## Sampling Algorithms
+The model supports several sampling algorithms:
+- **`original`**: Standard diffusion sampling
+- **`acs`**: Adaptive Correction Sampler with error correction
+- **`remask`**: Remasking strategy for improved quality
+- **`remdm`**: ReMDM-style sampling with probability mixing
+## Model Variants
+Other available epsilon values and their characteristics (this card covers ε = 0.0, which adds no uniform noise):
+- **ε = 0.01**: Minimal uniform noise, closest to pure absorbing process
+- **ε = 0.1**: Moderate hybrid behavior
+- **ε = 0.5**: Balanced absorbing-uniform blend
 ## Citation
+```bibtex
+@article{fathi2025unifying,
+  title={Unifying autoregressive and diffusion-based sequence generation},
+  author={Fathi, Nima and Scholak, Torsten and No{\"e}l, Pierre-Andr{\'e}},
+  journal={arXiv preprint arXiv:2504.06416},
+  year={2025}
+}
+```
 ## License
+This model is released under the same license as the original HDLM codebase. Please refer to the [GitHub repository](https://github.com/ServiceNow/hdlm) for license details.