vilhess committed on
Commit 4b7e045 · verified · 1 Parent(s): c7688d0

Update README.md

  1. README.md +131 -5
README.md CHANGED
@@ -1,10 +1,136 @@
1
  ---
2
  tags:
3
- - model_hub_mixin
4
- - pytorch_model_hub_mixin
5
  ---
6
 
7
  This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
8
- - Code: [More Information Needed]
9
- - Paper: [More Information Needed]
10
- - Docs: [More Information Needed]
1
  ---
2
+ datasets:
3
+ - thuml/UTSD
4
+ - Salesforce/GiftEvalPretrain
5
+ pipeline_tag: time-series-forecasting
6
  tags:
7
+ - zero-shot
8
+ - forecasting
9
+ - timeseries
10
+ - foundationmodels
11
+ - llms
12
+ ---
13
+ # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
14
+
15
+ A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining practical and lightweight compared to a classic LLM.
16
+
17
+ ## Highlights
18
+ - Next-patch prediction objective (autoregressive, causal)
19
+ - Patch-based representation of time series (tokens ↔ patches)
20
+ - Causally masked self-attention with RoPE (relative positions)
21
+ - RevIN (Reversible Instance Normalization)
22
+ - SwiGLU feed-forward networks
23
+ - Multi-quantile outputs (median + uncertainty bands)
24
+
25
+ ---
26
+ tags:
27
+ - timeseries
28
+ - forecasting
29
+ - transformer
30
+ - patches
31
+ - foundation
32
+ - zero-shot
33
+ pipeline_tag: time-series-forecasting
34
  ---
35
 
36
  This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
37
+ - Code: [GitHub](https://github.com/vilhess/PatchFM)
38
+ - Paper: forthcoming
39
+
40
+ ## Installation
41
+ ```bash
42
+ git clone https://github.com/vilhess/PatchFM
43
+ cd PatchFM
44
+ pip install -r requirements.txt
45
+ ```
46
+
47
+ ## Quick Start
48
+
49
+ ```python
50
+ import torch
51
+ from model import Forecaster
52
+ from configs import PatchFMConfig
53
+
54
+ # --- Instantiate model ---
55
+ config = PatchFMConfig(load_from_hub=True)
56
+ model = Forecaster(config)
57
+
58
+ # --- Inference ---
59
+ forecast_horizon = 64
60
+ seq = torch.randn(1, 1024) # (batch, time)
61
+ pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, time, quantiles)
62
+ ```
63
+
64
+ We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
65
+ If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
66
+
67
+ <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
68
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
69
+ </a>
70
+
71
+ ## Method (TL;DR)
72
+ - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
73
+ - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
74
+ - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
75
+ - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
76
+ - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
77
+ - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
78
+
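The patching and causal normalization steps above can be sketched in plain Python. This is a minimal illustration, not the repository's `revin.py`: the function names are hypothetical, and it assumes that the statistics for patch $i$ are computed over all values in patches $1, \ldots, i$ (the current patch included), which the summary above does not spell out.

```python
import math

def to_patches(series, patch_len):
    """Split a 1-D sequence into consecutive, non-overlapping patches."""
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]

def causal_normalize(patches, eps=1e-5):
    """Normalize each patch with running statistics of the values seen so far.

    Assumption: patch i uses the mean/variance of patches 0..i (inclusive),
    so no future value ever influences an earlier patch.
    Returns the normalized patches and the (mean, std) pair of each patch,
    which is what is needed to map predictions back to the original scale.
    """
    normalized, stats = [], []
    count, total, total_sq = 0, 0.0, 0.0
    for patch in patches:
        count += len(patch)
        total += sum(patch)
        total_sq += sum(v * v for v in patch)
        mean = total / count
        var = max(total_sq / count - mean * mean, 0.0)
        std = math.sqrt(var + eps)
        normalized.append([(v - mean) / std for v in patch])
        stats.append((mean, std))
    return normalized, stats

def denormalize(patch, mean, std):
    """Invert the normalization for one patch."""
    return [v * std + mean for v in patch]

series = [float(t) for t in range(8)]      # context of length w = 8
patches = to_patches(series, patch_len=4)  # P_num = 8 / 4 = 2 patches
normalized, stats = causal_normalize(patches)
```

Because the statistics only look backwards, changing a later patch leaves the normalized version of every earlier patch untouched, which is what keeps the next-patch objective free of leakage.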
79
+ ## Problem Formulation
80
+ Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only the patches $x_{p_1}, \ldots, x_{p_i}$ (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$, with the median ($q = 0.5$) as the point forecast.
81
+
82
+ ## Loss: Multi-Quantile (Pinball)
83
+ For residual $u = x - \hat{x}^{(q)}$:
84
+ $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
85
+ Aggregate over positions, patch elements, and quantiles.
86
+
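As a minimal sketch of the formula above, the pinball loss can be written in a few lines of plain Python. The function names are hypothetical, and averaging is only one possible way to aggregate (the text above does not say whether a mean or a sum is used); the repository's `loss.py` may differ.

```python
def pinball(target, prediction, q):
    """Pinball loss rho_q(u) for a single value, with u = target - prediction."""
    u = target - prediction
    return q * u if u >= 0 else (q - 1.0) * u

def multi_quantile_loss(target_patch, predicted_patches, quantiles):
    """Mean pinball loss over patch elements and quantiles.

    predicted_patches[k] is the predicted patch for quantiles[k].
    Assumption: the aggregation is a plain mean.
    """
    total, n = 0.0, 0
    for q, predicted_patch in zip(quantiles, predicted_patches):
        for x, x_hat in zip(target_patch, predicted_patch):
            total += pinball(x, x_hat, q)
            n += 1
    return total / n

target = [1.0, 2.0]
predictions = [
    [0.0, 1.0],  # q = 0.1, under-predicts by 1
    [1.0, 2.0],  # q = 0.5, exact
    [2.0, 3.0],  # q = 0.9, over-predicts by 1
]
loss = multi_quantile_loss(target, predictions, quantiles=[0.1, 0.5, 0.9])
```

The asymmetry is the point: for $q = 0.9$, under-predicting by one unit costs 0.9 while over-predicting by one unit costs only 0.1, which pushes that head toward the upper quantile.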
87
+ ## Architecture
88
+ - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
89
+ - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
90
+ - FFN: SwiGLU (SiLU-gated), pre-norm + residual
91
+ - Output heads: $|\mathcal{Q}|$ linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
92
+
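To illustrate why RoPE encodes relative positions, the sketch below rotates a single 2-D query/key pair by an angle proportional to its position and checks that the attention score depends only on the position offset. It is a toy example: the frequency `theta` is a hypothetical value, and a real implementation applies one such rotation per pair of head dimensions, each with its own frequency.

```python
import math

def rotate_pair(x, y, position, theta=0.1):
    """Rotate the 2-D pair (x, y) by the angle position * theta."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)

def dot(a, b):
    """Dot product of two 2-D vectors (the unnormalized attention score)."""
    return a[0] * b[0] + a[1] * b[1]

q, k = (1.0, 2.0), (0.5, -1.0)

# Same offset (2 positions apart) at two different absolute locations.
score_a = dot(rotate_pair(*q, position=5), rotate_pair(*k, position=3))
score_b = dot(rotate_pair(*q, position=12), rotate_pair(*k, position=10))

# Zero offset recovers the unrotated score.
score_same = dot(rotate_pair(*q, position=7), rotate_pair(*k, position=7))
```

Here `score_a` and `score_b` agree up to floating-point error, while `score_same` equals the plain dot product of `q` and `k`.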
93
+ ### Model Details
94
+ - Patch size: 32
95
+ - Max context: 32 patches (1024 steps)
96
+ - Forecast horizon: 32 steps per forward pass
97
+ - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
98
+ - Layers: 6
99
+ - Attention heads: 64 (head dim 32)
100
+ - Model dim: 2048
101
+ - Parameters: ~300M
102
+
103
+ ## Inference
104
+ - Single step: predict next patch ($P_{len}$ values)
105
+ - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
106
+
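The long-horizon procedure above can be sketched as a simple loop. This is an illustration under stated assumptions, not the repository's `forecaster.py`: `rollout` and `repeat_last_patch` are hypothetical names, the model is replaced by a stand-in that repeats the last patch, the point forecast is what gets fed back as context, and KV caching is omitted.

```python
import math

def rollout(predict_next_patch, context, horizon, patch_len, max_patches):
    """Forecast `horizon` steps by repeatedly predicting one patch.

    predict_next_patch: callable mapping the current window (a list of values)
    to the next patch (a list of patch_len values).
    """
    window = list(context)
    forecast = []
    n_passes = math.ceil(horizon / patch_len)  # one patch per forward pass
    for _ in range(n_passes):
        next_patch = predict_next_patch(window)
        forecast.extend(next_patch)
        window.extend(next_patch)
        # Drop the oldest values so the window never exceeds the max context.
        window = window[-max_patches * patch_len:]
    return forecast[:horizon]  # trim if horizon is not a multiple of patch_len

def repeat_last_patch(window):
    """Hypothetical stand-in for the model: repeats the most recent 32 values."""
    return window[-32:]

context = [float(t) for t in range(1024)]  # 32 patches of length 32
forecast = rollout(repeat_last_patch, context, horizon=64,
                   patch_len=32, max_patches=32)
```

With a patch size of 32, a 64-step horizon takes two forward passes; a horizon that is not a multiple of 32 is rounded up and the surplus values are discarded.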
107
+ ## Datasets
108
+ - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We start with UTSD-1G (~55M series after preprocessing).
109
+ - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
110
+
111
+ ## Repository Layout
112
+
113
+ - `model/training/` - main PatchFM model class
114
+
115
+ - `modules.py` - core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
116
+ - `revin.py` - causal RevIN
117
+ - `loss.py` - multi-quantile (pinball) loss
118
+ - `trainer.py` - PyTorch Lightning trainer class
119
+
120
+ - `model/inference/` - main PatchFM model class for inference
121
+ - `modules.py` - core modules
122
+ - `forecaster.py` - forecasting model
123
+
124
+ - `dataset/` - data loading and preprocessing
125
+ - `artificial.py` - synthetic dataset: artificial signals + TSMixup + KernelSynth
126
+ - `utsd.py` - Unified Time Series Dataset (UTSD) loading and preprocessing
127
+ - `get_data.py` - utility to fetch and preprocess datasets
128
+ - `generate_data.py` - utility to generate and save the KernelSynth dataset (slow to generate)
129
+
130
+ - `configs/` - model and training configurations
131
+ - `notebooks/inference` - how to load a trained model and generate forecasts
132
+ - `training.py` - training script using PyTorch Lightning
133
+
134
+ ## Acknowledgements
135
+ We thank the authors of the following repositories for inspiration and code snippets:
136
+ - [TiRex](https://github.com/NX-AI/tirex)