vilhess committed
Commit fd5a2eb · verified · 1 Parent(s): fdf5de4

Push model using huggingface_hub.

Files changed (3)
  1. README.md +5 -133
  2. config.json +1 -1
  3. model.safetensors +1 -1
README.md CHANGED
@@ -1,138 +1,10 @@
  ---
- datasets:
- - thuml/UTSD
- - Salesforce/GiftEvalPretrain
- pipeline_tag: time-series-forecasting
  tags:
- - zero-shot
- - forecasting
- - timeseries
- - foundationmodels
- - llms
- ---
- # A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
-
- A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight and practical compared to a classic LLM.
-
- ## Highlights
- - Next-patch prediction objective (autoregressive, causal)
- - Patch-based representation of time series (tokens ↔ patches)
- - Causal masking self-attention with RoPE (relative positions)
- - RevIN (Reversible Instance Normalization) with causal statistics
- - SwiGLU feed-forward networks
- - Multi-quantile outputs (median + uncertainty bands)
- - Efficient rollout with KV caching
-
- ---
- tags:
- - timeseries
- - forecasting
- - transformer
- - patches
- - foundation
- - zero-shot
- pipeline_tag: time-series-forecasting
  ---

  This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [GitHub](https://github.com/vilhess/PatchFM)
- - Paper: Incoming
-
- ## Installation
- ```bash
- git clone https://github.com/vilhess/PatchFM
- cd PatchFM
- pip install -r requirements.txt
- ```
-
- ## Quick Start
-
- ```python
- import torch
- from model import Forecaster
- from configs import PatchFMConfig
-
- # --- Instantiate model ---
- config = PatchFMConfig(load_from_hub=True)
- model = Forecaster(config)
-
- # --- Inference ---
- forecast_horizon = 64
- seq = torch.randn(1, 1024) # (batch, time)
- pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9]) # (batch, time, quantiles)
- ```
-
- We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
- If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
-
- <a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
- <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
- </a>
-
- ## Method (TL;DR)
- - Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
- - RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale (both steps are sketched after this list).
- - Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
- - Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
- - Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
- - Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
-
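The patching and causal-RevIN steps in the list above can be sketched in a few lines of PyTorch. This is an illustrative sketch rather than the repository's `revin.py`: the helper names `patchify` and `causal_revin` are made up here, and whether the running statistics include the current patch or only strictly earlier ones is a detail of the actual implementation.

```python
import torch

def patchify(x: torch.Tensor, patch_len: int = 32) -> torch.Tensor:
    """Split (batch, time) into (batch, num_patches, patch_len); time must divide evenly."""
    b, t = x.shape
    return x.view(b, t // patch_len, patch_len)

def causal_revin(patches: torch.Tensor, eps: float = 1e-5):
    """Normalize each patch with running mean/variance over itself and all past patches (no look-ahead)."""
    b, n, p = patches.shape
    flat = patches.reshape(b, n * p)
    count = torch.arange(1, n * p + 1, dtype=flat.dtype, device=flat.device)
    idx = torch.arange(p - 1, n * p, p, device=flat.device)                  # last element of each patch
    mean = (flat.cumsum(1)[:, idx] / count[idx]).unsqueeze(-1)               # (b, n, 1)
    var = ((flat ** 2).cumsum(1)[:, idx] / count[idx]).unsqueeze(-1) - mean ** 2
    normed = (patches - mean) / torch.sqrt(var.clamp_min(0.0) + eps)
    return normed, mean, var    # keep the statistics to denormalize the predicted patches

series = torch.randn(4, 1024)                         # (batch, time)
normed, mean, var = causal_revin(patchify(series))    # patches: (4, 32, 32)
```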
- ## Problem Formulation
- Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$, with the median ($q=0.5$) as the point forecast.
-
- ## Loss: Multi-Quantile (Pinball)
- For the residual $u = x - \hat{x}^{(q)}$:
- $$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
- Aggregate over positions, patch elements, and quantiles.
-
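A minimal version of this objective is shown below. It is a sketch, not the repository's `loss.py`; the (batch, positions, patch_len, |Q|) prediction layout is an assumption.

```python
import torch

QUANTILES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def multi_quantile_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pinball loss averaged over positions, patch elements, and quantiles.

    pred:   (batch, positions, patch_len, |Q|)  quantile forecasts
    target: (batch, positions, patch_len)       ground-truth next patches
    """
    q = torch.tensor(QUANTILES, device=pred.device).view(1, 1, 1, -1)
    u = target.unsqueeze(-1) - pred                    # residual u = x - x_hat^(q)
    return torch.maximum(q * u, (q - 1) * u).mean()    # rho_q(u), aggregated

loss = multi_quantile_loss(torch.randn(2, 31, 32, 9), torch.randn(2, 31, 32))
```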
- ## Architecture
- - Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
- - Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
- - FFN: SwiGLU (SiLU-gated), pre-norm + residual (sketched below)
- - Output heads: $|\mathcal{Q}|$ linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
-
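The SwiGLU block listed above, with pre-norm and a residual connection, looks roughly like this. It is a generic sketch, not the repository's `modules.py`; the hidden width and the choice of LayerNorm are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward block: W2(SiLU(W1 x) * W3 x)."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)   # gate branch
        self.w3 = nn.Linear(dim, hidden, bias=False)   # value branch
        self.w2 = nn.Linear(hidden, dim, bias=False)   # project back to model dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# pre-norm + residual around the FFN, as in the block description above
x = torch.randn(2, 32, 2048)                       # (batch, patches, dim)
ffn, norm = SwiGLU(dim=2048, hidden=4096), nn.LayerNorm(2048)
y = x + ffn(norm(x))
```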
- ### Model Details
- - Patch size: 32
- - Max context: 32 patches (1024 steps)
- - Forecast horizon: 32 steps per forward pass
- - Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
- - Layers: 6
- - Attention heads: 64 (head dim 32)
- - Model dim: 2048
- - Parameters: ~300M
-
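Collected in code form, these hyperparameters amount to something like the following. This is a hypothetical container; the actual `PatchFMConfig` in `configs/` may use different field names.

```python
from dataclasses import dataclass

@dataclass
class PatchFMHyperparams:
    """Illustrative container for the hyperparameters listed above."""
    patch_len: int = 32        # patch size; also the forecast steps per forward pass
    max_patches: int = 32      # max context = 32 patches * 32 steps = 1024 steps
    d_model: int = 2048        # matches "d_model" in config.json
    n_layers: int = 6
    n_heads: int = 64          # head dim = 2048 / 64 = 32
    quantiles: tuple = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
```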
- ## Inference
- - Single step: predict next patch ($P_{len}$ values)
- - Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed); see the rollout sketch after this list
- - KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
-
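The long-horizon rollout is conceptually just the loop below. This is a sketch around a generic `step_fn`, not the repository's `forecaster.py`; for PatchFM, `step_fn` would be the model's median next-patch forecast, with KV caching happening inside the model.

```python
import torch

def rollout(step_fn, context: torch.Tensor, horizon: int,
            patch_len: int = 32, max_len: int = 1024) -> torch.Tensor:
    """Autoregressive forecast: predict one patch, append it to the context, repeat.

    step_fn(context) -> (batch, patch_len) produces the next-patch point forecast.
    The window is truncated to max_len, so the oldest patch is dropped once the
    context is full; with KV caching only the newly appended patch needs fresh Q/K/V.
    """
    preds = []
    for _ in range(-(-horizon // patch_len)):                # ceil(horizon / patch_len) steps
        next_patch = step_fn(context)                        # (batch, patch_len)
        preds.append(next_patch)
        context = torch.cat([context, next_patch], dim=1)[:, -max_len:]
    return torch.cat(preds, dim=1)[:, :horizon]

# toy step function: persistence baseline that repeats the last observed patch
forecast = rollout(lambda ctx: ctx[:, -32:], torch.randn(2, 1024), horizon=64)   # (2, 64)
```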
- ## Datasets
- - UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We start with UTSD-1G (~55M series after preprocessing).
- - Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
-
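As a rough illustration of the KernelSynth idea (not the repository's `generate_data.py`), a synthetic series can be sampled from a Gaussian-process prior whose kernel is a random combination of RBF, periodic, and linear components, e.g. with scikit-learn; the parameter ranges below are arbitrary.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256).reshape(-1, 1)          # time grid for one series

# one random composite kernel: RBF + periodic * linear, with randomly swept hyperparameters
kernel = (RBF(length_scale=rng.uniform(0.05, 0.5))
          + ExpSineSquared(length_scale=rng.uniform(0.1, 1.0), periodicity=rng.uniform(0.05, 0.5))
          * DotProduct(sigma_0=rng.uniform(0.1, 1.0)))

# draw one series from the GP prior defined by this kernel
series = GaussianProcessRegressor(kernel=kernel).sample_y(t, n_samples=1, random_state=0).ravel()
```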
- ## Repository Layout
-
- - `model/training/` — main PatchFM model class
-
-   - `modules.py` — core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
-   - `revin.py` — causal RevIN
-   - `loss.py` — multi-quantile (pinball) loss
-   - `trainer.py` — PyTorch Lightning trainer class
-
- - `model/inference/` — main PatchFM model class for inference with KV caching
-   - `modules.py` — core modules with caching support
-   - `forecaster.py` — forecasting model with KV caching and rollout logic
-
- - `dataset/` — data loading and preprocessing
-   - `artificial.py` — synthetic dataset: artificial signals + TSMixup + KernelSynth
-   - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
-   - `get_data.py` — utility to fetch and preprocess datasets
-   - `generate_data.py` — utility to generate and save the KernelSynth dataset (slow to generate)
-
- - `configs/` — model and training configurations
- - `notebooks/inference` — how to load a trained model and generate forecasts
- - `training.py` — training script using PyTorch Lightning
-
- ## Acknowledgements
- We thank the authors of the following repositories for inspiration and code snippets:
- - [TiRex](https://github.com/NX-AI/tirex)
 
  ---
  tags:
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
  ---

  This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+ - Code: [More Information Needed]
+ - Paper: [More Information Needed]
+ - Docs: [More Information Needed]
config.json CHANGED
@@ -1,5 +1,5 @@
  {
- "ckpt_path": "ckpts/huge_v8_12g_5000.pth",
  "compile": true,
  "d_model": 2048,
  "load_from_hub": false,
 
  {
+ "ckpt_path": "ckpts/art_gift_utsd_tanh.pth",
  "compile": true,
  "d_model": 2048,
  "load_from_hub": false,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8aa576f038e409ce4f620cd1f31d6d7b1f2f7f55c07bcdb2c569603dc4465bf2
  size 1275009880
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:cdea543e9222806792864357e8fc5feac9d84a2798243db9218b4bee1e8d0c14
  size 1275009880