Upload misc documentation files
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ProjectResilience[[:space:]]Overview[[:space:]]LF\[27\].pdf filter=lfs diff=lfs merge=lfs -text

ProjectResilience Overview LF[27].pdf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d900140a203f124b7dd9c20494741d50b6ad7507cede6eddf6f58aafff2cbe3e
size 8608823

data_requirements.md
ADDED
@@ -0,0 +1,91 @@
# Project Resilience Data Requirements and Tips

## Format and Features

Data features in each row of the data should include columns that can be cast
as **Context**, **Actions** and **Outcomes** of a decision pertaining to the
unit of decision-making.

For example, if the problem is carbon emissions decisions per power plant
(a minimal schema sketch follows this list):
- the unit of decision-making is a power plant, so each row should represent
  a decision made for a power plant
- Context features are features about the plant that can't be changed
  (e.g., location, weather, reactor type)
- Actions are policies for the plant that can be changed within a reasonable
  time, so that the effect can be observed and associated with the action
  (e.g., generator setup config, carbon capture level, change in generation
  hours)
- Outcomes are quantifiable values that can be attributed to a single region
  within a reasonable lag (e.g., carbon emissions, cost of actions, energy
  output)
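
As a rough illustration only (the column names and values below are made up
for this sketch, not part of any Project Resilience dataset), rows for the
power plant example might look like this:

```python
import pandas as pd

# Hypothetical rows: one decision per plant, with Context, Action and Outcome columns.
df = pd.DataFrame({
    # Unit of decision-making
    "plant_id":         ["A", "A", "B"],
    # Context: properties of the plant that can't be changed
    "location":         ["DE", "DE", "FR"],
    "reactor_type":     ["coal", "coal", "gas"],
    # Actions: levers that can be changed
    "carbon_capture":   [0.0, 0.3, 0.1],      # fraction of emissions captured
    "generation_hours": [24, 18, 20],         # hours generating per day
    # Outcomes: measurable results attributed to the decision
    "co2_tonnes":       [900.0, 610.0, 400.0],
    "cost_usd":         [1.2e5, 1.5e5, 0.9e5],
})

context_cols = ["location", "reactor_type"]
action_cols  = ["carbon_capture", "generation_hours"]
outcome_cols = ["co2_tonnes", "cost_usd"]
```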

## Predictability

We need some a priori theory of why/how Actions could affect Outcomes,
and why we should expect prediction of Outcomes to be easier from
Context/Actions than from Context alone. A human being should, just by
looking at the context/action data, be able to predict more or less what
the outcome should be, or at least be able to reason about it.
Alternatively, a basic predictor model mapping Context/Actions to Outcomes
should be able to show that it uses the Actions to make better predictions
than with Context alone. This simple predictor does not need to use the
full data or input/output spaces; it just needs to make it clear that
there's something there.
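
One way to run this check, as a sketch only: the column lists and dataframe
`df` are the hypothetical ones from the schema example above, and scikit-learn
is just one convenient choice of predictor, not something this document
prescribes.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def predictability_gain(df, context_cols, action_cols, outcome_col, cv=5):
    """Mean cross-validated R^2 gain from adding Action columns to the inputs."""
    y = df[outcome_col]
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    # One-hot encode categorical context features so the model can use them.
    X_context = pd.get_dummies(df[context_cols])
    X_full = pd.get_dummies(df[context_cols + action_cols])
    r2_context = cross_val_score(model, X_context, y, cv=cv, scoring="r2").mean()
    r2_full = cross_val_score(model, X_full, y, cv=cv, scoring="r2").mean()
    return r2_full - r2_context  # clearly positive => Actions carry signal

# On a real dataset: predictability_gain(df, context_cols, action_cols, "co2_tonnes")
```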

## Rules of Thumb

### Time-series

Either (1) we have an outcome value at each time step, in which case the row
should indicate the time step; or (2) we have an outcome value only at
particular time steps (e.g., we have daily power plant CO2 output but only
monthly cost reports). In either case, if some time steps are missing values
(context, action, or outcome), it's OK for them to be NA in the row for that
time step: we can still construct time series to train on from this dataset.
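
For instance (an invented layout, reusing the hypothetical columns above plus
a `date` time step), daily emission rows with a cost outcome that is only
reported monthly might look like:

```python
import pandas as pd

# Hypothetical time-stamped rows for one plant; cost_usd is only known monthly,
# so it is left as NA on the other days.
ts = pd.DataFrame({
    "date":           pd.to_datetime(["2023-01-30", "2023-01-31", "2023-02-01"]),
    "plant_id":       ["A", "A", "A"],
    "carbon_capture": [0.3, 0.3, 0.4],
    "co2_tonnes":     [610.0, 598.0, 550.0],
    "cost_usd":       [None, 1.5e5, None],  # monthly cost report lands on the 31st
})
```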

### Missing Data

To give the project the best chance of success, the amount of missing data
should be minimal and/or structured, e.g., we only get cost reports monthly.
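
A quick way to see whether missingness is minimal and structured (a sketch,
assuming a pandas dataframe like the hypothetical `ts` above):

```python
# Fraction of missing values per column; structured gaps (e.g., monthly-only
# columns) show up as a high but explainable rate, scattered NAs look like noise.
print(ts.isna().mean().sort_values(ascending=False))
```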

### Data Sufficiency

Data rows should cover the variations of decisions sufficiently; in the case
of time-series data, this means we need historical decision instances that
include different actions taken for similar contexts.
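
One way to eyeball that coverage (a sketch, again assuming the hypothetical
`df` and column lists from above): count how many distinct action combinations
appear for each context.

```python
# Number of distinct action combinations observed per context; contexts with
# only one action give the model nothing to compare against.
action_variety = (
    df.groupby(context_cols)[action_cols]
      .apply(lambda g: len(g.drop_duplicates()))
)
print(action_variety)
```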

A single row should represent one observation, which includes the context,
actions, and outcomes for that observation.

We need enough cases for our predictor to learn something about how Actions
affect Outcomes. If we have thousands of samples to begin with, that certainly
gives us a better shot. A quick-and-dirty check for correlations between
actions and outcomes can be used as a gating function: the correlation matrix
should not look like noise. If it does look like noise, the project may still
be possible, but it will be hard.
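
A minimal version of that gating check (a sketch under the same hypothetical
column names; the 0.1 threshold is arbitrary):

```python
# Action-vs-outcome correlation block; if every entry is near zero, the data
# may not contain a learnable action effect.
corr = df[action_cols + outcome_cols].corr().loc[action_cols, outcome_cols]
print(corr)
print("Looks like noise:", (corr.abs() < 0.1).all().all())
```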

The data requirement grows exponentially with the number of outcome objectives.

### Consistency

The same context and actions should result in similar outcomes, and
contradicting samples should be minimal. In other words, there should not be
too many rows with the same Context and Actions but different Outcomes.
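
A rough way to spot contradicting samples (again assuming the hypothetical
`df` and column lists above): group by Context and Actions together and look
at the spread of each Outcome.

```python
# Standard deviation of each outcome within identical (Context, Actions) groups;
# large spreads flag contradicting samples.
spread = df.groupby(context_cols + action_cols)[outcome_cols].std()
print(spread.describe())
```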

It should be possible to observe the outcome of an action within a reasonable
amount of time (e.g., less than 3 months).

### Availability and Updates to the Data

As a rule of thumb, the number of new samples should be at least on the order
of the problem dimension, i.e., (dim(A) + dim(C)) × dim(O). More important
than the number of new samples is which data is sampled: one sample in a
previously unknown region of interest may be more useful than thousands in a
region we already know well or don't care about. So, if we control which data
is sampled, we don't need as much of it.
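
Plugging in the toy dimensions from the schema sketch above (2 context
features, 2 actions, 2 outcomes), just to show the arithmetic of this rule of
thumb:

```python
dim_C, dim_A, dim_O = 2, 2, 2           # from the hypothetical schema above
min_new_samples = (dim_A + dim_C) * dim_O
print(min_new_samples)                  # (2 + 2) * 2 = 8 new samples per update
```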

### Transparency/Accountability

Data should come from reliable, trusted, scientific, ethical sources
(e.g., not black boxes or your mom's Facebook surveys).

project_resilience_conceptual_architecture.pdf
ADDED
Binary file (179 kB).