Upload misc documentation files
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ProjectResilience[[:space:]]Overview[[:space:]]LF\[27\].pdf filter=lfs diff=lfs merge=lfs -text

ProjectResilience Overview LF[27].pdf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d900140a203f124b7dd9c20494741d50b6ad7507cede6eddf6f58aafff2cbe3e
size 8608823

data_requirements.md
ADDED
@@ -0,0 +1,91 @@
# Project Resilience Data Requirements and Tips

## Format and Features

Data features in each row of the data should include columns that can be cast
as **Context**, **Actions** and **Outcomes** of a decision pertaining to the
unit of decision-making.

For example, if the problem is carbon emissions decisions per power plant
(a minimal schema sketch follows this list):
- the unit of decision-making is a power plant, so each row should represent
  a decision made for a power plant
- Context features are features about the plant that can't be changed
  (e.g., location, weather, reactor type)
- Actions are policies for the plant that can be changed within a reasonable
  time, so that the effect can be observed and associated with the action
  (e.g., generator setup config, carbon capture level, change in generation
  hours)
- Outcomes are quantifiable values that can be attributed to a single region
  within a reasonable lag (e.g., carbon emissions, cost of actions, energy
  output)
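
As a rough illustration only (the column names and values below are made up
for this sketch, not part of any Project Resilience dataset), rows for the
power plant example might look like this:

```python
import pandas as pd

# Hypothetical rows: one decision per plant, with Context, Action and Outcome columns.
df = pd.DataFrame({
    # Unit of decision-making
    "plant_id":         ["A", "A", "B"],
    # Context: properties of the plant that can't be changed
    "location":         ["DE", "DE", "FR"],
    "reactor_type":     ["coal", "coal", "gas"],
    # Actions: levers that can be changed
    "carbon_capture":   [0.0, 0.3, 0.1],      # fraction of emissions captured
    "generation_hours": [24, 18, 20],         # hours generating per day
    # Outcomes: measurable results attributed to the decision
    "co2_tonnes":       [900.0, 610.0, 400.0],
    "cost_usd":         [1.2e5, 1.5e5, 0.9e5],
})

context_cols = ["location", "reactor_type"]
action_cols  = ["carbon_capture", "generation_hours"]
outcome_cols = ["co2_tonnes", "cost_usd"]
```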

## Predictability

We need some a priori theory of why/how Actions could affect Outcomes,
and why we should expect prediction of Outcomes to be easier from
Context/Actions than from Context alone. A human being should, just by
looking at the context/action data, be able to predict more or less what
the outcome should be, or at least be able to reason about it.
Alternatively, a basic predictor model mapping Context/Actions to Outcomes
should be able to show that it uses the Actions to make better predictions
than with Context alone. This simple predictor does not need to use the
full data or input/output spaces; it just needs to make it clear that
there's something there.
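
One way to run this check, as a sketch only: the column lists and dataframe
`df` are the hypothetical ones from the schema example above, and scikit-learn
is just one convenient choice of predictor, not something this document
prescribes.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def predictability_gain(df, context_cols, action_cols, outcome_col, cv=5):
    """Mean cross-validated R^2 gain from adding Action columns to the inputs."""
    y = df[outcome_col]
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    # One-hot encode categorical context features so the model can use them.
    X_context = pd.get_dummies(df[context_cols])
    X_full = pd.get_dummies(df[context_cols + action_cols])
    r2_context = cross_val_score(model, X_context, y, cv=cv, scoring="r2").mean()
    r2_full = cross_val_score(model, X_full, y, cv=cv, scoring="r2").mean()
    return r2_full - r2_context  # clearly positive => Actions carry signal

# On a real dataset: predictability_gain(df, context_cols, action_cols, "co2_tonnes")
```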

## Rules of Thumb

### Time-series

Either (1) we have an outcome value at each time step, in which case the row
should indicate the time step; or (2) we have an outcome value only at
particular time steps (e.g., we have daily power plant CO2 output but only
monthly cost reports). In either case, if some time steps are missing values
(context, action, or outcome), it's OK for them to be NA in the row for that
time step: we can still construct time series to train on from this dataset.
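
For instance (an invented layout, reusing the hypothetical columns above plus
a `date` time step), daily emission rows with a cost outcome that is only
reported monthly might look like:

```python
import pandas as pd

# Hypothetical time-stamped rows for one plant; cost_usd is only known monthly,
# so it is left as NA on the other days.
ts = pd.DataFrame({
    "date":           pd.to_datetime(["2023-01-30", "2023-01-31", "2023-02-01"]),
    "plant_id":       ["A", "A", "A"],
    "carbon_capture": [0.3, 0.3, 0.4],
    "co2_tonnes":     [610.0, 598.0, 550.0],
    "cost_usd":       [None, 1.5e5, None],  # monthly cost report lands on the 31st
})
```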

### Missing Data

To give the project the best chance of success, the amount of missing data
should be minimal and/or structured, e.g., we only get cost reports monthly.
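
A quick way to see whether missingness is minimal and structured (a sketch,
assuming a pandas dataframe like the hypothetical `ts` above):

```python
# Fraction of missing values per column; structured gaps (e.g., monthly-only
# columns) show up as a high but explainable rate, scattered NAs look like noise.
print(ts.isna().mean().sort_values(ascending=False))
```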

### Data Sufficiency

Data rows should cover the variations of decisions sufficiently; in the case
of time-series data, this means we need historical decision instances that
include different actions taken for similar contexts.
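
One way to eyeball that coverage (a sketch, again assuming the hypothetical
`df` and column lists from above): count how many distinct action combinations
appear for each context.

```python
# Number of distinct action combinations observed per context; contexts with
# only one action give the model nothing to compare against.
action_variety = (
    df.groupby(context_cols)[action_cols]
      .apply(lambda g: len(g.drop_duplicates()))
)
print(action_variety)
```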

A single row should represent one observation, which includes the context,
actions, and outcomes for that observation.

We need enough cases for our predictor to learn something about how Actions
affect Outcomes. If we have thousands of samples to begin with, that certainly
gives us a better shot. A quick-and-dirty check for correlations between
actions and outcomes can be used as a gating function: the correlation matrix
should not look like noise. If it does look like noise, the project may still
be possible, but it will be hard.
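
A minimal version of that gating check (a sketch under the same hypothetical
column names; the 0.1 threshold is arbitrary):

```python
# Action-vs-outcome correlation block; if every entry is near zero, the data
# may not contain a learnable action effect.
corr = df[action_cols + outcome_cols].corr().loc[action_cols, outcome_cols]
print(corr)
print("Looks like noise:", (corr.abs() < 0.1).all().all())
```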

The data requirement grows exponentially with the number of outcome objectives.

### Consistency

The same context and actions should result in similar outcomes, and
contradicting samples should be minimal. In other words, there should not be
too many rows with the same Context and Actions but different Outcomes.
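
A rough way to spot contradicting samples (again assuming the hypothetical
`df` and column lists above): group by Context and Actions together and look
at the spread of each Outcome.

```python
# Standard deviation of each outcome within identical (Context, Actions) groups;
# large spreads flag contradicting samples.
spread = df.groupby(context_cols + action_cols)[outcome_cols].std()
print(spread.describe())
```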

It should be possible to observe the outcome of an action within a reasonable
amount of time (e.g., less than 3 months).

### Availability and Updates to the Data

As a rule of thumb, the number of new samples should be at least on the order
of the problem dimension, i.e., (dim(A) + dim(C)) × dim(O). More important
than the number of new samples is which data is sampled: one sample in a
previously unknown region of interest may be more useful than thousands in a
region we already know well or don't care about. So, if we control which data
is sampled, we don't need as much of it.
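
Plugging in the toy dimensions from the schema sketch above (2 context
features, 2 actions, 2 outcomes), just to show the arithmetic of this rule of
thumb:

```python
dim_C, dim_A, dim_O = 2, 2, 2           # from the hypothetical schema above
min_new_samples = (dim_A + dim_C) * dim_O
print(min_new_samples)                  # (2 + 2) * 2 = 8 new samples per update
```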

### Transparency/Accountability

Data should come from reliable, trusted, scientific, ethical sources
(e.g., not black boxes or your mom's Facebook surveys).

project_resilience_conceptual_architecture.pdf
ADDED
Binary file (179 kB).