Data Model for Shared STATE (supports model modularity) RFC

Currently, `hsp2` features domain-specific (i.e. a single `RCHERS`, `PERLND`, etc) simulation and data access, with with only the current segment data, i.e. the `ts` Dict, parameters, and internal calculation state loaded, passed to the functional routine.   To support the legacy SPEC-ACTIONS (#90) and potential new enhanced model modularity features, a robust data model is needed to facilitate passing `STATE` across multiple domains (segments) and amongst the various functions.  This issue attempts to outline a proposed data structure schema to facilitate this, including performance considerations, and functional considerations.  (see also: #126 )

### Goals
- Sharing `STATE` should allow calculations to be done on any publicly modifiable state variable (individual code/model domains must opt-in to `STATE` sharing).
- `STATE` can be shared across spatial segment (i.e., from RCHRES1 to RCHRES2, etc)
- `STATE` can be shared across model functional (operation) domains (i.e. HYDR, PERLND, PQUAL, GQUAL, ...) 
   - Note: term `domain` is used herein to describe an operation + segment as a unit
- `STATE` schema should discourage name collisions
- Schema should be high performance (i.e., compatible with `numba/@njit` 
- Modules should be able to opt-in for state sharing, i.e., just because something is set in the shared `STATE` data structure does not guarantee that it modifies model behavior.

### Benefits
- STATE dependency ordering integrates well with Special Actions type behavior, as STATE is designed based on the nodular paradigm introduced  by MASSLINKS and SPECACT.
- Because STATE provides cross-domain data access, implementing modules can reduce data access and argument overhead.
- STATE components/primitives can enable very concise runtime code chunks.
- Execution tracing through special actions and other operations in STATE system is simplified.
- Because STATE functions/objects are dependency ordered, code amendments are possible in a highly modular fashion, and labyrinthine code blocks can be reduced.
- STATE primitives can be optimized, and thus, when meta-functions are assembled out of STATE primitives, code optimization is persistent.


### Draft Data Model
Integer keyed STATE
   - `state_paths` is 1-d, string keyed, paths based on the hierarchical paths stored in the `hdf5`. Its keys are full path to hdf5 `STATE`, and its values point to the integer key for use in all other runtime Dicts. The key is generated automatically at the beginning of model loading and is static throughout the simulation.
      - Path Use: `/STATE/[h5 file base name]/[segment type]_[index]/[variable name]`
      - Ex: 
         - The `STATE` for `RCHRES 1`, `IVOL` in the file `test10.h5` (`test10.uci`) has an hdf5-style path of `/STATE/test10/RCHRES_R001/IVOL`
         - `/STATE/test10/RCHRES_R001/IVOL` is the 25th item added to the `STATE` during model loading, and thus it's integer index is `25`
         - This integer index for this `/STATE/test10/RCHRES_R001/IVOL` is found in state_paths, such that `state_paths['/STATE/test10/RCHRES_R001/IVOL'] = 25`
   - `state_ix`: Dict that holds scalar numeric state values. It is an integer keyed, version of `hdf5`. 
      - Ex: 
         - At end of a given timestep `RCHRES 1`, `IVOL`  state value is `1803.5`
         - The integer *index* value for `IVOL` in `RCHRES 1` variable is 25
         - Therefore `state_ix[25] = 1803.5`
   - `dict_ix` is integer keyed Dict to store values of array/matrix objects.
   - `ts_ix` - Dict of timeseries data (TBD: may be redundant to `dict_ix`, given that all ts data can be keyed via an hdf5 path)

```py
state_paths = Dict.empty(key_type=types.unicode_type, value_type=types.int64)
state_ix = Dict.empty(key_type=types.int64, value_type=types.float64)
dict_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:,:])
ts_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:])
```

### Concepts
- **Data Namespace** Implement hdf5 path type notation is ideal schema for cross-domain sharing & compatibility with new `hsp2` adoption of hdf5
   - Discourages name collisions
   - Ex: Inflow to RCHRES 1 at any timestep is stored in `/STATE/test10/RCHRES_R001/IVOL`
   - Ex: Inflow to RCHRES 2 at any timestep is stored in `/STATE/test10/RCHRES_R002/IVOL`
- **Coupling** Tightly coupled supporting modules should write to state at the end of every timestep, and read from state after executing coupled operational models.
   - Current `hsp2` model structure executes all timesteps for each spatial domain at one time
   - Tight coupling would mean all spatial domains should execute in sequence at a given timestep
   - Complete coupling would mean all spatial domains and functional domains would execute using "dependency ordered" method.
   - This will require substantial rewrite of `hsp2` routines, however, base functions are fairly amenable to extracting timestep loop code.
- **Performance** 
   - Early testing of using character-keyed `STATE` Dict showed a large performance hit over using integer-keyed `STATE` Dict.  This may be due to the way in which the character-keyed Dict was set up.  _Testing code to be included in this issue soon._

### Implementation
- `numba` compatibility requires separate storage for
   - Scalar values
   - Vector/Array-type values
- Integer indexed `Dict` may perform faster than Character indexed `Dict`, therefore, this should be optimized. (see example below).   
   - Character keyed Dicts described below in ***Data Structure Option 1**
   - Integer keyed Dicts described below in ***Data Structure Option 2**
- Implemented with Data Structure Option 2 (see below)
- See also module https://github.com/respec/HSPsquared/issues/126

#### Data Structure Option 1 - Character keyed STATE
- **Data Structures**
   - `state_ix`: Holds numeric state values. It is a single-dimension, hdf5 path keyed Dict. 
   - `dict_ix` Holds values of array/matrix objects. It is multi-dimensional hdf5 path keyed Dict.
   - `ts_ix` - Holds values of timeseries dat, a (TBD: may be redundant to `dict_ix`, given that all ts data can be keyed via an hdf5 path)

```py
state_ix = Dict.empty(key_type=types.int64, value_type=types.float64)
dict_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:,:])
ts_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:])
```

#### Data Structure Option 2 
See above **Draft Data Model**

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Data Model for Shared STATE (supports model modularity) RFC #112

Goals

Benefits

Draft Data Model

Concepts

Implementation

Data Structure Option 1 - Character keyed STATE

Data Structure Option 2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Data Model for Shared STATE (supports model modularity) RFC #112

Description

Goals

Benefits

Draft Data Model

Concepts

Implementation

Data Structure Option 1 - Character keyed STATE

Data Structure Option 2

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions