Pipeline Graph (DAG) Visualization #50

@thejmazz

It is useful to have a visual representation of the Directed Acyclic Graph (DAG) that is produced during the execution of a pipeline.

In the graph,

  • each node is a task
  • a node may use outputs from n parent nodes as its input(s). Each input value is resolved from one node. This information is not currently stored; it could be recorded as an alternative edge type (or perhaps with discrete edge weightings for the different edge types)
  • in the diagrams below, read a column of three vertical | characters as the child node of two parent nodes. TODO: actual graph diagrams
  • join(A, B) creates the DAG a ---> b
  • junction(A, B) creates the DAG
a ---|
     |--->
b ---|
  • join(junction(A, B), C) creates the DAG
a ---|
     |---> c
b ---|
  • join(A, fork(X, Y), C) creates the DAG
      |---> x ---> c'
a --->|
      |---> y ---> c''
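
As a rough illustration of the typed-edge idea from the list above, the DAG for join(junction(A, B), C) could be held as a plain edge list in which each edge carries a type. This is only a sketch: the `dag` shape, the `type` field, its values, and the `parentsOf` helper are hypothetical and not part of watermill or graph.js.

```javascript
// Hypothetical plain-object DAG for join(junction(A, B), C).
// The `type` field is an assumption: 'order' marks execution order,
// 'input' marks the parent a child resolves an input value from.
const dag = {
  vertices: ['a', 'b', 'c'],
  edges: [
    { from: 'a', to: 'c', type: 'order' },
    { from: 'b', to: 'c', type: 'order' },
    { from: 'a', to: 'c', type: 'input' }, // assume c reads its input from a
  ],
};

// Return the distinct parents of `id`, optionally restricted to one edge type.
function parentsOf(graph, id, type) {
  const parents = graph.edges
    .filter((e) => e.to === id && (type === undefined || e.type === type))
    .map((e) => e.from);
  return [...new Set(parents)];
}

console.log(parentsOf(dag, 'c'));          // [ 'a', 'b' ]
console.log(parentsOf(dag, 'c', 'input')); // [ 'a' ]
```

Keeping both edge kinds in one list means a visualization can draw them with different styles without needing a second graph.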

The redux reducer for the DAG is here. It uses graph.js.

The graph exists in the store under the path collection (i.e. a valid selector would be (state) => state.collection).

A function jsonifyGraph is also exported, because the graph object from graph.js is not serializable. It creates a serializable JSON representation of the graph.
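
To show why a conversion step like this is needed, here is a minimal sketch that uses a stand-in graph built from Map and Set rather than the real graph.js object. The `graph` shape and the `toSerializable` name are hypothetical, and the actual jsonifyGraph output may look different.

```javascript
// Hypothetical in-memory graph. Maps do not survive JSON.stringify
// (a Map serializes to "{}"), so a flattening step is required.
const graph = {
  vertices: new Map([
    ['a', { status: 'done' }],
    ['b', { status: 'running' }],
  ]),
  edges: new Map([['a', new Set(['b'])]]), // parent id -> set of child ids
};

// Flatten the Maps and Sets into arrays of plain objects.
function toSerializable(g) {
  const vertices = [...g.vertices].map(([id, value]) => ({ id, value }));
  const edges = [];
  for (const [from, children] of g.edges) {
    for (const to of children) edges.push({ from, to });
  }
  return { vertices, edges };
}

const json = JSON.stringify(toSerializable(graph), null, 2);
console.log(json);
```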

See here how the collection (aka DAG) is logged during task resolution for debugging.

A first implementation of this could be to write the JSON graph to disk during the pipeline execution, overwriting the previous file whenever an ADD_OUTPUT or ADD_JUNCTION_VERTEX action is dispatched (i.e. whenever the state of the DAG changes). This way, if a task fails, at least the most recent graph is stored.
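
One possible shape for this is a Redux-style middleware that rewrites a snapshot file after the relevant actions. The sketch below is an assumption-laden illustration: `persistGraph`, the file name, and the `serialize` parameter (standing in for jsonifyGraph, whose exact signature is not shown here) are hypothetical, and a stand-in store replaces a real redux store so the example runs on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Action types named in this issue; every other name here is hypothetical.
const GRAPH_ACTIONS = new Set(['ADD_OUTPUT', 'ADD_JUNCTION_VERTEX']);

// Redux-style middleware: store => next => action.
// `serialize` stands in for jsonifyGraph (signature assumed).
const persistGraph = (file, serialize) => (store) => (next) => (action) => {
  const result = next(action); // let the reducer update the DAG first
  if (GRAPH_ACTIONS.has(action.type)) {
    const tmp = `${file}.tmp`;
    const snapshot = serialize(store.getState().collection);
    fs.writeFileSync(tmp, JSON.stringify(snapshot));
    fs.renameSync(tmp, file); // replace the previous snapshot in one step
  }
  return result;
};

// Usage with a stand-in store (a real one would come from redux).
const file = path.join(os.tmpdir(), `dag-snapshot-${process.pid}.json`);
const state = { collection: { vertices: ['a'], edges: [] } };
const store = { getState: () => state };
const dispatch = persistGraph(file, (g) => g)(store)((action) => action);

dispatch({ type: 'ADD_OUTPUT' });
console.log(fs.readFileSync(file, 'utf8'));
```

Writing to a temporary file and renaming it means a crash in the middle of a write should not leave a half-written snapshot behind.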

Then it is a matter of parsing that JSON into a visualization using something like d3.
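
As a sketch of that parsing step, the snippet below reshapes a snapshot into the `nodes` / `links` arrays (links carrying `source` and `target`) that d3's force layout commonly consumes. The snapshot shape (`vertices` with `id`, `edges` with `from` / `to`) and the `toNodesAndLinks` name are assumptions; the real jsonifyGraph output may need a different mapping.

```javascript
// Assumed snapshot shape; the real jsonifyGraph output may differ.
const snapshot = {
  vertices: [{ id: 'a' }, { id: 'b' }, { id: 'c' }],
  edges: [
    { from: 'a', to: 'c' },
    { from: 'b', to: 'c' },
  ],
};

// Reshape into { nodes, links }, with links keyed by source/target ids.
function toNodesAndLinks(snap) {
  const ids = new Set(snap.vertices.map((v) => v.id));
  const nodes = snap.vertices.map((v) => ({ id: v.id }));
  const links = snap.edges
    .filter((e) => ids.has(e.from) && ids.has(e.to)) // drop dangling edges
    .map((e) => ({ source: e.from, target: e.to }));
  return { nodes, links };
}

const { nodes, links } = toNodesAndLinks(snapshot);
console.log(nodes.length, links);
```

With d3-force, the `links` array would typically be passed to a link force configured with an id accessor such as `(d) => d.id`, so that the string ids resolve to node objects.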

Suggestions to improve the way the graph is handled within watermill are welcome. Perhaps there is a better serializable format to use (e.g. GraphML).
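
For comparison, a GraphML export could look roughly like the sketch below. It assumes a snapshot with `vertices` (each with an `id`) and `edges` (each with `from` / `to`); that shape, the `toGraphML` name, and the graph id `pipeline` are hypothetical.

```javascript
const XML_ENTITIES = { '<': '&lt;', '>': '&gt;', '&': '&amp;', '"': '&quot;', "'": '&apos;' };

// Escape characters that are unsafe inside XML attribute values.
const escapeXml = (s) => String(s).replace(/[<>&"']/g, (c) => XML_ENTITIES[c]);

// Build a minimal directed GraphML document from an assumed snapshot shape.
function toGraphML(snap) {
  const nodes = snap.vertices.map((v) => `    <node id="${escapeXml(v.id)}"/>`);
  const edges = snap.edges.map(
    (e) => `    <edge source="${escapeXml(e.from)}" target="${escapeXml(e.to)}"/>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
    '  <graph id="pipeline" edgedefault="directed">',
    ...nodes,
    ...edges,
    '  </graph>',
    '</graphml>',
  ].join('\n');
}

const graphml = toGraphML({
  vertices: [{ id: 'a' }, { id: 'b' }, { id: 'c' }],
  edges: [{ from: 'a', to: 'c' }, { from: 'b', to: 'c' }],
});
console.log(graphml);
```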

BONUS

  • do it in real time with an Electron/browser app listening to changes in the redux store. New nodes should be added as the tasks run.
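
A minimal sketch of the listening part, assuming the store exposes redux's `getState` and `subscribe` and that `state.collection.vertices` is an array of ids (an assumed shape). `watchGraph` and `makeFakeStore` are hypothetical names; the fake store only exists so the example runs without redux.

```javascript
// Call onNewVertices with the ids that appeared since the last update.
function watchGraph(store, onNewVertices) {
  const seen = new Set();
  const check = () => {
    const { vertices } = store.getState().collection; // assumed: array of ids
    const added = vertices.filter((id) => !seen.has(id));
    added.forEach((id) => seen.add(id));
    if (added.length > 0) onNewVertices(added);
  };
  check(); // report what is already in the store
  return store.subscribe(check); // redux's subscribe returns an unsubscribe function
}

// Stand-in store with the same getState/subscribe surface as redux.
function makeFakeStore(state) {
  const listeners = new Set();
  return {
    getState: () => state,
    subscribe(fn) {
      listeners.add(fn);
      return () => listeners.delete(fn);
    },
    addVertex(id) {
      state = { collection: { vertices: [...state.collection.vertices, id] } };
      listeners.forEach((fn) => fn());
    },
  };
}

const store = makeFakeStore({ collection: { vertices: ['a'] } });
const batches = [];
const unsubscribe = watchGraph(store, (ids) => batches.push(ids));
store.addVertex('b');
unsubscribe();
store.addVertex('c'); // ignored: no longer subscribed
console.log(batches); // [ [ 'a' ], [ 'b' ] ]
```

In an Electron or browser app, the callback would append the new nodes to the rendered graph instead of pushing them into an array.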
