Conversation
|
This is a simple PR but a very detailed write-up - give me a few days to understand the notebook entry and then I will merge! PS. the dotted lines for the not-yet-run tasks visualization are amazing 💯 |
|
Why not use something else unique, like the time of task creation (or maintain an incrementing counter for each task generated in this store; the root of the store could even hold a unique hash representing the state of the graph each time a task is added to it), to generate the uid? (That is, to include in addition to that hashing function.) This seems to me like a solution that introduces manual effort for a problem that can otherwise be solved in a stateful way. This bug is basically an oversight on my part, having never run a junction with that configuration and never having encountered that issue with fork. It's cleaner than:

const everythingForTaskButName = { /* input, output, props, operationCreator */ }
const [ task1, task2 ] = [ 'my-task', 'my-task' ]
  .reduce((sum, val, i) => sum.concat([ val + '_' + i ]), [ ])
// or something else over-engineered and difficult to read (or that even runs)
const pipeline = junction(
  join(somethingA, addNameToTask(everythingForTaskButName)(task1)),
  join(somethingB, addNameToTask(everythingForTaskButName)(task2))
)

Maybe we should also be able to identify whether a task with the exact same input, output, and params has already run (plus whatever else is required to be confident it is OK to use those existing results), and so resolve from the output of that. |
|
Maybe we even need to generate a uid after the task finishes. Would it make sense to have a "wants" uid, which many tasks can share, and also a "resolved" one which includes … In this way, one set of uids could generate a simple DAG for the general design of the pipeline, whereas the other includes all nodes for all executions of the pipeline (same … |
|
All that being said, it is very interesting that what this PR proposes is quite similar to the method used in Nextflow to implement simple forking behaviour. The whole … |
|
You are right, we should make this as automatic as possible rather than needing to duplicate tasks and add a name to each. Also interesting. I will make a new PR to replace this (when it is ready), because there are some other fixes to the orchestrators that need to be addressed. |
|
This PR is obsolete because of #70. |
In this PR I added a simple fix to a problem with duplicated uids when tasks are duplicated. Please check this journal entry, since it has a detailed description of the issue and of how I tried to solve it.
This duplication of uids not only prevented pipeline branches from running the same task, even if we declare a different const for each task (with the same contents), but also meant that, when branching with junction using the same task, it would execute with the first hit regardless of the branch it came from.