Conversation
|
This is a simple PR but a very detailed write-up - give me a few days to understand the notebook entry and then I will merge! PS. the dotted lines for the not-yet-run tasks visualization are amazing 💯 |
|
Why not use something else unique, like the time of task creation (or maintain an incrementing counter for each task generated in this store; the root of the store could even hold a unique hash representing the state of the graph each time a task is added to it), to generate the uid? (That is, to include in addition to that hashing function.) This seems to me like a solution that introduces manual effort for a problem that can otherwise be solved in a stateful way. This bug is basically an oversight on my part, having never run a junction with that configuration and never having encountered that issue with fork. It's cleaner than:

const everythingForTaskButName = { /* input, output, props, operationCreator */ }
const [ task1, task2 ] = [ 'my-task', 'my-task' ]
  .reduce((sum, val, i) => sum.concat([ val + '_' + i ]), [ ])
// or something else over-engineered and difficult to read (or that even runs)
const pipeline = junction(
  join(somethingA, addNameToTask(everythingForTaskButName)(task1)),
  join(somethingB, addNameToTask(everythingForTaskButName)(task2))
)

Maybe we should also be able to identify whether a task with the exact same input, output, and params has already run (plus whatever else is required to be confident it is OK to use those existing results), and so resolve from the output of that. |
|
Maybe we even need to generate a uid after the task finishes. Would it make sense to have a "wants" uid, which many tasks can share, and also a "resolved" one which includes … In this way, one set of uids could generate a simple DAG for the general design of the pipeline, whereas the other includes all nodes for all executions of the pipeline (same … |
|
All that being said, it is very interesting that what this PR proposes is quite similar to the method used in Nextflow to implement simple forking behaviour. The whole … |
|
You are right, we should make this as automatic as possible rather than needing to duplicate tasks and add a name to each. Also interesting. I will make a new PR to replace this (when it is ready), because there are some other fixes to the orchestrators that need to be addressed. |
|
This PR is obsolete because of #70. |
In this PR I added a simple fix to a problem with duplicated uids when tasks are duplicated. Please check this journal entry, since it has a detailed description of the issue and of how I tried to solve it.
This duplication of uids not only prevented pipeline branches from running the same task, even if we declare a different const for each task (with the same contents), but also meant that, when branching with junction using the same task, it would execute with the first hit regardless of the branch it came from.