Skip to content

BI-2115 - Experiment Import Workflows Refactor#352

Merged
dmeidlin merged 63 commits intodevelopfrom
experiment-processor-refactor
Jul 11, 2024
Merged

BI-2115 - Experiment Import Workflows Refactor#352
dmeidlin merged 63 commits intodevelopfrom
experiment-processor-refactor

Conversation

@nickpalladino
Copy link
Member

@nickpalladino nickpalladino commented Apr 26, 2024

Description

Story: Please include a link to the story in Jira

This story is focused on refactoring the experiment processor without any sub obs unit work other than defining a workflow to be used. Sub observation unit work should build off of this.

Design

Background

The team has recognized the need to refactor the ExperimentProcessor and it seems it would be best to address that in this PR instead of contributing more complexity to an already hard to follow implementation. Some of the main issues are a lack of modularity, hard to follow shared state, and function side effects. The goal of the refactor is to encapsulate logic into modular units and reduce shared state by passing immutable inputs and outputs.

The existing Processor interface and ProcessorManager are contributing to the need to use shared state between processing steps. It may also force sub-optimal application logic if we were to use a database DAO implementation rather than BrAPI. I am proposing removing the ProcessorManager and having a single process entry point into the processor. I think the getExistingBrapiData in the Processor interface isn't necessary and processor implementations can choose how to process from the process entry point. It was primarily in there to provide a status update but I really don't think that status update is necessary and if it is, it can still be supported with an alternate implementation.

Workflows

The Experiment file import has a number of possible workflows depending on the data in the file being imported. Currently all of the logic for these different workflows is all together in one unit and differentiated via conditionals. Each of these import workflows can be encapsulated in their own workflow and the logic can be separated so that only the logic relevant to that workflow is present. This can be modeled using the Strategy design pattern. The two obvious workflows from the current implementation are when you have ObsUnitIDs in the import file vs not. We can define a workflow for each case, we'll call the no ObsUnitIDs case Create New Experiment and the ObsUnitID case Append Overwrite Phenotypes. We could also potentially split off another workflow from the Create New Experiment workflow called Create New Environment. The same thing could potentially be done with sub observation units from the Append Overwrite Phenotypes workflow. The idea would be to create composable steps that could be reused in different workflows.

Pipelines

Each workflow would be compose a series of steps in a processing pipeline. Each step would be a modular part of the processing chain that would take an input, and provide an output to the next step. Modeling in this way should help to keep the architecture more modular in the future and allows for status to be reported at each step using an observer.

classDiagram
    class ProcessingStep~I, O~ {
        +process(input: I): O
    }

    class InitializeProcessorStep {
        +process(importRows: List<BrAPIImport>): List<BrAPIImport>
    }

    class GetExistingBrAPIDataStep {
        -program: Program
        +GetExistingBrAPIDataStep(program: Program)
        +process(importRows: List<BrAPIImport>): Map<String, PendingImportObject<?>>
    }

    class ProcessedData {
        -newData: Map<String, PendingImportObject<?>>
        -validationErrors: ValidationErrors
        -statistics: Map<String, ImportPreviewStatistics>
    }

    class Pipeline~I, O~ {
        -steps: List<ProcessingStep<?, ?>>
        -observers: List<PipelineObserver>
        +addStep<T>(step: ProcessingStep<T, ?>): void
        +addObserver(observer: PipelineObserver): void
        +execute(input: I): O
    }

       class ImportContext {
        -upload: ImportUpload
        -importRows: List<BrAPIImport>
        -mappedBrAPIImport: Map<Integer, PendingImport>
        -data: Table
        -program: Program
        -user: User
        -commit: boolean
    }

    class ExperimentProcessor {
        +ExperimentProcessor()
        +process(context: ImportContext): Map<String, ImportPreviewStatistics>
    }

    class Workflow {
        <<interface>>
        +process(input: ImportContext): ProcessedData
    }

    class ExperimentWorkflowFactory {
        +getWorkflow(context: ImportContext): Workflow
    }

    class CreateNewExperimentWorkflow {
        -pipeline: Pipeline<ImportContext, ProcessedData>
        +CreateNewExperimentWorkflow()
        +process(input: ImportContext): ProcessedData
    }

    class AppendOverwritePhenotypesWorkflow {
        -pipeline: Pipeline<ImportContext, ProcessedData>
        +AppendOverwritePhenotypesWorkflow()
        +process(input: ImportContext): ProcessedData
    }

    class PipelineObserver {
        <<interface>>
        +onStepStart(step: ProcessingStep<?, ?>, input: any): void
        +onStepEnd(step: ProcessingStep<?, ?>, output: any): void
        +onPipelineStart(input: any): void
        +onPipelineEnd(output: any): void
    }

    class StatusReportingPipelineObserver {
        -stepStartMessages: Map<Class<? extends ProcessingStep>, String>
        -stepEndMessages: Map<Class<? extends ProcessingStep>, String>
        -pipelineStartMessage: String
        -pipelineEndMessage: String
        +StatusReportingPipelineObserver(stepStartMessages: Map<Class<? extends ProcessingStep>, String>, stepEndMessages: Map<Class<? extends ProcessingStep>, String>, pipelineStartMessage: String, pipelineEndMessage: String)
        +onStepStart(step: ProcessingStep<?, ?>, input: any): void
        +onStepEnd(step: ProcessingStep<?, ?>, output: any): void
        +onPipelineStart(input: any): void
        +onPipelineEnd(output: any): void
    }

    ProcessingStep <|.. InitializeProcessorStep
    ProcessingStep <|.. GetExistingBrAPIDataStep

    Pipeline "1" *-- "many" ProcessingStep
    Pipeline "1" o-- "many" PipelineObserver

    ExperimentProcessor "1" o-- "1" Workflow

    ExperimentProcessor ..> ExperimentWorkflowFactory : uses
    ExperimentProcessor ..> ImportContext : uses

    Workflow <|.. AppendOverwritePhenotypesWorkflow
    Workflow <|.. CreateNewExperimentWorkflow
    Workflow ..> ProcessedData : uses

    CreateNewExperimentWorkflow "1" *-- "1" Pipeline
    AppendOverwritePhenotypesWorkflow "1" *-- "1" Pipeline

    PipelineObserver <|.. StatusReportingPipelineObserver

    ExperimentWorkflowFactory ..> Workflow : creates
Loading

Mermaid Live Editor UML source

Dependencies

Please include any dependencies to other code branches, testing configurations, scripts to be run, etc.

Testing

Please include any details needed for reviewers to test this code

Checklist:

  • I have performed a self-review of my own code
  • I have tested my code and ensured it meets the acceptance criteria of the story
  • I have tested that my code works with both the brapi-java-server and BreedBase
  • I have create/modified unit tests to cover this change
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to documentation
  • I have run TAF: <please include a link to TAF run>

@nickpalladino nickpalladino changed the title Experiment Processor Refactor BI-2115 - Experiment Import Workflows Refactor Apr 29, 2024
@dmeidlin dmeidlin merged commit 9141e9b into develop Jul 11, 2024
@dmeidlin dmeidlin deleted the experiment-processor-refactor branch July 11, 2024 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants