BI-2115 - Experiment Import Workflows Refactor #352
Merged
Description
Story: Please include a link to the story in Jira
This story is focused on refactoring the experiment processor; the only sub observation unit work included is defining a workflow for it to use. Future sub observation unit work should build off of this.
Design
Background
The team has recognized the need to refactor the ExperimentProcessor, and it seems best to address that in this PR rather than contributing more complexity to an already hard-to-follow implementation. Some of the main issues are a lack of modularity, hard-to-follow shared state, and function side effects. The goal of the refactor is to encapsulate logic into modular units and reduce shared state by passing immutable inputs and outputs.
The existing Processor interface and ProcessorManager contribute to the need for shared state between processing steps. They may also force sub-optimal application logic if we were to use a database DAO implementation rather than BrAPI. I am proposing removing the ProcessorManager and having a single process entry point into the processor. I don't think getExistingBrapiData in the Processor interface is necessary; processor implementations can choose how to process from the process entry point. It was primarily there to provide a status update, but I don't think that status update is necessary, and if it is, it can still be supported with an alternate implementation.
Workflows
The experiment file import has a number of possible workflows depending on the data in the file being imported. Currently, the logic for all of these workflows lives together in one unit and is differentiated via conditionals. Each import workflow can instead be encapsulated in its own class so that only the logic relevant to that workflow is present. This can be modeled using the Strategy design pattern. The two obvious workflows in the current implementation are distinguished by whether the import file contains ObsUnitIDs. We can define a workflow for each case: the no-ObsUnitIDs case is Create New Experiment, and the ObsUnitIDs case is Append Overwrite Phenotypes. We could also potentially split another workflow, Create New Environment, off of the Create New Experiment workflow, and likewise split sub observation units off of the Append Overwrite Phenotypes workflow. The idea is to create composable steps that can be reused in different workflows.
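To make the Strategy idea concrete, here is a minimal sketch of the workflow interface and factory. The class names (Workflow, ExperimentWorkflowFactory, CreateNewExperimentWorkflow, AppendOverwritePhenotypesWorkflow) come from the design; the simplified ImportContext/ProcessedData shapes, the static factory method, and the "ObsUnitID" column lookup are illustrative assumptions, not the real implementation.

```java
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the real ImportContext and ProcessedData classes.
class ImportContext {
    final List<Map<String, String>> importRows; // one column->value map per file row
    ImportContext(List<Map<String, String>> importRows) { this.importRows = importRows; }
}

class ProcessedData { }

// Strategy interface: each import workflow is one strategy.
interface Workflow {
    ProcessedData process(ImportContext input);
}

class CreateNewExperimentWorkflow implements Workflow {
    public ProcessedData process(ImportContext input) {
        // ...create the experiment, environments, and obs units from scratch...
        return new ProcessedData();
    }
}

class AppendOverwritePhenotypesWorkflow implements Workflow {
    public ProcessedData process(ImportContext input) {
        // ...match rows to existing obs units by ObsUnitID and update phenotypes...
        return new ProcessedData();
    }
}

class ExperimentWorkflowFactory {
    // Assumed selection rule: any ObsUnitID in the file picks the append/overwrite path.
    static Workflow getWorkflow(ImportContext context) {
        boolean hasObsUnitIds = context.importRows.stream()
                .anyMatch(row -> row.get("ObsUnitID") != null && !row.get("ObsUnitID").isEmpty());
        return hasObsUnitIds
                ? new AppendOverwritePhenotypesWorkflow()
                : new CreateNewExperimentWorkflow();
    }
}
```

The processor's single process entry point would then only ask the factory for the right workflow and delegate to it, keeping each workflow's logic isolated.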
Pipelines
Each workflow would compose a series of steps into a processing pipeline. Each step would be a modular part of the processing chain that takes an input and provides an output to the next step. Modeling it this way should keep the architecture more modular going forward and allows status to be reported at each step using an observer.
```mermaid
classDiagram
    class ProcessingStep~I, O~ {
        +process(input: I): O
    }
    class InitializeProcessorStep {
        +process(importRows: List<BrAPIImport>): List<BrAPIImport>
    }
    class GetExistingBrAPIDataStep {
        -program: Program
        +GetExistingBrAPIDataStep(program: Program)
        +process(importRows: List<BrAPIImport>): Map<String, PendingImportObject<?>>
    }
    class ProcessedData {
        -newData: Map<String, PendingImportObject<?>>
        -validationErrors: ValidationErrors
        -statistics: Map<String, ImportPreviewStatistics>
    }
    class Pipeline~I, O~ {
        -steps: List<ProcessingStep<?, ?>>
        -observers: List<PipelineObserver>
        +addStep<T>(step: ProcessingStep<T, ?>): void
        +addObserver(observer: PipelineObserver): void
        +execute(input: I): O
    }
    class ImportContext {
        -upload: ImportUpload
        -importRows: List<BrAPIImport>
        -mappedBrAPIImport: Map<Integer, PendingImport>
        -data: Table
        -program: Program
        -user: User
        -commit: boolean
    }
    class ExperimentProcessor {
        +ExperimentProcessor()
        +process(context: ImportContext): Map<String, ImportPreviewStatistics>
    }
    class Workflow {
        <<interface>>
        +process(input: ImportContext): ProcessedData
    }
    class ExperimentWorkflowFactory {
        +getWorkflow(context: ImportContext): Workflow
    }
    class CreateNewExperimentWorkflow {
        -pipeline: Pipeline<ImportContext, ProcessedData>
        +CreateNewExperimentWorkflow()
        +process(input: ImportContext): ProcessedData
    }
    class AppendOverwritePhenotypesWorkflow {
        -pipeline: Pipeline<ImportContext, ProcessedData>
        +AppendOverwritePhenotypesWorkflow()
        +process(input: ImportContext): ProcessedData
    }
    class PipelineObserver {
        <<interface>>
        +onStepStart(step: ProcessingStep<?, ?>, input: any): void
        +onStepEnd(step: ProcessingStep<?, ?>, output: any): void
        +onPipelineStart(input: any): void
        +onPipelineEnd(output: any): void
    }
    class StatusReportingPipelineObserver {
        -stepStartMessages: Map<Class<? extends ProcessingStep>, String>
        -stepEndMessages: Map<Class<? extends ProcessingStep>, String>
        -pipelineStartMessage: String
        -pipelineEndMessage: String
        +StatusReportingPipelineObserver(stepStartMessages: Map<Class<? extends ProcessingStep>, String>, stepEndMessages: Map<Class<? extends ProcessingStep>, String>, pipelineStartMessage: String, pipelineEndMessage: String)
        +onStepStart(step: ProcessingStep<?, ?>, input: any): void
        +onStepEnd(step: ProcessingStep<?, ?>, output: any): void
        +onPipelineStart(input: any): void
        +onPipelineEnd(output: any): void
    }
    ProcessingStep <|.. InitializeProcessorStep
    ProcessingStep <|.. GetExistingBrAPIDataStep
    Pipeline "1" *-- "many" ProcessingStep
    Pipeline "1" o-- "many" PipelineObserver
    ExperimentProcessor "1" o-- "1" Workflow
    ExperimentProcessor ..> ExperimentWorkflowFactory : uses
    ExperimentProcessor ..> ImportContext : uses
    Workflow <|.. AppendOverwritePhenotypesWorkflow
    Workflow <|.. CreateNewExperimentWorkflow
    Workflow ..> ProcessedData : uses
    CreateNewExperimentWorkflow "1" *-- "1" Pipeline
    AppendOverwritePhenotypesWorkflow "1" *-- "1" Pipeline
    PipelineObserver <|.. StatusReportingPipelineObserver
    ExperimentWorkflowFactory ..> Workflow : creates
```

Mermaid Live Editor UML source
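As a minimal sketch of the Pipeline and PipelineObserver from the diagram above: the interfaces mirror the design, but the internal representation (steps erased to `Object` with unchecked casts, fluent `addStep`/`addObserver` returning `this`) is an illustrative assumption, and the type compatibility between adjacent steps is the caller's responsibility here.

```java
import java.util.ArrayList;
import java.util.List;

// One modular unit of the processing chain: takes an input, produces an output.
interface ProcessingStep<I, O> {
    O process(I input);
}

// Observer hooks for reporting status at each step; defaults are no-ops.
interface PipelineObserver {
    default void onStepStart(ProcessingStep<?, ?> step, Object input) {}
    default void onStepEnd(ProcessingStep<?, ?> step, Object output) {}
    default void onPipelineStart(Object input) {}
    default void onPipelineEnd(Object output) {}
}

class Pipeline<I, O> {
    // Steps are erased internally; adjacent steps must agree on types at runtime.
    private final List<ProcessingStep<Object, Object>> steps = new ArrayList<>();
    private final List<PipelineObserver> observers = new ArrayList<>();

    @SuppressWarnings("unchecked")
    <A, B> Pipeline<I, O> addStep(ProcessingStep<A, B> step) {
        steps.add((ProcessingStep<Object, Object>) step);
        return this;
    }

    Pipeline<I, O> addObserver(PipelineObserver observer) {
        observers.add(observer);
        return this;
    }

    @SuppressWarnings("unchecked")
    O execute(I input) {
        observers.forEach(o -> o.onPipelineStart(input));
        Object current = input;
        for (ProcessingStep<Object, Object> step : steps) {
            Object in = current;
            observers.forEach(o -> o.onStepStart(step, in));
            current = step.process(current);
            Object out = current;
            observers.forEach(o -> o.onStepEnd(step, out));
        }
        Object result = current;
        observers.forEach(o -> o.onPipelineEnd(result));
        return (O) result;
    }
}
```

A workflow such as CreateNewExperimentWorkflow would build its pipeline once in its constructor (e.g. an initialize step, then a get-existing-BrAPI-data step) and attach something like StatusReportingPipelineObserver to surface progress per step.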
Dependencies
Please include any dependencies to other code branches, testing configurations, scripts to be run, etc.
Testing
Please include any details needed for reviewers to test this code
Checklist: