Skip to content

BI-2156-qa - Append Workflow Edge Cases#442

Merged
mlm483 merged 9 commits intodevelopfrom
BI-2156-qa
Feb 19, 2025
Merged

BI-2156-qa - Append Workflow Edge Cases#442
mlm483 merged 9 commits intodevelopfrom
BI-2156-qa

Conversation

@mlm483
Copy link
Contributor

@mlm483 mlm483 commented Jan 29, 2025

Description

Story: https://breedinginsight.atlassian.net/browse/BI-2156 (These edge cases were discovered in QA while testing BI-2156, but they have nothing to do with BI-2156.)

Current Behavior

When using the "Append experimental dataset" workflow (a) with a file that contains non-existent ObsUnitIds, and (b) with a file that contains valid ObsUnitIds but no Ontology Terms or Observation Data, generic errors were being shown to the user.

Desired Behavior

Ideally (a) should result in an useful error message and (b) shouldn’t result in an error, but selecting commit shouldn’t change anything in the experiment.

Changes

For (a) the main change was creating a better error message, INVALID_OBS_UNIT_ID_ERROR in ExperimentUtilities.java, the other changes (across many several files) were to propagate the exception correctly so that a 4xx status is returned rather than a 5xx status.

For (b) (commit e52481f) I handled a null case to prevent a NullPointerException in AppendOverwritePhenotypesWorkflow.java at this line.

Testing

(a)

  1. Download an experiment from a different environment (e.g. qa-test) or different program, or create a new experiment file with non-existent ObsUnitIds.
  2. Try to upload this file using the "Append" workflow.
  3. You should see a useful error indicating the ObsUnitIds were not found.

(b)

  1. Upload an experiment with phenotypic data columns.
  2. Download that experiment, and delete the phenotypic data columns from the file.
  3. Re-upload the edited file using the "Append" workflow.
  4. The upload should succeed without error and the experiment data should not be changed.

Checklist:

  • I have performed a self-review of my own code
  • I have tested my code and ensured it meets the acceptance criteria of the story
  • I have tested that my code works with both the brapi-java-server and BreedBase
  • I have create/modified unit tests to cover this change
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to documentation
  • I have run TAF: <please include a link to TAF run>

fixed case when file contains valid ObsUnitIds but no phenotypic data columns
}
@Override
public List<ImportWorkflow> getWorkflows() {
public List<ImportWorkflow> getWorkflows() throws UnprocessableEntityException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some code in this method that pulls an exception out of the workflow context and throws it (ImportWorkflowResult::getCaughtException). I think that was the intention for the append workflow and how exceptions are expected to be propagated. There are places in the processing code that use ImportWorkflowResult::setCaughtException

Copy link
Contributor

@dmeidlin dmeidlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting and setting caught exceptions in the ImportWorkflowResult is deliberate and necessary for implementing a SAGA design pattern for handling transactions atomically. Changing the error messages and status codes is fine, but do not change the error handling to simply halt execution and propagate exceptions up the layers to be handled. This short-circuits the SAGA pattern and prevents compensating actions from being performed.

@mlm483
Copy link
Contributor Author

mlm483 commented Jan 30, 2025

Getting and setting caught exceptions in the ImportWorkflowResult is deliberate and necessary for implementing a SAGA design pattern for handling transactions atomically. Changing the error messages and status codes is fine, but do not change the error handling to simply halt execution and propagate exceptions up the layers to be handled. This short-circuits the SAGA pattern and prevents compensating actions from being performed.

I understand. I tried having PendingObservationUnit::brapiRead call setProcessError and then return an empty list in the case where invalid ObsUnitIds are supplied, but that's resulting in a NPE exception at a later step in the workflow. I'll see if I can get the workflow to run to completion anyways, it may be that other steps in the workflow could handle null or empty input more gracefully.

@mlm483
Copy link
Contributor Author

mlm483 commented Feb 4, 2025

After discussions with Nick, Dave, and Shawn, I've made several changes.

  1. Switched to using ValidationErrors to build a detailed tabular error which the frontend will display to the user.
  2. Created MissingValuesException as a way to efficiently pass a set of strings up the call chain.
  3. Updated the error message to match Shawn's documentation, https://cornell.box.com/s/67i3q3d3jkw5zenxngqufm4ft0boi3mf.

@dmeidlin
Copy link
Contributor

dmeidlin commented Feb 7, 2025

Caught exceptions are handled in the middleware instead of the lower layers, so just pass the exception through PendingObservationUnit#brapiRead and move the ValidationError handler up to the middleware.

@dmeidlin
Copy link
Contributor

dmeidlin commented Feb 7, 2025

The first middleware step in the appendOverwrite workflow is AppendOverwriteIDValidation.java which currently checks for empty values or duplicated values in the ObsUnitID column, throwing an exception if any are found which appears as a banner message on the front end. Validating that every valid ObSUnitID value is also found in the BrAPI service should be done in this step. This involves simply moving this line from WorkflowInitialization.java to AppendOverwriteIDValidation.java, and placing the ValidationError handler here.

@dmeidlin
Copy link
Contributor

dmeidlin commented Feb 7, 2025

MissingValuesExcepton is a confusing name because there is already a check for missing ObsUnitID values being done, and this error type is not used for that case. The error type is really for the case where there isn't a BrAPI entity stored for the given id, so a different name like EntityNotFoundException would be appropriate. Also, instead of placing the class for this exception type at the root of the experiment package, experiment/model seems like a better place for it.

@dmeidlin
Copy link
Contributor

dmeidlin commented Feb 7, 2025

ExperimentUtilities#collateReferenceOUIds is the method currently called to both validate against missing or duplicated ids and collect all the ids if all rows pass validation. This method should be split into two methods, ValidateReferenceOUIdValues and collateUniqueOUIds. This way validation errors can be generated for missing or duplicated values and displayed in the preview table instead of just a banner message. Any remaining unique OU ids can then be collected and corresponding Obs Units initialized. if an EntityNotFoundException is thrown then the exception propagates up to the AppendOverwriteIDValidation middleware where validation errors are built and added to the others by calling ExperimentUtilities#addValidationErrorsForObsUnitsNotFound, which contains your handler logic.

@dmeidlin
Copy link
Contributor

dmeidlin commented Feb 7, 2025

There's a bug in ObservationUnitService#getObservationUnitsByDbId. This method is fetching the OUs for a collection of Deltabreed ids, not for a collection of BrAPI Service DbIds. This method needs to search the xrefs, not the dbIds, of the OUs returned from the service. The method should also be renamed to ObservationUnitService#getObservationUnitsById.

@dmeidlin
Copy link
Contributor

dmeidlin commented Feb 7, 2025

I spoke with @shawnyarnes about the error messaging for the case of missing or duplicated ObsUnitID values and the source of truth has been updated and should be used for the validation errors.

@dmeidlin
Copy link
Contributor

dmeidlin commented Feb 7, 2025

There are a couple of problems with setting all summary statistics counts to zero in ImportTableProcess here.
The edge case being addressed is where an experiment exists without a dataset defined (no observation variable, no observation data) and the user selects the append workflow but imports a sheet with no new data (still no obs vars and no data). There isn't a check in place for this edge case, and the action taken is not correct for this edge case. The experiment still has germplasm and study counts that should appear in the summary preview, but are incorrectly being displayed as zero everywhere.
undefinedDatasetNoSummary
The expected summary for this edge case should look like the image below, instead.
undefinedDatasetWithSummary

Each import row and each cell of observation data can be in one of many possible states that requires its own set of validation steps, pending object processing, and statistics calculation, and there is a ProcessedDataFactory for creating instances of each data class containing the specific methods for validation, processing, statistics, etc., so that there isn't a confusing morass of nested if-else blocks and switch cases in the main import table process method. Instead of tacking on a bit of code to ImportTableProcess#process that sets the statistics to empty values(which isn't correct), you'll need to create a new class appendoverwrite/factory/data/UndefinedDataset.java and update ProcessedDataFactory to produce beans of that type.

Then in the import process method just before looping over any observation variables defined add a check for this edge case where there are no observation variables and set processedData to a new instance of UndefinedDatasetBean using the factory.

Then have processedData call the same methods to update statistics as is used when processing rows with observation variables. It would be good to place these lines in a helper method updatePreviewStatistics.

public static final String UNMATCHED_COLUMN = "Ontology term(s) not found: ";


public static final String INVALID_OBS_UNIT_ID_ERROR = "Invalid ObsUnitID";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message for empty ids is the same but duplicate ids have a more specific error message. Also, constants specific to the append workflow are here.

Suggested change
public static final String INVALID_OBS_UNIT_ID_ERROR = "Invalid ObsUnitID";
public enum ErrMessage {
MULTIPLE_EXP_TITLES("File contains more than one Experiment Title"),
MISSING_OBS_UNIT_ID("Invalid ObsUnitID"),
PREEXISTING_EXPERIMENT_TITLE("Experiment Title already exists"),
UNMATCHED_COLUMN("Ontology term(s) not found: "),
OBS_UNIT_NOT_FOUND("Invalid ObsUnitID"),
DUPLICATE_OBS_UNIT_ID("ObsUnitId is repeated");

Comment on lines +1 to +12
package org.breedinginsight.brapps.importer.services.processors.experiment;

import lombok.Getter;

import java.util.Set;

@Getter
public class MissingValuesException extends Exception {
private Set<String> missingIds;

public MissingValuesException(Set<String> missingIds) { this.missingIds = missingIds; }
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package org.breedinginsight.brapps.importer.services.processors.experiment;
import lombok.Getter;
import java.util.Set;
@Getter
public class MissingValuesException extends Exception {
private Set<String> missingIds;
public MissingValuesException(Set<String> missingIds) { this.missingIds = missingIds; }
}
package org.breedinginsight.brapps.importer.services.processors.experiment.model;
import lombok.Getter;
import java.util.Set;
@Getter
public class EntityNotFoundException extends Throwable {
private Set<String> missingEntityIds;
public EntityNotFoundException(Set<String> missingEntityIds) { this.missingEntityIds = missingEntityIds; }
}

* @throws ApiException if an error occurs during the execution of the action.
*/
Optional<BrAPIState<T>> execute() throws ApiException, MissingRequiredInfoException, UnprocessableEntityException, DoesNotExistException;
Optional<BrAPIState<T>> execute() throws ApiException, MissingRequiredInfoException, UnprocessableEntityException, DoesNotExistException, ValidatorException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Optional<BrAPIState<T>> execute() throws ApiException, MissingRequiredInfoException, UnprocessableEntityException, DoesNotExistException, ValidatorException;
Optional<BrAPIState<T>> execute() throws ApiException, MissingRequiredInfoException, UnprocessableEntityException, DoesNotExistException, EntityNotFoundException;

* @throws ApiException if an error occurs during execution
*/
public Optional<BrAPIState<T>> execute() throws ApiException {
public Optional<BrAPIState<T>> execute() throws ApiException, ValidatorException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Optional<BrAPIState<T>> execute() throws ApiException, ValidatorException {
public Optional<BrAPIState<T>> execute() throws ApiException, EntityNotFoundException {

* @throws ApiException if there is an issue with the API call
*/
public List<T> brapiRead() throws ApiException;
public List<T> brapiRead() throws ApiException, ValidatorException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public List<T> brapiRead() throws ApiException, ValidatorException;
public List<T> brapiRead() throws ApiException, EntityNotFoundException;

Comment on lines +105 to +120
try {
// For each id fetch the observation unit from the brapi data store
return observationUnitService.getObservationUnitsByDbId(new HashSet<>(obsUnitIds), importContext.getProgram());
}
catch (MissingValuesException e) {
ValidationErrors validationErrors = new ValidationErrors();

// Build a detailed tabular error.
for (int rowNum = 0; rowNum < importContext.getImportRows().size(); rowNum++) {
String rowObsUnitId = ((ExperimentObservation)importContext.getImportRows().get(rowNum)).getObsUnitID();
if (e.getMissingIds().contains(rowObsUnitId)) {
ExperimentUtilities.addRowError(ExperimentObservation.Columns.OBS_UNIT_ID, ExperimentUtilities.INVALID_OBS_UNIT_ID_ERROR, validationErrors, rowNum);
}
}
throw new ValidatorException(validationErrors);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try {
// For each id fetch the observation unit from the brapi data store
return observationUnitService.getObservationUnitsByDbId(new HashSet<>(obsUnitIds), importContext.getProgram());
}
catch (MissingValuesException e) {
ValidationErrors validationErrors = new ValidationErrors();
// Build a detailed tabular error.
for (int rowNum = 0; rowNum < importContext.getImportRows().size(); rowNum++) {
String rowObsUnitId = ((ExperimentObservation)importContext.getImportRows().get(rowNum)).getObsUnitID();
if (e.getMissingIds().contains(rowObsUnitId)) {
ExperimentUtilities.addRowError(ExperimentObservation.Columns.OBS_UNIT_ID, ExperimentUtilities.INVALID_OBS_UNIT_ID_ERROR, validationErrors, rowNum);
}
}
throw new ValidatorException(validationErrors);
}
@Override
public List<BrAPIObservationUnit> brapiRead() throws ApiException, EntityNotFoundException {
// Collect deltabreed-generated obs unit ids listed in the import
Set<String> obsUnitIds = cache.getReferenceOUIds();
// For each id fetch the observation unit from the brapi data store
return observationUnitService.getObservationUnitsByDbId(new HashSet<>(obsUnitIds), importContext.getProgram());
}

brAPIDatasetReadWorkflowInitialization.execute();
brAPIGermplasmReadWorkflowInitialization.execute();
} catch (ApiException e) {
} catch (ApiException | ValidatorException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the OU initialization from this middleware and add it to the prior validation middleware step.

Suggested change
} catch (ApiException | ValidatorException e) {
@Override
public AppendOverwriteMiddlewareContext process(AppendOverwriteMiddlewareContext context) {
brAPITrialReadWorkflowInitialization = brAPIReadFactory.trialWorkflowReadInitializationBean(context);
brAPIStudyReadWorkflowInitialization = brAPIReadFactory.studyWorkflowReadInitializationBean(context);
locationReadWorkflowInitialization = brAPIReadFactory.locationWorkflowReadInitializationBean(context);
brAPIDatasetReadWorkflowInitialization = brAPIReadFactory.datasetWorkflowReadInitializationBean(context);
brAPIGermplasmReadWorkflowInitialization = brAPIReadFactory.germplasmWorkflowReadInitializationBean(context);
log.debug("reading required BrAPI data from BrAPI service");
try {
brAPITrialReadWorkflowInitialization.execute();
brAPIStudyReadWorkflowInitialization.execute();
locationReadWorkflowInitialization.execute();
brAPIDatasetReadWorkflowInitialization.execute();
brAPIGermplasmReadWorkflowInitialization.execute();
} catch (ApiException e) {
context.getAppendOverwriteWorkflowContext().setProcessError(new MiddlewareException(e));
return this.compensate(context);
} catch (EntityNotFoundException e) {
// TODO: handle edge cases of missing brapi entities as needed
return this.compensate(context);
}
return processNext(context);
}

Comment on lines +389 to +393
// Make sure the workflow statistic is not null.
if (context.getAppendOverwriteWorkflowContext().getStatistic() == null) {
context.getAppendOverwriteWorkflowContext().setStatistic(statistic);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Make sure the workflow statistic is not null.
if (context.getAppendOverwriteWorkflowContext().getStatistic() == null) {
context.getAppendOverwriteWorkflowContext().setStatistic(statistic);
}
/**
* Handle the edge case where a user imports with the append/overwrite workflow for an experiment
* without a dataset defined (i.e. no observation variables headers) and the import does not
* actually have new data to append
*/
if (phenotypeCols.isEmpty()) {
processedData = processedDataFactory.undefinedDatasetBean();
updatePreviewStatistics(processedData, context, studyName, unitId);
}
// Assemble the pending observation data for all phenotypes
for (Column<?> column : phenotypeCols) {
String cellData = column.getString(rowNum);

* @throws IllegalStateException if the retrieved observation units do not match the provided observation unit IDs
*/
public List<BrAPIObservationUnit> getObservationUnitsByDbId(Set<String> obsUnitIds, Program program) throws ApiException, IllegalStateException {
public List<BrAPIObservationUnit> getObservationUnitsByDbId(Set<String> obsUnitIds, Program program) throws ApiException, IllegalStateException, MissingValuesException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name should be changed, the bug fixed, and EntityNotFoundException used as a throwable.

Suggested change
public List<BrAPIObservationUnit> getObservationUnitsByDbId(Set<String> obsUnitIds, Program program) throws ApiException, IllegalStateException, MissingValuesException {
public List<BrAPIObservationUnit> getObservationUnitsById(Set<String> obsUnitIds, Program program) throws ApiException, IllegalStateException, EntityNotFoundException {
List<BrAPIObservationUnit> brapiUnits = null;
// Retrieve reference Observation Units based on IDs
brapiUnits = brAPIObservationUnitDAO.getObservationUnitsById(obsUnitIds, program);
// If no BrAPI units are found, throw an EntityNotFoundException with an error message
if (obsUnitIds.size() != brapiUnits.size()) {
Set<String> missingIds = new HashSet<>(obsUnitIds);
// Calculate missing IDs based on retrieved BrAPI units
//missingIds.removeAll(brapiUnits.stream().map(BrAPIObservationUnit::getObservationUnitDbId).collect(Collectors.toSet()));
missingIds.removeAll(brapiUnits.stream()
.map(unit -> Utilities.getExternalReference(unit.getExternalReferences(), BRAPI_REFERENCE_SOURCE, ExternalReferenceSource.OBSERVATION_UNITS))
.filter(Optional::isPresent)
.map(Optional::get)
.map(BrAPIExternalReference::getReferenceId).collect(Collectors.toSet()));
// Throw exception with missing IDs information
throw new EntityNotFoundException(missingIds);
}
return brapiUnits;
}

@mlm483 mlm483 requested a review from dmeidlin February 19, 2025 14:47
@mlm483 mlm483 merged commit 686568c into develop Feb 19, 2025
1 check passed
@mlm483 mlm483 deleted the BI-2156-qa branch February 19, 2025 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants