[BI-2046] Overwrite observations happening without reference to ObsUnitID#332
[BI-2046] Overwrite observations happening without reference to ObsUnitID#332dmeidlin merged 40 commits intorelease/0.9from
Conversation
nickpalladino
left a comment
There was a problem hiding this comment.
Did @shawnyarnes want the Observation units to be looked up based on the ObsUnitID? It looks like in fetchOrCreateObsUnitPIO there is dependence on study information. I think ultimately she wanted updates to only rely on the ObsUnitID and none of the other data fields in the file.
If I try to update with a file like this:
exp4_download_obsunitupdate.xls
I get an exception:
java.lang.NumberFormatException: null
at java.base/java.lang.Integer.parseInt(Integer.java:620)
at java.base/java.lang.Integer.parseInt(Integer.java:776)
at org.breedinginsight.brapps.importer.services.processors.ExperimentProcessor.yearToSeasonDbIdFromDatabase(ExperimentProcessor.java:1826)
at org.breedinginsight.brapps.importer.services.processors.ExperimentProcessor.yearToSeasonDbId(ExperimentProcessor.java:1796)
at org.breedinginsight.brapps.importer.services.processors.ExperimentProcessor.initNewBrapiData(ExperimentProcessor.java:560)
at org.breedinginsight.brapps.importer.services.processors.ExperimentProcessor.process(ExperimentProcessor.java:233)
at org.breedinginsight.brapps.importer.services.processors.ProcessorManager.process(ProcessorManager.java:63)
at org.breedinginsight.brapps.importer.model.imports.experimentObservation.ExperimentImportService.process(ExperimentImportService.java:74)
at org.breedinginsight.brapps.importer.services.FileImportService.lambda$processFile$9(FileImportService.java:427)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1771)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1763)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
mlm483
left a comment
There was a problem hiding this comment.
Tested, working. Looks good, the only critical change is the error message text. Other than that I had a few thoughts and clarifying questions.
Error Message
- There is duplication of "Import Cannot Proceed". See screenshot in one of my comments on ExperimentProcessor.java.
- The error is a simple banner error stating "Experimental entities are missing ObsUnitIDs". I don't have an issue with this, but it wasn't clear if the error is intended to indicate row numbers or not. Would defer to you and Shawn on this.
"Reference" Observation Unit
- Code and comments use the term Reference Observation Unit, could you clarify what "Reference" means in this context?
|
|
||
| private static final String NAME = "Experiment"; | ||
| private static final String MISSING_OBS_UNIT_ID_ERROR = "Experimental entities are missing ObsUnitIDs"; | ||
| private static final String MISSING_OBS_UNIT_ID_ERROR = "Experimental entities are missing ObsUnitIDs. Import cannot proceed"; |
There was a problem hiding this comment.
| private static final String MISSING_OBS_UNIT_ID_ERROR = "Experimental entities are missing ObsUnitIDs. Import cannot proceed"; | |
| private static final String MISSING_OBS_UNIT_ID_ERROR = "Experimental entities are missing ObsUnitIDs"; |
"Import cannot proceed." is duplicated, because ". Import cannot proceed." is appended by bi-web.

| } | ||
|
|
||
| private void processAndCacheTrial(BrAPITrial existingTrial, Program program, String trialRefSource, Map<String, PendingImportObject<BrAPITrial>> trialByNameNoScope) { | ||
| private void processAndCacheTrial( |
There was a problem hiding this comment.
Having "Cache" in the name suggests there might be interaction with our Redis cache. I get that it's caching in an instance variable, but if possible I might rename this and other similar methods to remove the word "Cache".
| this.obsVarDatasetByName = initializeObsVarDatasetByName(program, experimentImportRows); | ||
| this.existingGermplasmByGID = initializeExistingGermplasmByGID(program, experimentImportRows); | ||
| // check for references to Deltabreed-generated observation units | ||
| referenceOUIds = collateReferenceOUIds(importRows); |
There was a problem hiding this comment.
Here's an example of instance variables mutated as side effects (collateReferenceOUIds mutates hasAllReferenceUnitIds and hasNoReferenceUnitIds). There is also the assignment which makes it even less obvious that there are side effects.
| initializeTrialsForExistingObservationUnits(program, trialByNameNoScope); | ||
| initializeStudiesForExistingObservationUnits(program, studyByNameNoScope); |
There was a problem hiding this comment.
Here are examples of mutating passed-by-reference variables.
| // check for references to Deltabreed-generated observation units | ||
| referenceOUIds = collateReferenceOUIds(importRows); | ||
|
|
||
| if (hasAllReferenceUnitIds) { | ||
| try { | ||
|
|
||
| // get all prior units referenced in import | ||
| pendingObsUnitByOUId = fetchReferenceObservationUnits(referenceOUIds, program); | ||
| observationUnitByNameNoScope = mapPendingObservationUnitByName(pendingObsUnitByOUId, program); | ||
| initializeTrialsForExistingObservationUnits(program, trialByNameNoScope); | ||
| initializeStudiesForExistingObservationUnits(program, studyByNameNoScope); | ||
| locationByName = initializeLocationByName(program, studyByNameNoScope); | ||
| obsVarDatasetByName = initializeObsVarDatasetForExistingObservationUnits(trialByNameNoScope, program); | ||
| existingGermplasmByGID = initializeGermplasmByGIDForExistingObservationUnits(observationUnitByNameNoScope, program); | ||
| for (Map.Entry<String, PendingImportObject<BrAPIObservationUnit>> unitEntry : pendingObsUnitByOUId.entrySet()) { | ||
| String unitId = unitEntry.getKey(); | ||
| BrAPIObservationUnit unit = unitEntry.getValue().getBrAPIObject(); | ||
| mapPendingTrialByOUId(unitId, unit, trialByNameNoScope, studyByNameNoScope, pendingTrialByOUId, program); | ||
| mapPendingStudyByOUId(unitId, unit, studyByNameNoScope, pendingStudyByOUId, program); | ||
| mapPendingLocationByOUId(unitId, unit, pendingStudyByOUId, locationByName, pendingLocationByOUId); | ||
| mapPendingObsDatasetByOUId(unitId, pendingTrialByOUId, obsVarDatasetByName, pendingObsDatasetByOUId); | ||
| mapGermplasmByOUId(unitId, unit, existingGermplasmByGID, pendingGermplasmByOUId); |
There was a problem hiding this comment.
The mixture of assignments, mutating instance variables as side effects, and mutating passed-by-reference variables confusing. I have to dig into every called method to understand what state is mutated where and the lack of a consistent pattern makes it harder for me to reason about what's happening.
Given it's working and we want to release v0.9 I don't recommend making large changes, but maybe the team can discuss what we want our best practices to be in an IT planning meeting.
There was a problem hiding this comment.
Ya, I think we can address this as part of the v0.10 work and refactoring of the ExperimentProcessor
| if (hasAllReferenceUnitIds) { | ||
| String unitName = obsUnitPIO.getBrAPIObject().getObservationUnitName(); | ||
| String studyName = studyPIO.getBrAPIObject().getStudyName(); | ||
| key = getObservationHash(studyName + unitName, variableName, studyName); |
There was a problem hiding this comment.
Why is studyName prepended to unitName here?
| if (trialPio!=null && ImportObjectState.EXISTING==trialPio.getState() && | ||
| (StringUtils.isBlank( importRow.getObsUnitID() )) && (envPio!=null && ImportObjectState.EXISTING==envPio.getState() ) ){ | ||
| throw new UnprocessableEntityException(PREEXISTING_EXPERIMENT_TITLE); | ||
| } |
There was a problem hiding this comment.
This conditional is pretty complex, a comment explaining the intention might be helpful.
| /** | ||
| * This function collates unique ObsUnitID values from a list of BrAPIImport objects. | ||
| * It iterates through the list and adds non-blank ObsUnitID values to a Set. It also checks for any repeated ObsUnitIDs. | ||
| * | ||
| * @param importRows a List of BrAPIImport objects containing ExperimentObservation data | ||
| * @return a Set of unique ObsUnitID strings | ||
| * @throws IllegalStateException if a repeated ObsUnitID is encountered | ||
| */ | ||
| private Set<String> collateReferenceOUIds(List<BrAPIImport> importRows) { |
There was a problem hiding this comment.
You might want to indicate that the hasAllReferenceUnitIds and hasNoReferenceUnitIds instance variables are set/mutated as well.
|
Updated the error message. |
nickpalladino
left a comment
There was a problem hiding this comment.
When testing I saw a few things but think they are unrelated to the work in this card. I noticed when doing an updating phenotypes for a single observation unit the statistics were showing 1 for the observation units but I thought it would be 0. I also noticed the changeLog was showing a blank string for originalValue and I was expecting the prior value. I'm good creating new cards for these if they're legitimate issues.
exp4.xls
exp4_no_refs_2.xls


| // check for references to Deltabreed-generated observation units | ||
| referenceOUIds = collateReferenceOUIds(importRows); | ||
|
|
||
| if (hasAllReferenceUnitIds) { | ||
| try { | ||
|
|
||
| // get all prior units referenced in import | ||
| pendingObsUnitByOUId = fetchReferenceObservationUnits(referenceOUIds, program); | ||
| observationUnitByNameNoScope = mapPendingObservationUnitByName(pendingObsUnitByOUId, program); | ||
| initializeTrialsForExistingObservationUnits(program, trialByNameNoScope); | ||
| initializeStudiesForExistingObservationUnits(program, studyByNameNoScope); | ||
| locationByName = initializeLocationByName(program, studyByNameNoScope); | ||
| obsVarDatasetByName = initializeObsVarDatasetForExistingObservationUnits(trialByNameNoScope, program); | ||
| existingGermplasmByGID = initializeGermplasmByGIDForExistingObservationUnits(observationUnitByNameNoScope, program); | ||
| for (Map.Entry<String, PendingImportObject<BrAPIObservationUnit>> unitEntry : pendingObsUnitByOUId.entrySet()) { | ||
| String unitId = unitEntry.getKey(); | ||
| BrAPIObservationUnit unit = unitEntry.getValue().getBrAPIObject(); | ||
| mapPendingTrialByOUId(unitId, unit, trialByNameNoScope, studyByNameNoScope, pendingTrialByOUId, program); | ||
| mapPendingStudyByOUId(unitId, unit, studyByNameNoScope, pendingStudyByOUId, program); | ||
| mapPendingLocationByOUId(unitId, unit, pendingStudyByOUId, locationByName, pendingLocationByOUId); | ||
| mapPendingObsDatasetByOUId(unitId, pendingTrialByOUId, obsVarDatasetByName, pendingObsDatasetByOUId); | ||
| mapGermplasmByOUId(unitId, unit, existingGermplasmByGID, pendingGermplasmByOUId); |
There was a problem hiding this comment.
Ya, I think we can address this as part of the v0.10 work and refactoring of the ExperimentProcessor
| observations.add(this.observationByHash.get(getImportObservationHash(importRow, getVariableNameFromColumn(column)))); | ||
| } | ||
| String observationHash; | ||
| if (hasAllReferenceUnitIds) { |
There was a problem hiding this comment.
I think in v0.10 we can try to make these workflows more modular and break things out a bit more rather than having if-else blocks in a number of different places
Description
Story: BI-2046
Validation errors are now returned for display in the experiment import preview for any rows referring to an existing environment and with the ObsUnitID blank.
Dependencies
bi-web release/0.9
Testing
Verify all tests pass in ExperimentFileImportTest
Checklist: