fixed annotation deduplicate function#674
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes the annotation deduplication function by introducing a new unified MatchResultAnnotationDeduplicator class that deduplicates alignment spots by Library ID, InChIKey, and Name, and enhances the existing MspAnnotationDeduplicator with similar multi-criteria deduplication.
- Replaces separate MSP and TextDB deduplicators with a unified
MatchResultAnnotationDeduplicator - Adds InChIKey and Name-based deduplication to
MspAnnotationDeduplicator - Simplifies deduplication logic in
AlignmentRefinerandDimsAlignmentRefiner
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/MSDIAL5/MsdialCore/Algorithm/Alignment/SpotAction.cs | Adds new MatchResultAnnotationDeduplicator class and extends MspAnnotationDeduplicator with InChIKey and Name-based deduplication |
| src/MSDIAL5/MsdialCore/Algorithm/Alignment/AlignmentRefiner.cs | Replaces separate MSP and TextDB deduplicators with unified MatchResultAnnotationDeduplicator |
| src/MSDIAL5/MsdialDimsCore/Algorithm/Alignment/DimsAlignmentRefiner.cs | Replaces separate MSP and TextDB deduplicators with unified MatchResultAnnotationDeduplicator |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| // by InChIKey | ||
| spotList = spots.Where(n => n.MspBasedMatchResult.InChIKey.Length > 1).OrderByDescending(spot => spot.MspBasedMatchResult.InChIKey).ToList(); |
There was a problem hiding this comment.
Missing null or empty check for InChIKey. The filter only checks InChIKey.Length > 1 without first checking if InChIKey is null or empty, which could cause a NullReferenceException. Consider adding:
spotList = spots.Where(n => !string.IsNullOrEmpty(n.MspBasedMatchResult?.InChIKey) && n.MspBasedMatchResult.InChIKey.Length > 1).OrderByDescending(spot => spot.MspBasedMatchResult.InChIKey).ToList();| spotList = spots.Where(n => n.MspBasedMatchResult.InChIKey.Length > 1).OrderByDescending(spot => spot.MspBasedMatchResult.InChIKey).ToList(); | |
| spotList = spots.Where(n => !string.IsNullOrEmpty(n.MspBasedMatchResult?.InChIKey) && n.MspBasedMatchResult.InChIKey.Length > 1).OrderByDescending(spot => spot.MspBasedMatchResult.InChIKey).ToList(); |
| if (_param.OnlyReportTopHitInTextDBSearch) { | ||
| var textDbDedupicator = new TextAnnotationDeduplicator(); | ||
| textDbDedupicator.Process(alignments); | ||
| if (_param.OnlyReportTopHitInMspSearch) { |
There was a problem hiding this comment.
The OnlyReportTopHitInTextDBSearch parameter is no longer being checked after this change. If the new MatchResultAnnotationDeduplicator is intended to handle both MSP and TextDB deduplication, the parameter should either be removed from the codebase or the condition should be updated to:
if (_param.OnlyReportTopHitInMspSearch || _param.OnlyReportTopHitInTextDBSearch) {Otherwise, the TextDB-specific parameter has become non-functional.
| if (_param.OnlyReportTopHitInMspSearch) { | |
| if (_param.OnlyReportTopHitInMspSearch || _param.OnlyReportTopHitInTextDBSearch) { |
| { | ||
| public void Process(IEnumerable<AlignmentSpotProperty> spots) { | ||
| var spotList = spots.OrderByDescending(spot => spot.MspID).ToList(); | ||
|
|
There was a problem hiding this comment.
Potential NullReferenceException if spotList is empty. The first pass (lines 121-143) doesn't check if the list has any elements before accessing spotList[currentPeakId] at lines 124 and 131. Consider adding a guard:
if (spotList.Count == 0) return;| if (spotList.Count == 0) return; |
| for (int i = 1; i < spotList.Count; i++) { | ||
| if (spotList[i].MatchResults.Representative.Name.ToLower() != currentName) { | ||
| currentPeakId = i; | ||
| currentName = spotList[currentPeakId].MatchResults.Representative.Name; |
There was a problem hiding this comment.
Inconsistent case handling for Name comparison. Line 92 uses ToLower() when setting currentName, but line 96 doesn't apply ToLower() when updating currentName. This should be:
currentName = spotList[currentPeakId].MatchResults.Representative.Name.ToLower();| currentName = spotList[currentPeakId].MatchResults.Representative.Name; | |
| currentName = spotList[currentPeakId].MatchResults.Representative.Name.ToLower(); |
| } | ||
|
|
||
| public void Process(IEnumerable<AlignmentSpotProperty> spots) { | ||
| var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator)).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList(); |
There was a problem hiding this comment.
Potential NullReferenceException if spotList is empty. The code accesses spotList[currentPeakId] at line 35 without checking if the list has any elements. Consider adding a check:
if (spotList.Count == 0) return;| var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator)).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList(); | |
| var spotList = spots.Where(n => n.IsReferenceMatched(_evaluator)).OrderByDescending(spot => spot.MatchResults.Representative.LibraryID).ToList(); | |
| if (spotList.Count == 0) return; |
| @@ -48,13 +48,8 @@ public DimsAlignmentRefiner(MsdialDimsParameter param, IupacDatabase iupac, IMat | |||
|
|
|||
| private void Deduplicate(IList<AlignmentSpotProperty> alignments) { // TODO: change deduplicate process (msp, textdb, metabolite name...) | |||
| if (_param.OnlyReportTopHitInMspSearch) { //to remove duplicate identifications | |||
There was a problem hiding this comment.
The OnlyReportTopHitInTextDBSearch parameter is no longer being checked after this change. If the new MatchResultAnnotationDeduplicator is intended to handle both MSP and TextDB deduplication, the parameter should either be removed from the codebase or the condition should be updated to check both parameters. Otherwise, the TextDB-specific parameter has become non-functional.
| if (_param.OnlyReportTopHitInMspSearch) { //to remove duplicate identifications | |
| if (_param.OnlyReportTopHitInMspSearch || _param.OnlyReportTopHitInTextDBSearch) { //to remove duplicate identifications |
No description provided.