Conversation
Implemented Enhanced Dot Product similarity scoring for MS/MS spectra, with penalty and peak count adjustments, based on recent literature. Added `EnhancedDotProduct` and `SpectralEntropy` properties to `MsScanMatchResult` and integrated their calculation and storage in annotation scoring. These changes improve spectral similarity assessment and annotation robustness.
There was a problem hiding this comment.
Pull request overview
This PR adds additional MS/MS similarity metrics (enhanced dot product and spectral entropy) to the annotation pipeline and exposes them in exported analysis metadata.
Changes:
- Add
EnhancedDotProductandSpectralEntropyfields toMsScanMatchResult. - Compute the new similarity metrics in
MsReferenceScorer. - Add spectral entropy similarity calculation updates and introduce an enhanced dot product implementation in
MsScanMatching, plus export columns inBaseAnalysisMetadataAccessor.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/MSDIAL5/MsdialCore/Export/IAnalysisMetadataAccessor.cs | Adds new export headers/values for enhanced dot product and spectral entropy. |
| src/MSDIAL5/MsdialCore/Algorithm/Annotation/MsReferenceScorer.cs | Computes and populates the new similarity metrics into match results. |
| src/Common/CommonStandard/DataObj/Result/MsScanMatchResult.cs | Adds MessagePack-serialized properties for the new similarity metrics. |
| src/Common/CommonStandard/Algorithm/Scoring/MsScanMatching.cs | Fixes spelling in entropy method name, adjusts entropy implementation, and adds enhanced dot product scoring. |
| src/Common/CommonStandard/Algorithm/Function/SpectrumHandler.cs | Adds normalization by total intensity helper for entropy calculation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin)); | ||
| var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin)); |
There was a problem hiding this comment.
SpectrumHandler.GetBinnedSpectrum(peaksX, bin) is ambiguous because SpectrumHandler has two overloads that can both be called with two arguments (one uses optional parameters). This will fail compilation. Disambiguate by using a named argument (e.g., bin: bin to select the 2-arg binning overload, or delta: bin to select the top-N overload), or rename one overload to avoid ambiguity.
| var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin)); | |
| var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin)); | |
| var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin: bin)); | |
| var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin: bin)); |
| var sqenhancedDotProduct = MsScanMatching.GetEnhancedDotProduct(scan, reference, parameter.Ms2Tolerance, parameter.MassRangeBegin, parameter.MassRangeEnd, .6d); | ||
| var spectrumEntropy = MsScanMatching.GetSpectralEntropySimilarity(scan.Spectrum, reference.Spectrum, parameter.Ms2Tolerance); |
There was a problem hiding this comment.
GetEnhancedDotProduct() returns -1 when spectra are unavailable, but the result is immediately passed to Math.Sqrt() and stored in EnhancedDotProduct. Math.Sqrt(-1) yields NaN, which can propagate into exports/serialization. Handle negative return values explicitly (e.g., set EnhancedDotProduct to -1/0 when sqenhancedDotProduct < 0) and consider extracting the .6d penalty into a named constant/parameter so its meaning is clear and configurable.
| scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz; | ||
| } | ||
| else { | ||
| scalarM += measuredMassBuffer[i].Intensity * measuredMassBuffer[i].FocusedMz; | ||
| } | ||
| scalarR += referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz; |
There was a problem hiding this comment.
The penalty parameter currently has no effect in the measuredMassBuffer[i].Intensity == 0d branch because you're multiplying by measuredMassBuffer[i].Intensity (which is 0), so the penalized term is always 0. As written, the score is effectively identical to the reverse dot product for the bins you include. Revisit this branch so the penalty actually influences scalarM/the final score (e.g., apply the penalty to a non-zero term, or incorporate bins where the reference intensity is below cutoff if you intend to penalize unmatched query peaks).
| scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz; | |
| } | |
| else { | |
| scalarM += measuredMassBuffer[i].Intensity * measuredMassBuffer[i].FocusedMz; | |
| } | |
| scalarR += referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz; | |
| // Penalize unmatched reference peaks by reducing their contribution to scalarR. | |
| scalarR += referenceMassBuffer[i].Intensity * (1 - penalty) * referenceMassBuffer[i].FocusedMz; | |
| } | |
| else { | |
| scalarM += measuredMassBuffer[i].Intensity * measuredMassBuffer[i].FocusedMz; | |
| scalarR += referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz; | |
| } |
| public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) { | ||
| var sumIntensity = spectrum.Sum(n => n.Intensity); |
There was a problem hiding this comment.
GetNormalizedByTotalIntensityPeaks() divides by sumIntensity without guarding against sumIntensity == 0. If a spectrum can contain only zero-intensity peaks, this will produce NaN/Infinity intensities and break downstream similarity calculations. Add a fast-path for spectrum.Count == 0 and sumIntensity <= 0 (e.g., return an empty list or the original list) to keep the normalization numerically safe.
| public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) { | |
| var sumIntensity = spectrum.Sum(n => n.Intensity); | |
| public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) { | |
| if (spectrum == null || spectrum.Count == 0) { | |
| return new List<SpectrumPeak>(); | |
| } | |
| var sumIntensity = spectrum.Sum(n => n.Intensity); | |
| if (sumIntensity <= 0d) { | |
| // All intensities are zero or non-positive; return an empty normalized spectrum | |
| // to avoid division by zero and unstable similarity calculations. | |
| return new List<SpectrumPeak>(); | |
| } |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| MatchedPeaksCount = (float)matchedPeaksScores[1], | ||
| AcurateMassSimilarity = (float)ms1Similarity, | ||
| IsotopeSimilarity = (float)isotopeSimilarity, | ||
| EnhancedDotProduct = (float)Math.Sqrt(sqenhancedDotProduct), |
There was a problem hiding this comment.
GetEnhancedDotProduct() can return -1 when spectra are not comparable. Taking Math.Sqrt(-1) yields NaN, which then propagates into MsScanMatchResult and downstream export/scoring. Guard the sentinel (e.g., if the squared score < 0, keep EnhancedDotProduct as -1/null) before applying Math.Sqrt.
| EnhancedDotProduct = (float)Math.Sqrt(sqenhancedDotProduct), | |
| EnhancedDotProduct = sqenhancedDotProduct < 0 ? -1f : (float)Math.Sqrt(sqenhancedDotProduct), |
| var sqenhancedDotProduct = MsScanMatching.GetEnhancedDotProduct(scan, reference, parameter.Ms2Tolerance, parameter.MassRangeBegin, parameter.MassRangeEnd, .6d); | ||
| var spectrumEntropy = MsScanMatching.GetSpectralEntropySimilarity(scan.Spectrum, reference.Spectrum, parameter.Ms2Tolerance); |
There was a problem hiding this comment.
The 0.6d penalty passed to GetEnhancedDotProduct(...) is a hard-coded magic number with no explanation or link to configuration. Consider promoting this to a parameter (e.g., on MsRefSearchParameterBase) or at least a named constant so it can be tuned per workflow and is self-documenting.
| public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) { | ||
| var sumIntensity = spectrum.Sum(n => n.Intensity); | ||
| return spectrum.Select(n => new SpectrumPeak { Mass = n.Mass, Intensity = n.Intensity / sumIntensity }).ToList(); | ||
| } |
There was a problem hiding this comment.
GetNormalizedByTotalIntensityPeaks() divides by sumIntensity without guarding sumIntensity == 0. If a spectrum contains only zero-intensity peaks, this will yield NaN/Infinity intensities and break downstream similarity calculations. Add a guard (e.g., return an empty list or all-zero intensities) when sumIntensity <= 0 (and consider also handling spectrum == null for callers that may pass null).
| public static double GetSpectralEntropySimilarity(List<SpectrumPeak> peaks1, List<SpectrumPeak> peaks2, double bin) { | ||
| if (!IsComparedAvailable(peaks1, peaks2)) return -1d; | ||
|
|
||
| var combinedSpectrum = SpectrumHandler.GetCombinedSpectrum(SpectrumHandler.GetNormalizedByTotalIntensityPeaks(peaks1), SpectrumHandler.GetNormalizedByTotalIntensityPeaks(peaks2), bin); | ||
| var entropy12 = GetSpectralEntropy(combinedSpectrum); | ||
| var entropy1 = GetSpectralEntropy(peaks1); | ||
| var entropy2 = GetSpectralEntropy(peaks2); | ||
| var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin)); | ||
| var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin)); |
There was a problem hiding this comment.
GetSpectralEntropySimilarity() normalizes peaks via GetNormalizedByTotalIntensityPeaks(). If total intensity is 0, that normalization produces non-finite intensities and this method can return NaN. After adding a guard in the normalization helper, consider also handling degenerate/non-finite entropy results here (e.g., return -1/0).
| } | ||
|
|
||
| if (measuredMassBuffer[i].Intensity == 0d) { | ||
| scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz; |
There was a problem hiding this comment.
In GetEnhancedDotProduct(), penalty currently has no effect: when measuredMassBuffer[i].Intensity == 0d, the code adds measuredMassBuffer[i].Intensity * (1 - penalty) * ..., which is always 0. This contradicts the XML docs stating unmatched query peaks are weighted by penalty. Update the formula so penalty actually changes the score (or remove the parameter if it’s not intended).
| scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz; | |
| // Penalize unmatched query peaks by assigning them a reduced contribution | |
| // based on the corresponding reference peak intensity. | |
| scalarM += (1 - penalty) * referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz; |
| "Simple dot product", | ||
| "Weighted dot product", | ||
| "Reverse dot product", | ||
| "Enhanced dot product", | ||
| "Spectrum entropy", | ||
| "Matched peaks count", | ||
| "Matched peaks percentage", | ||
| "Total score", |
There was a problem hiding this comment.
This header list now includes "Enhanced dot product" and "Spectrum entropy", but the existing GetContentTest() doesn’t assert that GetContent() produces these keys (and with correct formatting / sentinel handling). Adding coverage for these two new columns would help catch issues like NaN propagation or unexpected defaults.
No description provided.