Skip to content

Add similarity scores#720

Merged
YukiMatsuzawa merged 6 commits intomasterfrom
add-similarity-scores
Mar 23, 2026
Merged

Add similarity scores#720
YukiMatsuzawa merged 6 commits intomasterfrom
add-similarity-scores

Conversation

@YukiMatsuzawa
Copy link
Copy Markdown
Contributor

No description provided.

Implemented Enhanced Dot Product similarity scoring for MS/MS spectra, with penalty and peak count adjustments, based on recent literature. Added `EnhancedDotProduct` and `SpectralEntropy` properties to `MsScanMatchResult` and integrated their calculation and storage in annotation scoring. These changes improve spectral similarity assessment and annotation robustness.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds additional MS/MS similarity metrics (enhanced dot product and spectral entropy) to the annotation pipeline and exposes them in exported analysis metadata.

Changes:

  • Add EnhancedDotProduct and SpectralEntropy fields to MsScanMatchResult.
  • Compute the new similarity metrics in MsReferenceScorer.
  • Add spectral entropy similarity calculation updates and introduce an enhanced dot product implementation in MsScanMatching, plus export columns in BaseAnalysisMetadataAccessor.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/MSDIAL5/MsdialCore/Export/IAnalysisMetadataAccessor.cs Adds new export headers/values for enhanced dot product and spectral entropy.
src/MSDIAL5/MsdialCore/Algorithm/Annotation/MsReferenceScorer.cs Computes and populates the new similarity metrics into match results.
src/Common/CommonStandard/DataObj/Result/MsScanMatchResult.cs Adds MessagePack-serialized properties for the new similarity metrics.
src/Common/CommonStandard/Algorithm/Scoring/MsScanMatching.cs Fixes spelling in entropy method name, adjusts entropy implementation, and adds enhanced dot product scoring.
src/Common/CommonStandard/Algorithm/Function/SpectrumHandler.cs Adds normalization by total intensity helper for entropy calculation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +800 to +801
var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin));
var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin));
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpectrumHandler.GetBinnedSpectrum(peaksX, bin) is ambiguous because SpectrumHandler has two overloads that can both be called with two arguments (one uses optional parameters). This will fail compilation. Disambiguate by using a named argument (e.g., bin: bin to select the 2-arg binning overload, or delta: bin to select the top-N overload), or rename one overload to avoid ambiguity.

Suggested change
var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin));
var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin));
var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin: bin));
var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin: bin));

Copilot uses AI. Check for mistakes.
Comment on lines +43 to +44
var sqenhancedDotProduct = MsScanMatching.GetEnhancedDotProduct(scan, reference, parameter.Ms2Tolerance, parameter.MassRangeBegin, parameter.MassRangeEnd, .6d);
var spectrumEntropy = MsScanMatching.GetSpectralEntropySimilarity(scan.Spectrum, reference.Spectrum, parameter.Ms2Tolerance);
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetEnhancedDotProduct() returns -1 when spectra are unavailable, but the result is immediately passed to Math.Sqrt() and stored in EnhancedDotProduct. Math.Sqrt(-1) yields NaN, which can propagate into exports/serialization. Handle negative return values explicitly (e.g., set EnhancedDotProduct to -1/0 when sqenhancedDotProduct < 0) and consider extracting the .6d penalty into a named constant/parameter so its meaning is clear and configurable.

Copilot uses AI. Check for mistakes.
Comment on lines +4198 to +4203
scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz;
}
else {
scalarM += measuredMassBuffer[i].Intensity * measuredMassBuffer[i].FocusedMz;
}
scalarR += referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz;
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The penalty parameter currently has no effect in the measuredMassBuffer[i].Intensity == 0d branch because you're multiplying by measuredMassBuffer[i].Intensity (which is 0), so the penalized term is always 0. As written, the score is effectively identical to the reverse dot product for the bins you include. Revisit this branch so the penalty actually influences scalarM/the final score (e.g., apply the penalty to a non-zero term, or incorporate bins where the reference intensity is below cutoff if you intend to penalize unmatched query peaks).

Suggested change
scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz;
}
else {
scalarM += measuredMassBuffer[i].Intensity * measuredMassBuffer[i].FocusedMz;
}
scalarR += referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz;
// Penalize unmatched reference peaks by reducing their contribution to scalarR.
scalarR += referenceMassBuffer[i].Intensity * (1 - penalty) * referenceMassBuffer[i].FocusedMz;
}
else {
scalarM += measuredMassBuffer[i].Intensity * measuredMassBuffer[i].FocusedMz;
scalarR += referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz;
}

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +68
public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) {
var sumIntensity = spectrum.Sum(n => n.Intensity);
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetNormalizedByTotalIntensityPeaks() divides by sumIntensity without guarding against sumIntensity == 0. If a spectrum can contain only zero-intensity peaks, this will produce NaN/Infinity intensities and break downstream similarity calculations. Add a fast-path for spectrum.Count == 0 and sumIntensity <= 0 (e.g., return an empty list or the original list) to keep the normalization numerically safe.

Suggested change
public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) {
var sumIntensity = spectrum.Sum(n => n.Intensity);
public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) {
if (spectrum == null || spectrum.Count == 0) {
return new List<SpectrumPeak>();
}
var sumIntensity = spectrum.Sum(n => n.Intensity);
if (sumIntensity <= 0d) {
// All intensities are zero or non-positive; return an empty normalized spectrum
// to avoid division by zero and unstable similarity calculations.
return new List<SpectrumPeak>();
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

MatchedPeaksCount = (float)matchedPeaksScores[1],
AcurateMassSimilarity = (float)ms1Similarity,
IsotopeSimilarity = (float)isotopeSimilarity,
EnhancedDotProduct = (float)Math.Sqrt(sqenhancedDotProduct),
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetEnhancedDotProduct() can return -1 when spectra are not comparable. Taking Math.Sqrt(-1) yields NaN, which then propagates into MsScanMatchResult and downstream export/scoring. Guard the sentinel (e.g., if the squared score < 0, keep EnhancedDotProduct as -1/null) before applying Math.Sqrt.

Suggested change
EnhancedDotProduct = (float)Math.Sqrt(sqenhancedDotProduct),
EnhancedDotProduct = sqenhancedDotProduct < 0 ? -1f : (float)Math.Sqrt(sqenhancedDotProduct),

Copilot uses AI. Check for mistakes.
Comment on lines +43 to +44
var sqenhancedDotProduct = MsScanMatching.GetEnhancedDotProduct(scan, reference, parameter.Ms2Tolerance, parameter.MassRangeBegin, parameter.MassRangeEnd, .6d);
var spectrumEntropy = MsScanMatching.GetSpectralEntropySimilarity(scan.Spectrum, reference.Spectrum, parameter.Ms2Tolerance);
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 0.6d penalty passed to GetEnhancedDotProduct(...) is a hard-coded magic number with no explanation or link to configuration. Consider promoting this to a parameter (e.g., on MsRefSearchParameterBase) or at least a named constant so it can be tuned per workflow and is self-documenting.

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +70
public static List<SpectrumPeak> GetNormalizedByTotalIntensityPeaks(List<SpectrumPeak> spectrum) {
var sumIntensity = spectrum.Sum(n => n.Intensity);
return spectrum.Select(n => new SpectrumPeak { Mass = n.Mass, Intensity = n.Intensity / sumIntensity }).ToList();
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetNormalizedByTotalIntensityPeaks() divides by sumIntensity without guarding sumIntensity == 0. If a spectrum contains only zero-intensity peaks, this will yield NaN/Infinity intensities and break downstream similarity calculations. Add a guard (e.g., return an empty list or all-zero intensities) when sumIntensity <= 0 (and consider also handling spectrum == null for callers that may pass null).

Copilot uses AI. Check for mistakes.
Comment on lines +795 to +801
public static double GetSpectralEntropySimilarity(List<SpectrumPeak> peaks1, List<SpectrumPeak> peaks2, double bin) {
if (!IsComparedAvailable(peaks1, peaks2)) return -1d;

var combinedSpectrum = SpectrumHandler.GetCombinedSpectrum(SpectrumHandler.GetNormalizedByTotalIntensityPeaks(peaks1), SpectrumHandler.GetNormalizedByTotalIntensityPeaks(peaks2), bin);
var entropy12 = GetSpectralEntropy(combinedSpectrum);
var entropy1 = GetSpectralEntropy(peaks1);
var entropy2 = GetSpectralEntropy(peaks2);
var entropy1 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks1, bin));
var entropy2 = GetSpectralEntropy(SpectrumHandler.GetBinnedSpectrum(peaks2, bin));
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetSpectralEntropySimilarity() normalizes peaks via GetNormalizedByTotalIntensityPeaks(). If total intensity is 0, that normalization produces non-finite intensities and this method can return NaN. After adding a guard in the normalization helper, consider also handling degenerate/non-finite entropy results here (e.g., return -1/0).

Copilot uses AI. Check for mistakes.
}

if (measuredMassBuffer[i].Intensity == 0d) {
scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz;
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GetEnhancedDotProduct(), penalty currently has no effect: when measuredMassBuffer[i].Intensity == 0d, the code adds measuredMassBuffer[i].Intensity * (1 - penalty) * ..., which is always 0. This contradicts the XML docs stating unmatched query peaks are weighted by penalty. Update the formula so penalty actually changes the score (or remove the parameter if it’s not intended).

Suggested change
scalarM += measuredMassBuffer[i].Intensity * (1 - penalty) * measuredMassBuffer[i].FocusedMz;
// Penalize unmatched query peaks by assigning them a reduced contribution
// based on the corresponding reference peak intensity.
scalarM += (1 - penalty) * referenceMassBuffer[i].Intensity * referenceMassBuffer[i].FocusedMz;

Copilot uses AI. Check for mistakes.
Comment on lines 39 to 46
"Simple dot product",
"Weighted dot product",
"Reverse dot product",
"Enhanced dot product",
"Spectrum entropy",
"Matched peaks count",
"Matched peaks percentage",
"Total score",
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header list now includes "Enhanced dot product" and "Spectrum entropy", but the existing GetContentTest() doesn’t assert that GetContent() produces these keys (and with correct formatting / sentinel handling). Adding coverage for these two new columns would help catch issues like NaN propagation or unexpected defaults.

Copilot uses AI. Check for mistakes.
@YukiMatsuzawa YukiMatsuzawa merged commit 450d2f8 into master Mar 23, 2026
13 checks passed
@YukiMatsuzawa YukiMatsuzawa deleted the add-similarity-scores branch March 23, 2026 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants