Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography–Mass Spectrometry Small Molecule Identification

Published: 2023-10-21; Volume: 13; Issue: 10; Page: 1101
ISSN: 2218-1989
Journal: Metabolites
Language: English

Authors:

David J. Degnan¹, Javier E. Flores¹, Eva R. Brayfindley², Vanessa L. Paurus³, Bobbie-Jo M. Webb-Robertson¹, Chaevien S. Clendinen³, Lisa M. Bramer¹

Affiliations:

1. Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA

2. Artificial Intelligence and Data Analytics Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA

3. Environmental and Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA

Abstract

Metabolomics provides a unique snapshot into the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this “molecular snapshot” is only as informative as the number of metabolites confidently identified within it. In mass spectrometry-based metabolomics, compounds are traditionally identified with a spectral similarity (SS) score, which measures how closely a query spectrum matches each candidate spectrum in a reference library. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. This lack of a standard SS score creates analytic uncertainty and can lead to reproducibility issues, especially as these data are integrated with other domains. In this work, we use metabolomic spectral similarity as a case study to show the consistency challenges, within just one piece of the One Health framework, that must be addressed to enable data science approaches to One Health problems. Using a large collection of both standard and complex datasets with expert-verified truth annotations, we evaluated how effectively 66 similarity metrics distinguish correct matches (true positives) from incorrect matches (true negatives). We also characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families) tend to perform better than others, although no single similarity metric performs optimally for all queried spectra. This work provides an empirically based resource that researchers can use to select similarity metrics for GC-MS identification, and it takes steps toward standardizing identification workflows to increase scientific reproducibility.
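To make the matching workflow concrete, here is a minimal Python sketch, not the authors' code, of scoring a query spectrum against a small reference library and of one common way to quantify how well a score separates true-positive from true-negative matches. It assumes spectra are stored as nominal m/z to intensity dictionaries, and it uses cosine similarity, Pearson correlation, and the sum-of-minima intersection as assumed representatives of the Inner Product, Correlative, and Intersection families. The function names (`align`, `rank_candidates`, `separation_auc`), the AUC-style separation measure, and the toy spectra are hypothetical illustrations; the paper's own metric definitions, preprocessing, and evaluation criteria may differ.

```python
import math

# Assumption: a GC-MS spectrum is modeled as {nominal m/z: intensity} (unit mass resolution).
Spectrum = dict[int, float]


def align(query: Spectrum, ref: Spectrum) -> tuple[list[float], list[float]]:
    """Place both spectra on the union of their m/z values; absent peaks become 0."""
    mzs = sorted(set(query) | set(ref))
    return [query.get(mz, 0.0) for mz in mzs], [ref.get(mz, 0.0) for mz in mzs]


def cosine(q: list[float], r: list[float]) -> float:
    """Inner Product family (assumed representative): dot product of unit-length vectors."""
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_r = math.sqrt(sum(b * b for b in r))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return sum(a * b for a, b in zip(q, r)) / (norm_q * norm_r)


def pearson(q: list[float], r: list[float]) -> float:
    """Correlative family (assumed representative): Pearson correlation over aligned peaks."""
    n = len(q)
    mean_q, mean_r = sum(q) / n, sum(r) / n
    dq = [a - mean_q for a in q]
    dr = [b - mean_r for b in r]
    denom = math.sqrt(sum(a * a for a in dq)) * math.sqrt(sum(b * b for b in dr))
    return sum(a * b for a, b in zip(dq, dr)) / denom if denom else 0.0


def intersection(q: list[float], r: list[float]) -> float:
    """Intersection family (assumed representative): sum of element-wise minima
    after scaling each spectrum to unit total intensity."""
    total_q, total_r = sum(q), sum(r)
    if total_q == 0 or total_r == 0:
        return 0.0
    return sum(min(a / total_q, b / total_r) for a, b in zip(q, r))


METRICS = {"cosine": cosine, "pearson": pearson, "intersection": intersection}


def rank_candidates(query: Spectrum, library: dict[str, Spectrum],
                    metric: str) -> list[tuple[str, float]]:
    """Score the query against every library entry; return (name, score), best first."""
    score = METRICS[metric]
    scored = [(name, score(*align(query, ref))) for name, ref in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def separation_auc(tp_scores: list[float], tn_scores: list[float]) -> float:
    """Chance that a random true-positive match outscores a random true-negative one
    (ties count 0.5); 1.0 means the metric separates the two groups perfectly."""
    wins = 0.0
    for p in tp_scores:
        for n in tn_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(tp_scores) * len(tn_scores))


# Toy usage with made-up spectra (hypothetical candidates, not real reference data).
library = {
    "candidate_A": {41: 30.0, 43: 100.0, 58: 60.0},
    "candidate_B": {41: 100.0, 55: 80.0, 69: 20.0},
}
query = {41: 31.0, 43: 98.0, 58: 57.0}
for metric in METRICS:
    best_name, best_score = rank_candidates(query, library, metric)[0]
    print(f"{metric}: best match {best_name} (score {best_score:.3f})")
```

With these toy values all three metrics rank `candidate_A` first. Only `intersection` rescales each spectrum to unit total intensity, because element-wise minima are comparable only when both spectra share a scale; cosine and Pearson are already unaffected by multiplying a spectrum by a positive constant.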

Funders

m/q Initiative at Pacific Northwest National Laboratory

U.S. Department of Energy

Battelle Memorial Institute

PNNL Laboratory Directed Research and Development program

Publisher

MDPI AG

Subject

Molecular Biology; Biochemistry; Endocrinology, Diabetes and Metabolism

Link

https://www.mdpi.com/2218-1989/13/10/1101/pdf

References (38 articles)

1. The evolution of One Health: A decade of progress and challenges for the future; Gibbs; Vet. Rec., 2014

2. Only one health, and so many omics; Cancer Cell Int., 2015

3. Informatics and Data Analytics to Support Exposome-Based Discovery for Public Health; Manrai; Annu. Rev. Public Health, 2017

4. Editorial: New omics research challenges for Public and sustainable Health; Traversi and Ripabelli; Front. Microbiol. 13, 2022

5. Challenges and opportunities of molecular epidemiology: Using omics to address complex One Health issues in tropical settings; Mekuria; Front. Trop. Dis., 2023

Cited by 1 article.

1. Mass spectrometry-based metabolomics for the investigation of antibiotic–bacterial interactions; Mass Spectrometry Reviews; 2024-07-14