Abstract
Genome-scale metabolic models are key biotechnology tools that can predict the metabolic capabilities and growth of an organism. In particular, these models have become indispensable for metabolic analysis of microbial species and communities such as the gut microbiomes of humans and other animals. Accurate microbial models can be built automatically from genomes, but many microbes have only been observed through sequencing of marker genes such as 16S rRNA and thus remain inaccessible to genome-scale modeling. To extend the scope of genome-scale metabolic models to microbes that lack genomic information, we trained an artificial neural network to build microbial models from numeric representations of 16S rRNA gene sequences. Specifically, we built models and extracted 16S rRNA gene sequences from more than 15,000 reference and representative microbial genomes, computed multiple sequence alignments and large language model embeddings for the 16S rRNA gene sequences, and trained the neural network to predict metabolic reaction probabilities from sequences, alignments, or embeddings. Training was fast on a single graphics processing unit, and the trained networks predicted reaction probabilities accurately for unseen archaeal and bacterial sequences and species. This makes it possible to reconstruct microbial genome-scale metabolic networks from any 16S rRNA gene sequence and enables simulation of metabolism and growth for all observed microbial life.
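The core idea of mapping a numeric sequence representation to per-reaction probabilities can be illustrated with a minimal sketch. This is not the network described above: the one-hot encoding, the single sigmoid layer, the function names (`one_hot`, `predict`, `train`), the hyperparameters, and the toy sequences and labels are all assumptions chosen for illustration. It shows only the multi-label setup, in which each reaction receives its own independent probability.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: unambiguous bases only; anything else encodes as zeros


def one_hot(seq, length):
    """Flatten a sequence into a fixed-length one-hot vector (truncate or zero-pad)."""
    vec = [0.0] * (length * len(ALPHABET))
    for i, base in enumerate(seq.upper()[:length]):
        j = ALPHABET.find(base)
        if j >= 0:
            vec[i * len(ALPHABET) + j] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, biases, x):
    """Return one independent probability per reaction (multi-label output)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def train(xs, ys, n_reactions, epochs=200, lr=0.5, seed=0):
    """Fit a single sigmoid layer by stochastic gradient descent on binary cross-entropy."""
    rng = random.Random(seed)
    n_features = len(xs[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
               for _ in range(n_reactions)]
    biases = [0.0] * n_reactions
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            probs = predict(weights, biases, x)
            for r in range(n_reactions):
                grad = probs[r] - y[r]  # derivative of the loss w.r.t. the logit
                biases[r] -= lr * grad
                row = weights[r]
                for k, xi in enumerate(x):
                    if xi:  # only active one-hot features contribute
                        row[k] -= lr * grad * xi
    return weights, biases


# Toy data (hypothetical): two short sequences with made-up reaction labels.
seqs = ["ACGTACGT", "TTGGCCAA"]
labels = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
xs = [one_hot(s, 8) for s in seqs]
weights, biases = train(xs, labels, n_reactions=3)
probs = predict(weights, biases, xs[0])
print([round(p, 2) for p in probs])
```

In a realistic setting the inputs would be full-length 16S rRNA gene sequences, alignment columns, or embedding vectors, and the outputs would span the reactions of a genome-scale model; a deeper network and held-out evaluation on unseen sequences would replace this toy fit.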
Publisher
Cold Spring Harbor Laboratory