Abstract
Large-scale surveys of prokaryotic communities (metagenomics), as well as of isolate genomes, have recently revealed that prokaryotic diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same unit (or species) commonly show >95% genome-aggregate average nucleotide identity (ANI) to each other and <90% ANI to members of other species, whereas genome pairs showing 90-95% ANI are comparatively rare. However, it remains unclear whether such “discontinuities” (gaps) in ANI values can also be observed within species and thus be used to define strains or clonal complexes, two cornerstone concepts in microbiology that remain ill-defined. By analyzing 18,123 complete isolate genomes from 330 bacterial species, each with at least ten genome representatives, we show that such a natural discontinuity exists at around 99.5% ANI. Further, we show that the 99.5% ANI threshold is largely consistent with how clonal complexes have been defined in previous epidemiological studies, but it yields clusters with ∼20% higher accuracy in terms of the evolutionary relatedness of the grouped genomes and greater homogeneity in gene content. Collectively, our results should facilitate future micro-diversity studies in clinical or environmental settings because they provide a more natural definition of clonal complexes and strains.
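To make the thresholds concrete, the following is a minimal sketch of how pairwise ANI values might be interpreted and grouped at the 99.5% cutoff. It assumes a precomputed table of pairwise ANI percentages (from any ANI tool) and uses single-linkage grouping via union-find. The abstract does not specify the clustering procedure, so the function names (`classify_pair`, `cluster_by_ani`), the linkage choice, and the example values are hypothetical illustrations, not the authors' method or data.

```python
from itertools import combinations

def classify_pair(ani: float) -> str:
    """Label a genome pair by ANI (%), using the thresholds described in the abstract."""
    if ani >= 99.5:
        return "same clonal complex/strain"
    if ani > 95.0:
        return "same species"
    if ani < 90.0:
        return "different species"
    return "ambiguous (90-95% ANI)"

def cluster_by_ani(genomes, ani, threshold=99.5):
    """Hypothetical single-linkage grouping: genomes joined by any pair >= threshold.

    `ani` maps (genome_a, genome_b) tuples to ANI percentages; either key order works,
    and missing pairs are treated as unrelated (0.0).
    """
    parent = {g: g for g in genomes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genomes, 2):
        value = ani.get((a, b), ani.get((b, a), 0.0))
        if value >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two groups

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical ANI values for four genomes (illustration only).
example_ani = {
    ("A", "B"): 99.8, ("B", "C"): 99.6, ("A", "C"): 99.4,
    ("C", "D"): 97.0, ("A", "D"): 96.9, ("B", "D"): 97.1,
}
print(cluster_by_ani(["A", "B", "C", "D"], example_ani))  # [['A', 'B', 'C'], ['D']]
```

Note the design choice: with single linkage, A and C end up in the same group even though their own ANI (99.4%) is just below the cutoff, because both exceed 99.5% to B. A complete-linkage or medoid-based scheme would behave differently, so the grouping rule should be chosen deliberately.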
Publisher
Cold Spring Harbor Laboratory