Abstract
Abstract
Objective
Data clustering is a common exploration step in the omics era, notably in genomics and proteomics where many genes or proteins can be quantified from one or more experiments. Bayesian clustering is a powerful unsupervised algorithm that can classify several thousands of genes or proteins. AutoClass C, its original implementation, handles missing data, automatically determines the best number of clusters but is not user-friendly.
Results
We developed an online tool called AutoClassWeb, which provides an easy-to-use and simple web interface for Bayesian clustering with AutoClass. Input data are entered as TSV files and quality controlled. Results are provided in formats that ease further analyses with spreadsheet programs or with programming languages, such as Python or R. AutoClassWeb is implemented in Python and is published under the 3-Clauses BSD license. The source code is available at https://github.com/pierrepo/autoclassweb along with a detailed documentation.
Funder
Ministère de l’Enseignement Supérieur et de la Recherche
Publisher
Springer Science and Business Media LLC
Subject
General Biochemistry, Genetics and Molecular Biology,General Medicine
Reference14 articles.
1. Cheeseman P, Stutz J. Bayesian classification(autoclass): theory and results. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R, editors. Advances in knowledge discovery and data mining. Boston: AAAI/MIT Press; 1996. p. 153–80.
2. Stutz J, Cheeseman P. Autoclass - A Bayesian Approach to Classification. In: Skilling J, Sibisi S, editors., Maximum entropy and bayesian methods. No. 70 in the fundamental theories of physics their clarification, development and application. Dordrecht: Kluwer Academic Publishers; 1996.
3. Elliott MC, Tanaka PM, Schwark RW, Andrade R. Serotonin differentially regulates L5 pyramidal cell classes of the medial prefrontal cortex in rats and mice. ENeuro. 2018;5(1):e0305-17.
4. Crook AC, Baddeley R, Osorio D. Identifying the structure in cuttlefish visual signals. Philos Trans R Soc Lond B Biol Sci. 2002;357(1427):1617–24.
5. Achcar F, Camadro JM, Mestivier D. AutoClass@IJM: a powerful tool for bayesian classification of heterogeneous data in biology. Nucleic Acids Res. 2009;37(Web Server):W63–7.