Affiliation:
1. Ernst & Young LLP, New York, USA
Abstract
Proteomics, the study of proteins and their functions within biological systems, has become increasingly data-intensive, which presents both opportunities and challenges. This project addresses the need for advanced data analytics and data integrity in proteomics research by combining machine learning (ML) with blockchain technology. The work has three key objectives. First, collect, clean, and integrate proteomics data from diverse sources to ensure data quality and consistency. Second, apply ML algorithms to this data to identify proteins and predict their functions. Third, use blockchain technology to safeguard the authenticity and integrity of the proteomics data by providing an auditable, tamper-proof record. In addition, a user-friendly web interface was implemented to facilitate collaboration among researchers and scientists by granting access to shared data and results. Five classification methods were investigated for protein classification: random forests, logistic regression, neural networks, support vector machines, and decision trees. In conclusion, the proposed work aims to advance proteomics research by enhancing data analytics capabilities and securing data integrity, thereby enabling scientists to make more informed and confident discoveries in this critical field.
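The data-integrity objective can be illustrated with a minimal hash-chain sketch. This is not the system described in the abstract, whose implementation details are not given here; it only shows the basic idea of linking records by hash so that later edits become detectable. It omits everything that makes a real blockchain (distribution, consensus, signatures), and the function names, protein identifiers, and class labels below are hypothetical placeholders.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder hash for the first block


def record_hash(payload: dict, prev_hash: str) -> str:
    """Hash a record together with the hash of the block before it."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: list, payload: dict) -> None:
    """Append a record, linking it to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": record_hash(payload, prev_hash),
    })


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited record or broken link fails."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != record_hash(block["payload"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical records: the IDs and labels are made up for illustration.
chain = []
append_block(chain, {"protein_id": "P12345", "predicted_class": "hydrolase"})
append_block(chain, {"protein_id": "Q67890", "predicted_class": "transferase"})
print(verify_chain(chain))  # True

# Editing a stored record after the fact invalidates the chain.
chain[0]["payload"]["predicted_class"] = "kinase"
print(verify_chain(chain))  # False
```

Because each block's hash covers the previous block's hash, changing any earlier record changes every hash that follows it, which is what makes the record auditable.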