The Biology and Laboratory Paradigm Shift: A Review of Machine Learning and Artificial Intelligence in Biomolecular Data Interpretation and Predictive Modeling

Saeed Ali Alasmari (1), Abdulmajeed Saad Bin Baz (2), Mohammed Awadh Alshehri (3), Saad Salem Aldawsari (4), Awadh Jarallah Alkaabi (5), Saad Mahdi Saleh Alamri (2), Turki Saeed Alwadaie (6), ALhanouf Mohammed Moredh (2), Turkia Mohammed Alharthi (7), Abdullah Ahmed Alamer (8), Abdullah Salman Al Salman (9)
(1) Dirayiah Hospital, Ministry of Health, Saudi Arabia
(2) Regional Laboratory, Ministry of Health, Saudi Arabia
(3) King Saud Medical City, Ministry of Health, Saudi Arabia
(4) Riyadh Regional Lab, Ministry of Health, Saudi Arabia
(5) MCH Najran, Ministry of Health, Saudi Arabia
(6) KSMC, Ministry of Health, Saudi Arabia
(7) Ministry of Health Branch in Riyadh, Saudi Arabia
(8) Security Forces Hospital, Ministry of Health, Saudi Arabia
(9) Jalajil PHC, Ministry of Health, Saudi Arabia

Abstract

Background: The life sciences are experiencing an explosion of data from high-throughput genomics, proteomics, and metabolomics. Interpreting these complex data sets is a challenging problem that has developed in parallel with advances in artificial intelligence (AI) and machine learning (ML).


Aim: This review categorizes the groundbreaking contribution of AI/ML to biomolecular data science during the period 2015-2024, elucidating its use in multi-omics analysis, protein structure prediction, and experimental automation.


Methods: We performed a systematic literature review highlighting the application of sophisticated computational models such as deep neural networks, graph neural networks, and transformer architectures in diverse biomolecular data.


Results: The reviewed literature indicates that AI/ML has changed the discipline at its core. These technologies facilitate the discovery of new biomarkers and drug targets from multi-omics data and have achieved a breakthrough in protein structure prediction with AlphaFold2. In addition, AI is now automating experimental design, enabling closed-loop systems that accelerate discovery.


Conclusion: AI and ML are no longer ancillary tools but intrinsic drivers of a new paradigm in molecular biology. Although data quality and interpretability challenges persist, the incorporation of AI is imperative for decoding the patterns of complex biological systems and developing personalized medicine.

Authors

Saeed Ali Alasmari (Primary Contact): s515s515@hotmail.com

Alasmari, S. A., Baz, A. S. B., Alshehri, M. A., Aldawsari, S. S., Alkaabi, A. J., Alamri, S. M. S., … Al Salman, A. S. (2024). The Biology and Laboratory Paradigm Shift: A Review of Machine Learning and Artificial Intelligence in Biomolecular Data Interpretation and Predictive Modeling. Saudi Journal of Medicine and Public Health, 1(2), 749–756. https://doi.org/10.64483/jmph-191
