A Scoping Review of the Ethical, Legal, and Technical Dimensions of Privacy in Big Data Health Research
Abstract
Background: The proliferation of big data in health research, encompassing genomic datasets, electronic health records (EHRs), wearable-device data, and multi-omics, offers unprecedented potential for scientific discovery and personalized medicine. However, this data-driven paradigm poses profound and novel challenges to individual privacy, demanding an integrated analysis of ethical, legal, and technical safeguards.
Aim: This scoping review synthesizes contemporary literature (2015-2024) to map the ethical dilemmas, legal frameworks, and technical solutions concerning privacy in big data health research.
Methods: A systematic search was conducted across PubMed, IEEE Xplore, Scopus, and Google Scholar. The retrieved literature was analyzed thematically to identify key themes, tensions, and emergent strategies across the three dimensions.
Results: The review identifies a core tension between data utility for the public good and individual privacy rights. Ethically, key issues include re-identification risk, informed consent for future unspecified research, and algorithmic bias. Legally, the global landscape is fragmented: regulations such as the GDPR provide strong protections but create compliance complexity. Technically, privacy-enhancing technologies (PETs) such as federated learning, differential privacy, and homomorphic encryption offer promising, yet imperfect, solutions.
Conclusion: Effective privacy preservation in big data health research requires a harmonized, interdisciplinary approach. A robust governance framework must interweave ethical principles, adaptable legal compliance, and state-of-the-art technical controls, fostering public trust while enabling responsible innovation.
Authors
Copyright (c) 2024 Reem Menwer Owaid Alrashdi, Reem Munawar Awad Al-Rashdi, Salihah Abdullah Saeed Alghamdi, Khuluod Ali Mohammed Rezgallah, Abdulaziz Ali Abdulaziz Alghaythar, Faisal Fahad Mohammed Alshammari, Abdullah Jaber Eissa Faqihi, Dhaifallah Mohammed Dhaifallah Moraya, Ahlam Abdullah Ibrahim Aqeel, Muath Mohammed Dhaifallah Moraya, Khloud Masead Dhaif Allah Al-Mutairi, Nasser Nashi Alshaibani, Khaled Ibrahim Muhammad Mobaraki, Mohammed Saleh Abdulkareem Al Juma, Sarah Ahmed Arif

This work is licensed under a Creative Commons Attribution 4.0 International License.
