ML-Powered Privacy Preservation in Biomedical Data Sharing


Ehizokhale Jude Usiabulu
Abel Onolunosen Abhadionmhen
Husseni Iduku

Abstract

The sharing of biomedical data is essential for accelerating healthcare research, fostering medical innovation, and improving patient outcomes. Such data encompasses a wide range of sensitive information, including electronic health records, genomic sequences, and clinical trial results. Despite its value, biomedical data sharing poses significant privacy risks, such as patient re-identification, unauthorized access, and regulatory non-compliance. These concerns call for techniques that balance data utility with stringent privacy protection. Machine learning (ML) has emerged as a powerful tool for privacy-preserving biomedical data sharing. This manuscript presents a comprehensive review of state-of-the-art ML-based privacy preservation methods, including differential privacy, federated learning, homomorphic encryption, secure multi-party computation, and synthetic data generation with generative models. Each technique offers distinct mechanisms for protecting sensitive information while still enabling collaborative analysis and predictive modeling. These methods have been applied across a range of biomedical domains, including collaborative disease risk prediction, genomic research, clinical trial data analysis, remote patient monitoring, and public health surveillance. Additionally, we evaluate privacy and utility metrics that measure the strength of privacy guarantees and their impact on model performance. The review further examines the limitations and challenges that must be addressed to ensure robust and scalable solutions, including computational overhead, data heterogeneity, privacy-utility trade-offs, and ethical considerations. Looking forward, the manuscript highlights promising future directions, such as hybrid privacy frameworks, enhanced synthetic data generation, real-time privacy-preserving analytics, standardization of evaluation protocols, and interdisciplinary policy development.
By integrating these advancements, biomedical research can achieve safer and more effective data sharing, ultimately fostering innovation while respecting patient confidentiality and trust.
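To make differential privacy and the privacy-utility trade-off more concrete, here is a minimal sketch of the Laplace mechanism, a standard building block of differential privacy. It is an illustration only, not the method of the reviewed article. The toy cohort, the function names `laplace_noise` and `private_count`, and the ε values are hypothetical. For a query f with L1 sensitivity Δf, the mechanism adds Laplace noise with scale Δf/ε. A patient count has Δf = 1, because adding or removing one record changes it by at most 1. The expected absolute value of Laplace(0, b) noise equals b, so the average error of the released count should scale roughly as 1/ε. In other words, stronger privacy (smaller ε) costs accuracy.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with mean 0 and that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count under epsilon-differential privacy (Laplace mechanism).

    Adding or removing one record changes a count by at most 1
    (L1 sensitivity = 1), so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy cohort: (patient_id, has_condition); 250 of 1000 are positive.
cohort = [(i, i % 4 == 0) for i in range(1000)]
TRUE_COUNT = 250

rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    # Repeat the release only to estimate the average error at this epsilon.
    errors = [abs(private_count(cohort, lambda r: r[1], eps, rng) - TRUE_COUNT)
              for _ in range(500)]
    print(f"epsilon={eps}: mean absolute error = {sum(errors) / len(errors):.2f}")
```

The loop releases the same count many times only to measure the typical error at each ε. In a real deployment, every release consumes privacy budget, and this sketch does not model that cumulative accounting.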

Article Details

How to Cite
Usiabulu, E. J., Abhadionmhen, A. O., & Iduku, H. (2025). ML-Powered Privacy Preservation in Biomedical Data Sharing. African Journal of Medicine, Surgery and Public Health Research, 2(3), 389–407. https://doi.org/10.58578/ajmsphr.v2i3.6143
