A survey on privacy issues and mitigation strategies for LLMs in healthcare

Khalid A. Alissa

doi:10.1007/s11227-025-08146-1


Khalid A. Alissa*

*Corresponding author for this work

Networks and Communications Department

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Large language models (LLMs) are increasingly adopted in sensitive domains such as healthcare, raising pressing concerns about data privacy and security. Because they are trained on extremely large and often uncurated datasets, these models can memorize and reproduce sensitive data, creating significant ethical, legal, and regulatory risks. With hundreds of billions of parameters, LLMs also require high-performance computing (HPC) capabilities, both for training and to deliver real-time performance. This work examines key privacy and adversarial attacks on healthcare LLMs, including membership inference, gradient leakage, model inversion, jailbreaking, backdoors, and prompt injection. We also discuss existing mitigation approaches such as differential privacy, federated learning, and knowledge unlearning. Unlike previous surveys that treat privacy and performance separately, this study consolidates algorithmic and compliance-based defenses into a unified, healthcare-specific framework. Our proposed privacy-focused LLM framework combines federated learning, differential privacy, and knowledge unlearning in a high-performance computing environment to meet healthcare's scalability and compliance requirements while supporting secure multimodal processing.

Original language	English
Article number	26
Journal	Journal of Supercomputing
Volume	82
Issue number	1
DOIs	https://doi.org/10.1007/s11227-025-08146-1
State	Published - Jan 2026

Keywords

Data memorization
Healthcare
HPC
Machine learning
Model security
Privacy risks


Cite this

@article{bf703d38ac294f089ab3c756b350a97a,

title = "A survey on privacy issues and mitigation strategies for LLMs in healthcare",

abstract = "Large language models (LLMs) are increasingly adopted in sensitive domains such as healthcare, raising pressing concerns about data privacy and security. Because they are trained on extremely large and often uncurated datasets, these models can memorize and reproduce sensitive data, creating significant ethical, legal, and regulatory risks. With hundreds of billions of parameters, LLMs also require high-performance computing (HPC) capabilities, both for training and to deliver real-time performance. This work examines key privacy and adversarial attacks on healthcare LLMs, including membership inference, gradient leakage, model inversion, jailbreaking, backdoors, and prompt injection. We also discuss existing mitigation approaches such as differential privacy, federated learning, and knowledge unlearning. Unlike previous surveys that treat privacy and performance separately, this study consolidates algorithmic and compliance-based defenses into a unified, healthcare-specific framework. Our proposed privacy-focused LLM framework combines federated learning, differential privacy, and knowledge unlearning in a high-performance computing environment to meet healthcare's scalability and compliance requirements while supporting secure multimodal processing.",

keywords = "Data memorization, Healthcare, HPC, Machine learning, Model security, Privacy risks",

author = "Alissa, {Khalid A.}",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.",

year = "2026",

month = jan,

doi = "10.1007/s11227-025-08146-1",

language = "English",

volume = "82",

journal = "Journal of Supercomputing",

issn = "0920-8542",

number = "1",

}

TY  - JOUR
T1  - A survey on privacy issues and mitigation strategies for LLMs in healthcare
AU  - Alissa, Khalid A.
N1  - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY  - 2026/1
Y1  - 2026/1
N2  - Large language models (LLMs) are increasingly adopted in sensitive domains such as healthcare, raising pressing concerns about data privacy and security. Because they are trained on extremely large and often uncurated datasets, these models can memorize and reproduce sensitive data, creating significant ethical, legal, and regulatory risks. With hundreds of billions of parameters, LLMs also require high-performance computing (HPC) capabilities, both for training and to deliver real-time performance. This work examines key privacy and adversarial attacks on healthcare LLMs, including membership inference, gradient leakage, model inversion, jailbreaking, backdoors, and prompt injection. We also discuss existing mitigation approaches such as differential privacy, federated learning, and knowledge unlearning. Unlike previous surveys that treat privacy and performance separately, this study consolidates algorithmic and compliance-based defenses into a unified, healthcare-specific framework. Our proposed privacy-focused LLM framework combines federated learning, differential privacy, and knowledge unlearning in a high-performance computing environment to meet healthcare's scalability and compliance requirements while supporting secure multimodal processing.
AB  - Large language models (LLMs) are increasingly adopted in sensitive domains such as healthcare, raising pressing concerns about data privacy and security. Because they are trained on extremely large and often uncurated datasets, these models can memorize and reproduce sensitive data, creating significant ethical, legal, and regulatory risks. With hundreds of billions of parameters, LLMs also require high-performance computing (HPC) capabilities, both for training and to deliver real-time performance. This work examines key privacy and adversarial attacks on healthcare LLMs, including membership inference, gradient leakage, model inversion, jailbreaking, backdoors, and prompt injection. We also discuss existing mitigation approaches such as differential privacy, federated learning, and knowledge unlearning. Unlike previous surveys that treat privacy and performance separately, this study consolidates algorithmic and compliance-based defenses into a unified, healthcare-specific framework. Our proposed privacy-focused LLM framework combines federated learning, differential privacy, and knowledge unlearning in a high-performance computing environment to meet healthcare's scalability and compliance requirements while supporting secure multimodal processing.
KW  - Data memorization
KW  - Healthcare
KW  - HPC
KW  - Machine learning
KW  - Model security
KW  - Privacy risks
UR  - https://www.scopus.com/pages/publications/105026016968
U2  - 10.1007/s11227-025-08146-1
DO  - 10.1007/s11227-025-08146-1
M3  - Article
AN  - SCOPUS:105026016968
SN  - 0920-8542
VL  - 82
JO  - Journal of Supercomputing
JF  - Journal of Supercomputing
IS  - 1
M1  - 26
ER  - 
