HIPAA and AI: Protecting PHI in Healthcare AI Systems
HIPAA compliance for AI systems means preventing Protected Health Information (PHI) from flowing into third-party LLMs or being logged in uncontrolled contexts. The Health Insurance Portability and Accountability Act requires covered entities and business associates to implement safeguards that ensure the confidentiality, integrity, and availability of electronic PHI (ePHI). When healthcare organizations deploy large language models, patient data must be handled with the same rigor as in traditional systems. That means audit trails for every action, runtime policy enforcement before data moves, patient authorization workflows where a use requires one, and technical controls that stop unauthorized transmission before it happens.
Why Healthcare Organizations Are Moving to AI, and What Goes Wrong
Healthcare systems have concrete reasons to adopt AI. Radiology departments use AI to flag abnormalities in imaging scans. Clinical documentation systems generate summaries from dictations. Insurance workflows automate prior-authorization reviews. Scheduling systems predict no-shows and optimize patient flow. All of these add value. But each one handles patient names, dates of birth, medical history, insurance IDs, lab results, and diagnoses. That data is PHI under HIPAA, and it needs protection the moment it enters an AI system.
The compliance gap emerges because a general-purpose LLM endpoint is not, by default, covered by a Business Associate Agreement (BAA). Consumer and standard API tiers from OpenAI, Anthropic, Google, and others do not come with a BAA unless you have arranged one. If you send patient data to such an endpoint without controls, you are disclosing PHI to an entity with no HIPAA obligation to protect it. That is an impermissible disclosure, regardless of whether the vendor stores the data, trains on it, or logs it. The violation happens at transmission.
Healthcare organizations also face a secondary risk: the "accidental exposure" scenario. A developer deploys a healthcare AI application, assumes the LLM endpoint is safe, and later discovers that the model vendor's retention policy stores prompts for up to 30 days for abuse detection. Or a chatbot trained on clinical notes reproduces exact patient records when prompted with partial medical histories. Or an AI-powered prior-authorization tool sends full claim details to a third-party LLM, which retains them and may use them as training data. You can investigate incidents like these only if you have runtime visibility into every AI action.
HIPAA's Core Requirements for AI Systems
HIPAA is not a technology standard. It is a compliance framework built on three rules: the Privacy Rule, the Security Rule, and the Breach Notification Rule.
The Privacy Rule governs how PHI can be used and disclosed. It sets the "minimum necessary" standard: use or disclose only the amount of PHI needed to accomplish the intended purpose. If your healthcare AI system needs only a patient name and lab result to generate a clinical summary, it should not receive the patient's full medical history, employment status, or insurance details. Many deployments fall short of this standard by default, sending entire patient records to LLM endpoints out of convenience.
The Security Rule mandates technical, administrative, and physical safeguards for ePHI. This includes access controls (who can access what), audit controls (logging all access), integrity controls (detecting unauthorized changes), and transmission security (encrypting data in motion). The Security Rule does not prescribe specific technologies. It requires a risk assessment, a written security plan, and documented evidence that controls are in place and working. For AI systems, this means your runtime policy layer, audit logging, and data routing must be part of your documented security architecture.
The Breach Notification Rule requires notifying affected individuals, HHS, and in some cases the media when unsecured PHI is acquired, accessed, used, or disclosed in a way the Privacy Rule does not permit. The working presumption is simple: an impermissible disclosure is a breach unless a documented risk assessment shows a low probability that the PHI was compromised. This applies to disclosures involving LLM vendors, data processors, and cloud platforms.
Healthcare AI compliance also depends on whether your organization qualifies as a "covered entity" or a "business associate." A covered entity (health plans, healthcare clearinghouses, and providers such as hospitals that conduct standard transactions electronically) has direct HIPAA obligations. A business associate (EHR vendors, billing processors, cloud providers) is bound by its BAA with the covered entity and is also directly liable under HIPAA for the Security Rule and parts of the Privacy and Breach Notification Rules. If you are a healthcare software vendor, you are likely a business associate, and every customer contract should include a BAA that addresses AI processing.
The Minimum Necessary Standard and LLM Prompts
The minimum necessary principle is where most healthcare AI deployments fail. Here is the practical challenge: to generate a meaningful clinical summary, the LLM needs context. But HIPAA says you should limit that context to what is necessary.
A concrete example: a hospital uses an AI system to draft clinical documentation from voice dictations. The physician says, "Patient presents with shortness of breath, chest pain, and fever." The AI summarizes this as a complaint of shortness of breath, chest discomfort, and elevated temperature. Then the system asks an LLM to draft a formal assessment. The minimum necessary data is the chief complaint and vital signs. The system does not need to send the patient's full 40-year medical history, pharmacy records, lab values from three years ago, or imaging reports from other encounters. Yet many AI systems default to sending the entire patient record because the developers assume "more context = better output."
To enforce minimum necessary, you need:
- A data classification and tagging layer that identifies what qualifies as PHI in your prompts.
- A policy engine that blocks or redacts fields before they reach the LLM.
- Audit logging that records which data fields were sent, when, and why.
A runtime policy layer deployed as a proxy or sidecar can inspect every LLM prompt before transmission, enforce policies that say "remove date of birth," "redact insurance ID," or "block entire patient histories," and log every decision into tamper-proof storage. This approach satisfies the minimum necessary requirement because you have explicit, logged evidence that unnecessary fields never reached the LLM vendor.
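As a rough illustration of the minimum necessary filter described above, the sketch below keeps only allowlisted fields for a given use case and records the names of everything it withheld. The `ALLOWED_FIELDS` table, the `clinical_summary` use case, and the field names are hypothetical; a real deployment would load its allowlists from a reviewed policy store.

```python
from dataclasses import dataclass

# Hypothetical per-use-case allowlists (assumption, not from any real policy).
ALLOWED_FIELDS = {
    "clinical_summary": {"chief_complaint", "vital_signs"},
}

@dataclass
class FilterResult:
    payload: dict   # fields cleared to go into the prompt
    withheld: list  # names of fields removed before transmission

def apply_minimum_necessary(record: dict, use_case: str) -> FilterResult:
    """Keep only fields allowlisted for the use case; deny by default."""
    allowed = ALLOWED_FIELDS.get(use_case, set())
    payload = {k: v for k, v in record.items() if k in allowed}
    withheld = sorted(k for k in record if k not in allowed)
    return FilterResult(payload=payload, withheld=withheld)

record = {
    "chief_complaint": "shortness of breath, chest pain, fever",
    "vital_signs": {"temp_c": 38.6, "spo2": 93},
    "date_of_birth": "1961-04-02",
    "insurance_id": "XYZ-0000",
    "medication_history": ["..."],
}
result = apply_minimum_necessary(record, "clinical_summary")
print(sorted(result.payload))  # fields that may reach the LLM
print(result.withheld)         # field names to record in the audit log
```

The allowlist is deny-by-default on purpose: a field added to the EHR schema later stays out of prompts until someone explicitly approves it.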
Audit Trails and Accountability
HIPAA's Security Rule requires audit controls: "Implement hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information" (45 CFR §164.312(b)). This is not optional guidance. It is a mandatory control.
For AI systems, audit means: every time PHI is processed by an AI component, you need a record that includes the user identity, the timestamp, the data accessed, the action performed (e.g., "sent to LLM for summarization"), and the outcome. If a clinical note is summarized by an LLM, you need to log: who initiated the action, which patient record was accessed, the specific fields sent to the LLM, the LLM response, and whether any policy violations occurred.
The audit trail also serves compliance investigations. If a breach occurs, regulators will ask: who accessed the data, what was the purpose, what controls were in place, and when did the access happen. Without detailed audit logs, you cannot answer those questions. You cannot prove you enforced the minimum necessary standard. You cannot demonstrate that you detected the breach quickly. You may face penalties.
Many organizations audit at the application level (logging LLM API calls) but fail to audit at the policy level (logging which fields were allowed or blocked by the policy engine). This creates a blind spot. If an unauthorized field reaches the LLM, the application log shows the API call, but no record exists of the policy decision that allowed it. A runtime policy layer with built-in audit logging closes this gap.
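One way to record policy-level decisions is an append-only log in which each entry carries a hash of the previous one, so later edits become detectable. The sketch below is a minimal in-memory version under that assumption. The `AuditLog` class and its field names are hypothetical, and hash chaining makes a log tamper-evident rather than tamper-proof; durable write-once storage is still needed. It logs field names only, so the audit trail does not become a second copy of the PHI.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first entry

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class AuditLog:
    """Hypothetical hash-chained log of policy decisions."""

    def __init__(self):
        self.entries = []

    def append(self, user, patient_ref, fields_sent, fields_blocked, decision):
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "patient_ref": patient_ref,            # internal reference, not a name
            "fields_sent": sorted(fields_sent),    # field names only, no values
            "fields_blocked": sorted(fields_blocked),
            "decision": decision,
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else GENESIS,
        }
        entry = {**body, "entry_hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; False if any entry was altered or reordered."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True

log = AuditLog()
log.append("dr_lee", "rec-001", ["chief_complaint"], ["insurance_id"], "CONSTRAIN")
log.append("dr_lee", "rec-002", [], ["psychiatric_notes"], "BLOCK")
print(log.verify())
```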
Safe Harbor and De-Identification
HIPAA includes a safe harbor mechanism. If you de-identify patient data according to the Safe Harbor standard, it is no longer PHI, and HIPAA no longer restricts its use, including feeding it to LLMs for training or analysis.
Safe Harbor de-identification requires removing 18 categories of identifiers: names; geographic subdivisions smaller than a state; all elements of dates (except year) directly related to an individual, such as birth, admission, discharge, and death dates, plus all ages over 89; telephone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate and license numbers; vehicle identifiers and serial numbers, including license plates; device identifiers and serial numbers; web URLs; IP addresses; biometric identifiers, including finger and voice prints; full-face photographs and comparable images; and any other unique identifying number, characteristic, or code. You can retain the year of a date, ages up to 89, the state, and in most cases the first three digits of a ZIP code (where the area they cover has more than 20,000 residents). The covered entity must also have no actual knowledge that the remaining information could identify an individual.
The practical implication: if your healthcare AI system processes only properly de-identified data, that data falls outside HIPAA. A radiology AI model trained on 100,000 chest X-rays with all patient identifiers removed is not subject to HIPAA, because the data is not PHI. For practical purposes, however, de-identification is one-way: once the identifiers are removed, the AI system cannot link its output back to a patient for clinical use. This works for population-level analytics and research, but not for real-time clinical decision support.
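To make the generalization side of Safe Harbor concrete, here is a small sketch of three of its rules as commonly summarized: dates reduced to the year, ages over 89 collapsed into a single category, and ZIP codes cut to three digits (or `000` for sparsely populated prefixes). It covers only a few of the 18 identifier categories and is not a complete de-identification tool. The field names, the `KEEP_AS_IS` allowlist, and the example low-population prefix are assumptions for illustration.

```python
from datetime import date

# Illustrative subset of direct identifiers (assumption; the full rule has 18 categories).
DIRECT_IDENTIFIERS = {"name", "ssn", "mrn", "phone", "email"}
# Hypothetical fields considered safe to pass through unchanged.
KEEP_AS_IS = {"diagnosis", "state"}

def generalize_record(record: dict, low_population_zip3: set) -> dict:
    """Drop direct identifiers and generalize dates, age, and ZIP code."""
    is_90_plus = record.get("age", 0) >= 90
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue
        if key == "age":
            out["age"] = "90+" if is_90_plus else value
        elif key == "zip":
            prefix = str(value)[:3]
            out["zip3"] = "000" if prefix in low_population_zip3 else prefix
        elif isinstance(value, date):
            if key == "birth_date" and is_90_plus:
                continue  # a birth year would reveal an age over 89
            out[key + "_year"] = value.year
        elif key in KEEP_AS_IS:
            out[key] = value
        # anything else (e.g. free-text notes) is dropped by default
    return out

patient = {
    "name": "Jane Doe",
    "age": 93,
    "birth_date": date(1931, 5, 2),
    "admission_date": date(2024, 3, 14),
    "zip": "03601",
    "diagnosis": "pneumonia",
    "notes": "free text that may contain identifiers",
}
# "036" is a made-up example of a low-population prefix, not census data.
deidentified = generalize_record(patient, low_population_zip3={"036"})
print(deidentified)
```

Free-text fields are dropped rather than passed through, since names and dates embedded in notes are a common way identifiers survive a field-level scrub.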
For real-time AI in healthcare, you will process identified PHI. In those cases, runtime policy enforcement becomes critical. You cannot rely on de-identification to solve the problem.
Real-Time Policy Enforcement and Deterministic Decisions
The key technical innovation for healthcare AI compliance is deterministic runtime policy. Before an LLM receives any prompt, a policy engine evaluates it against a ruleset and makes a decision: ALLOW, BLOCK, CONSTRAIN, or LOG. This decision must be made before the LLM processes the data.
A concrete policy might be:
- ALLOW: Send clinical note summaries to an LLM if they contain only chief complaint, vital signs, and assessment impression.
- BLOCK: Reject any prompt containing full pharmaceutical history, genetic information, or psychiatric records without explicit approval.
- CONSTRAIN: If a prompt contains a Social Security number, hash it and replace with a pseudonym before sending.
- LOG: Record all decisions for audit.
Each decision should be fast (under 500ms) and deterministic (same input, same decision every time). This removes human error from the compliance process. Developers cannot accidentally send a full patient record. Policies are enforced consistently across all users and all systems.
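The decision logic above can be sketched as a pure function: no clock, no randomness, no network calls, so the same prompt and tags always yield the same result. In this sketch the category tags, the SSN pattern, and the `[ID-…]` pseudonym format are assumptions. A keyed HMAC stands in for the plain hash mentioned in the example policy, because an unkeyed hash of a nine-digit number is easy to reverse by brute force. LOG is treated as something the caller does for every decision rather than as a separate outcome.

```python
import hashlib
import hmac
import re
from enum import Enum

class Decision(Enum):
    ALLOW = "ALLOW"
    BLOCK = "BLOCK"
    CONSTRAIN = "CONSTRAIN"

# Hypothetical sensitivity tags supplied by an upstream classification layer.
BLOCKED_CATEGORIES = {"psychiatric", "genetic", "full_pharmacy_history"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def evaluate(prompt: str, categories: set, key: bytes):
    """Return (decision, prompt_to_send). Same input always gives same output."""
    if categories & BLOCKED_CATEGORIES:
        return Decision.BLOCK, None

    def pseudonym(match):
        digest = hmac.new(key, match.group().encode(), hashlib.sha256).hexdigest()
        return f"[ID-{digest[:12]}]"

    constrained, count = SSN_PATTERN.subn(pseudonym, prompt)
    if count:
        return Decision.CONSTRAIN, constrained
    return Decision.ALLOW, prompt

key = b"demo-key-do-not-use"  # assumption: a real key comes from a secrets manager
print(evaluate("Chief complaint: chest pain. BP 128/82.", {"vitals"}, key))
print(evaluate("Patient SSN 123-45-6789, chest pain.", {"vitals"}, key))
print(evaluate("Psychiatric history: ...", {"psychiatric"}, key))
```

The caller is expected to write every returned decision to the audit log before forwarding anything to the model.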
Business Associate Agreements and Third-Party LLM Vendors
If your healthcare AI system uses a third-party LLM (OpenAI, Anthropic, Google, etc.), you need a Business Associate Agreement with that vendor. The BAA must require the vendor to implement the same HIPAA safeguards as your organization. Specifically, it should require:
- Encryption of ePHI in transit and at rest.
- Access controls limiting access to authorized personnel only.
- Audit logging for all access.
- Notification requirements if a breach occurs.
- Prohibition on re-using or re-disclosing PHI (except as required by law or the BAA).
- Right to audit and inspect.
Consumer-tier LLM products generally do not come with a BAA. Enterprise offerings are different: OpenAI offers BAA support for ChatGPT Enterprise healthcare customers, and Google Vertex AI and Microsoft Azure OpenAI offer BAA-covered options with safeguards such as exclusion of data from model training, configurable log retention based on your policy, and organization-scoped access. These options suit healthcare workloads, but they can cost more and may have higher latency than public endpoints. Vendor terms change, so confirm current BAA coverage before sending any PHI.
If you cannot obtain a BAA from your LLM vendor, you have two options: (1) use a different vendor with BAA support, or (2) de-identify all data before sending it to the LLM. Option 2 avoids the BAA requirement but prevents real-time patient-specific AI.
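The two options can be expressed as a small routing rule that a gateway might apply before any request leaves the network. This is only a sketch: the `Vendor` record, the vendor names, and the route labels are hypothetical, and whether a BAA is actually in force is a contractual fact that has to be maintained outside the code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vendor:
    name: str
    baa_signed: bool  # assumption: kept in sync with the contract register

def route(vendor: Vendor, contains_phi: bool) -> str:
    """Decide how a request may be sent to a given LLM vendor."""
    if not contains_phi:
        return "send"                  # nothing regulated in the payload
    if vendor.baa_signed:
        return "send_with_policy"      # PHI allowed, still subject to runtime policy
    return "deidentify_or_reject"      # no BAA: PHI must not leave as-is

covered = Vendor("example-enterprise-llm", baa_signed=True)
uncovered = Vendor("example-public-llm", baa_signed=False)
print(route(covered, contains_phi=True))
print(route(uncovered, contains_phi=True))
```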
Implementing Compliant Healthcare AI Workflows
A reference architecture for HIPAA-compliant AI looks like this:
1. Data Classification: Tag all data in the EHR or clinical system as PHI or non-PHI. Include metadata about sensitivity (e.g., "psychiatric," "HIV status," "genetic").
2. Policy Definition: Write explicit policies defining what data can flow to AI systems and under what conditions. Example: "Diabetes prediction models may access age, BMI, blood glucose, and HbA1c; psychiatry and genetics data are blocked."
3. Runtime Policy Engine: Deploy a policy layer (often a proxy or sidecar) that inspects every request before it reaches the LLM. Block, allow, constrain, or log based on policy.
4. Audit Logging: Log every policy decision, including the data fields, the decision (ALLOW/BLOCK/etc.), and the timestamp. Store logs in tamper-proof storage for at least 6 years (HIPAA retention requirement).
5. Breach Detection: Monitor logs for policy violations (e.g., attempts to send blocked data). Alert security teams immediately.
6. Vendor Management: Ensure all LLM vendors have signed BAAs and comply with audit logging requirements. Conduct periodic risk assessments.
7. Employee Training: Train clinical and IT staff on HIPAA requirements for AI. Include incident response procedures if a breach occurs.
Healthcare organizations often implement steps 1 and 2 but skip steps 3 through 6. This creates compliance exposure. A policy on paper is worthless if no technical enforcement layer prevents violation. Runtime policy engines close this gap by automating compliance.
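The breach detection step can start very simply: scan the policy decision log for blocked attempts and flag users who trigger several of them. The sketch below assumes log entries are dictionaries with `user` and `decision` keys and uses an arbitrary threshold of three; both are illustrative choices, not requirements from HIPAA.

```python
from collections import Counter

def users_to_alert(entries: list, threshold: int = 3) -> list:
    """Return users with at least `threshold` BLOCK decisions, sorted by name."""
    blocked = Counter(e["user"] for e in entries if e["decision"] == "BLOCK")
    return sorted(user for user, count in blocked.items() if count >= threshold)

# Hypothetical log excerpt.
entries = [
    {"user": "svc-prior-auth", "decision": "BLOCK"},
    {"user": "dr_lee", "decision": "ALLOW"},
    {"user": "svc-prior-auth", "decision": "BLOCK"},
    {"user": "dr_lee", "decision": "BLOCK"},
    {"user": "svc-prior-auth", "decision": "BLOCK"},
]
print(users_to_alert(entries))
```

A repeated BLOCK from a service account usually points at a misconfigured integration rather than a malicious user, which is still worth an alert because the same code path may succeed against a weaker policy.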
Regulatory Trends and Future Compliance
Regulators are paying attention to healthcare AI. The FDA has published guidance on AI in medical devices. The Office for Civil Rights (OCR), which enforces HIPAA, has not yet issued specific regulations on LLMs, but enforcement actions are likely as breaches are discovered. Several healthcare organizations have already faced OCR audits after disclosing AI-related data handling practices.
The EU AI Act entered into force in August 2024, with most of its obligations applying from August 2026. It classifies certain healthcare AI as "high risk" and requires documented risk assessments, quality management systems, and human oversight. This is stricter than current U.S. law, and other jurisdictions may follow. Organizations operating in multiple regions should design for the strictest regime (EU AI Act + HIPAA) and apply those controls globally.
Reference Architecture: Open-Core Implementation
The open-core approach to runtime policy combines a gateway and policy engine that inspect every prompt before transmission, apply policies (allow, block, constrain, redact), and log every decision into tamper-proof storage. This addresses the minimum necessary requirement, audit logging requirement, and breach detection in a single system. MCP servers can integrate runtime policy into existing AI agents and workflows without rewriting application code. Commercial offerings add pre-built compliance templates (HIPAA, GDPR, SOC 2), policy exception approvals, and audit dashboards for oversight teams.
Frequently asked questions
Can healthcare AI systems violate HIPAA?
Yes. If a healthcare AI system sends PHI to an LLM endpoint without a signed Business Associate Agreement, it violates the Privacy Rule. If the system fails to log access to PHI, it violates the Security Rule. Civil penalties are tiered by culpability, with a statutory range of $100 to $50,000 per violation (adjusted annually for inflation), and OCR investigations can take months or years. Any healthcare organization deploying AI should complete a risk analysis and privacy impact assessment before going live.
How do you keep PHI out of LLM prompts?
Deploy a runtime policy engine that inspects every LLM prompt before transmission. Tag all data as PHI or non-PHI in your source systems. Write explicit policies defining which data fields are allowed for each AI use case. The policy engine blocks, redacts, or constrains fields before they leave your network. Log all policy decisions for audit. Regular testing should verify that blocks are working and that sensitive data never reaches the LLM.
What does HIPAA require for AI systems processing patient data?
HIPAA requires compliance with the Privacy Rule (including the minimum necessary standard), the Security Rule (audit logging and access controls), and the Breach Notification Rule (notifying affected individuals without unreasonable delay and no later than 60 days after discovering a breach). For AI specifically, this means: obtain patient authorization when a use falls outside treatment, payment, and health care operations, sign BAAs with LLM vendors, implement audit logging, classify and tag PHI, enforce runtime policies, and conduct risk assessments. Document all controls and be prepared to demonstrate compliance to regulators.
Are LLM applications subject to HIPAA requirements?
Yes, if they process PHI from a healthcare organization. HIPAA applies to covered entities (health plans, healthcare clearinghouses, and healthcare providers such as hospitals) and business associates (vendors processing data on their behalf). If your LLM application processes patient data from a healthcare customer, you must comply with HIPAA. This includes signing BAAs with your customers and providing them with audit reports.
What is the difference between de-identification and pseudonymization for HIPAA?
De-identification under the Safe Harbor standard removes 18 specific identifiers, and the data is no longer PHI. You can use it freely, including for training LLMs. Pseudonymization replaces identifiers with codes or hashes but retains the ability to re-link data to patients. Pseudonymized data is still PHI under HIPAA and requires the same protections as identified data. De-identification is the stronger privacy approach but prevents real-time patient-specific AI use cases.
How long must healthcare AI audit logs be retained?
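HIPAA's documentation retention rule (45 CFR §164.316(b)(2)) requires required documentation to be kept for 6 years from the date of creation or the date it was last in effect, whichever is later. The rule does not name audit logs explicitly, but most organizations apply the same 6-year minimum to them. In practice, healthcare organizations retain logs for 7 to 10 years to account for investigation timelines. If you are using a third-party log storage provider (cloud logging), ensure your contract specifies retention periods and includes provisions for law enforcement requests and breach notification.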
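A retention job needs a rule for the earliest date a log entry may be purged. The sketch below assumes a 6-year minimum counted from the creation date and handles the one awkward calendar case, an entry created on February 29. The function name and the choice to roll forward to March 1 are assumptions; some organizations would instead round up to the end of the year.

```python
from datetime import date

def earliest_purge_date(created: date, years: int = 6) -> date:
    """Earliest date an entry may be deleted under an N-year minimum retention."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year:
        # roll forward so the entry is never purged early.
        return created.replace(year=created.year + years, month=3, day=1)

print(earliest_purge_date(date(2025, 1, 15)))
print(earliest_purge_date(date(2024, 2, 29)))
```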
What is a Business Associate Agreement and why is it required for LLM vendors?
A Business Associate Agreement (BAA) is a contract between a covered entity (e.g., a hospital) and a vendor that processes PHI on the covered entity's behalf. The BAA requires the vendor to implement HIPAA safeguards and prohibits re-use or re-disclosure of PHI. If your LLM vendor processes patient data, they must sign a BAA. Consumer and default tiers of general-purpose LLM services are generally not covered by a BAA. Enterprise offerings such as ChatGPT Enterprise, Google Vertex AI, and Azure OpenAI, along with some specialized healthcare AI vendors, do offer BAA-covered options.
How do you balance AI accuracy with HIPAA compliance?
Compliance and accuracy are not mutually exclusive. De-identification can improve accuracy in some cases by removing noisy identifiers. Enforcing the minimum necessary standard can reduce overfitting by focusing models on clinically relevant features. However, redacting or constraining certain fields (e.g., psychiatric history) may reduce performance for some AI models. Conduct trade-off analyses during the design phase and document which accuracy compromises are worth the compliance gain.
What should be in a healthcare AI privacy impact assessment?
A privacy impact assessment should identify all PHI processed by the AI system, describe the business purpose, document the policy controls, assess risks, and describe mitigation strategies. Specifically, address: Who accesses the data? What is the consent mechanism? Is there a BAA with the LLM vendor? Are audit logs retained for 6+ years? What is the breach notification plan? Who is responsible for compliance? Conduct the assessment before deployment and update it annually or after significant changes.
What are the penalties for HIPAA violations involving AI?
OCR can assess civil penalties in four tiers based on culpability (45 CFR §160.404). The statutory amounts range from $100 to $50,000 per violation, with an annual cap of up to $1.5 million for identical violations, and are adjusted annually for inflation. For a breach affecting multiple patients, penalties scale with the severity and scope of the violation. Covered entities and business associates also face reputational damage, customer churn, and increased insurance premiums. State attorneys general can pursue additional penalties under state privacy laws. Criminal penalties apply to knowing violations (42 U.S.C. §1320d-6), rising to fines of up to $250,000 and imprisonment of up to 10 years when the offense involves intent to sell or use PHI for commercial advantage, personal gain, or malicious harm.