01. PHI Risk Classification by LLM Use Case
Not all healthcare LLM use cases carry equal HIPAA exposure. The risk level depends on whether PHI enters the LLM prompt directly, whether output contains derivable PHI, and what downstream systems consume the response. Classify your use case before building.
| Use Case | PHI Exposure | Risk Level | Key Requirements |
|---|---|---|---|
| Clinical note summarization / transcription | Direct PHI in prompt: patient name, DOB, diagnoses, medications, provider notes | Critical | BAA with LLM vendor, encryption in transit, minimum necessary scrubbing, full audit log |
| Patient intake and triage chatbots | Symptoms, insurance IDs, demographics collected in real time | Critical | BAA required, access controls, automatic session termination, PHI not stored in prompt history |
| Diagnostic decision support | Lab results, imaging findings, clinical history may be provided | High | BAA, de-identification before inference where possible, audit trail on every call |
| Prior authorization and billing assistance | Diagnosis codes, procedure codes, payer IDs, patient insurance data | High | BAA with LLM vendor and RCM platform, encryption, access controls |
| Administrative assistants (scheduling, referrals) | Patient name, contact info, appointment type | Medium | BAA with scheduling platform and LLM vendor, limited PHI scope in prompts |
| De-identified population analytics | No direct PHI if properly de-identified per HIPAA Safe Harbor (18 identifiers removed) | Low | Verify de-identification meets HIPAA standard before routing to LLM; no BAA required if truly de-identified |
If your LLM application uses retrieval-augmented generation against an EHR, clinical data warehouse, or claims database, every retrieval query and every retrieved document may contain or derive PHI. Treat the entire RAG pipeline as a HIPAA-covered system, not just the final LLM call.
The 18 HIPAA Safe Harbor identifiers that must be removed before data is considered de-identified include: names, geographic data smaller than state, dates (except year), phone/fax, email, SSN, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, URLs, IP addresses, biometric identifiers, full-face photos, and any other unique identifying numbers.
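Detecting structured identifiers is the tractable part of Safe Harbor scrubbing. As a minimal sketch (the patterns, function name, and MRN format are illustrative assumptions; names and free-text dates require NLP-based detection that simple regexes cannot provide), a pre-inference scan might look like this:

```python
import re

# Illustrative patterns for a few of the 18 Safe Harbor identifiers.
# A production scrubber needs NLP-based detection for names and free-text
# dates; these regexes cover structured identifiers only.
SAFE_HARBOR_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scrub_structured_identifiers(text: str) -> tuple[str, list[str]]:
    """Replace matched identifiers with category tokens; return scrubbed text
    plus the list of identifier categories found."""
    found = []
    for category, pattern in SAFE_HARBOR_PATTERNS.items():
        if pattern.search(text):
            found.append(category)
            text = pattern.sub(f"[{category.upper()}]", text)
    return text, found

scrubbed, hits = scrub_structured_identifiers(
    "Patient reachable at 555-867-5309, MRN: 00482910."
)
# scrubbed -> "Patient reachable at [PHONE], [MRN]."
```

A scan like this belongs before the LLM call, with any hit either blocking the request or triggering redaction, per your policy.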
02. BAA Requirements for LLM Vendors
A Business Associate Agreement is a contract that requires your LLM vendor -- as a Business Associate -- to protect PHI they receive on your behalf. Under HIPAA, you cannot transmit PHI to any third-party vendor without a signed BAA. This includes LLM API providers, vector database vendors, embedding services, and cloud infrastructure providers.
Who Requires a BAA
Several LLM providers offer HIPAA BAAs only on enterprise contracts. If you are using a standard API key on a developer or team plan, PHI transmission may not be covered even if a BAA template exists on the vendor's website. Confirm tier eligibility in writing.
What a Valid BAA Must Include
A compliant BAA must address: permitted uses and disclosures of PHI by the Business Associate, prohibition on using PHI for the BA's own purposes beyond the contracted service, requirements to implement HIPAA Security Rule safeguards, obligation to report breaches within 60 days of discovery, return or destruction of PHI upon contract termination, and flow-down requirements to any subcontractors (subprocessors).
Vendor terms of service are not a BAA. Privacy policies, data processing addenda, and acceptable use policies do not satisfy the BAA requirement. You need a signed agreement that contains the provisions required by 45 CFR 164.504(e).
03. Minimum Necessary Standard for LLM Prompts
HIPAA's minimum necessary standard (45 CFR 164.502(b)) requires that covered entities and business associates limit PHI use and disclosure to the minimum amount necessary to accomplish the intended purpose. Applied to LLMs: every prompt sent to an LLM inference endpoint must contain only the PHI required for that specific clinical function -- nothing more.
The Problem: LLMs Receive More PHI Than They Need
In practice, healthcare applications often pass full patient context to LLMs -- complete clinical notes, full EHR export, entire conversation histories -- because it is simpler to build and produces better outputs. This violates the minimum necessary standard even when the information is technically relevant. The question is not "could this help the LLM?" but "is all of this necessary for the specific function?"
Implementing PHI Minimization Before Inference
Minimum necessary enforcement requires a step between your application and the LLM API call:
```python
# Minimum necessary enforcement pipeline

# Step 1: Identify what the LLM function actually needs
required_phi_fields = {
    "clinical_note_summary": ["chief_complaint", "assessment", "plan"],
    "prior_auth": ["diagnosis_code", "procedure_code", "clinical_justification"],
    "patient_intake": ["chief_complaint", "symptoms", "duration"],
}

# Step 2: Strip identifiers not needed for the function
# Patient name, DOB, MRN, and contact info are NOT needed for most clinical summaries
prompt = build_prompt(
    phi_fields=required_phi_fields[use_case],
    replace_identifiers={
        "patient_name": "[PATIENT]",  # substitute token
        "dob": "[DOB]",
        "mrn": "[MRN]",
        "provider_name": "[PROVIDER]",
    },
)

# Step 3: Scan for residual PHI before sending
# A policy layer catches what field-stripping misses
result = sentinelgate.chat_completions(prompt, policy="hipaa-minimum-necessary")
```
Pseudo-Anonymization vs. True De-identification
Substituting tokens (replacing a patient name with "[PATIENT]") is pseudo-anonymization -- it reduces the information content of the prompt but does not produce a de-identified record under HIPAA. PHI is still present in your system at the substitution mapping layer, and the output may still allow re-identification if the LLM response references the substituted context.
True HIPAA de-identification removes all 18 identifiers such that the probability of identifying the individual is very small. For most LLM use cases in active patient care, true de-identification is not achievable -- you need the clinical context. Use pseudo-anonymization to minimize identifiers in the prompt and rely on technical safeguards (BAA, encryption, audit trail) to protect the remainder.
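The substitution mapping mentioned above is itself PHI and must stay inside your covered system. As a minimal sketch of the pattern (class and method names are illustrative, not any vendor's API), a reversible pseudonym map might work like this:

```python
import secrets

class PseudonymMap:
    """Reversible token substitution for prompt pseudo-anonymization.

    Note: the mapping is PHI. It must live inside the covered system with
    its own access controls, and it must never be sent to the LLM vendor.
    This reduces identifiers in the prompt; it is NOT HIPAA de-identification.
    """

    def __init__(self):
        self._forward = {}   # real value -> token
        self._reverse = {}   # token -> real value

    def tokenize(self, value: str, category: str) -> str:
        """Return a stable category token for a real identifier value."""
        if value not in self._forward:
            token = f"[{category.upper()}-{secrets.token_hex(3)}]"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def rehydrate(self, text: str) -> str:
        """Restore real values in the LLM output before clinical display."""
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text

pmap = PseudonymMap()
prompt = f"Summarize the discharge note for {pmap.tokenize('Jane Doe', 'patient')}."
```

Because the same value always maps to the same token within a session, the LLM can refer to the patient consistently, and `rehydrate` restores the real name only at the display layer, behind your access controls.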
If an LLM response contains PHI -- synthesized, echoed, or derived -- and that response is stored, displayed to a user, or passed to another system, the minimum necessary standard applies to the output as well. Store outputs containing PHI only behind appropriate access controls, and never write LLM responses into application logs without PHI scrubbing.
04. HIPAA Security Rule: Technical Safeguards for LLM Architecture
The HIPAA Security Rule (45 CFR Part 164, Subpart C) requires covered entities and business associates to implement technical safeguards for electronic PHI. For LLM-powered applications, these requirements map to specific architectural controls.
Administrative Safeguards (45 CFR 164.308)
| Requirement | LLM Architecture Implementation | Type |
|---|---|---|
| Security Management Process | Risk analysis covering LLM vendor BAA coverage, prompt injection vectors, PHI in training/fine-tuning data, and model output leakage risks | Required |
| Workforce Training | Document what PHI may be sent to LLMs, which vendors have BAAs, and what constitutes a reportable incident involving LLM + PHI | Required |
| Access Management | API key access controls -- each role or service gets its own key with least-privilege scoping. No shared keys across environments | Required |
| Contingency Plan | What happens if the LLM vendor is unavailable? Document fallback procedures that do not rely on LLM for critical patient care functions | Required |
Physical Safeguards (45 CFR 164.310)
For cloud-deployed LLM applications, physical safeguards are largely handled by your cloud provider (under your BAA). Your obligations:
- Document the physical location of servers processing PHI (region, data center)
- Ensure your BAA specifies the vendor's physical security controls
- Restrict PHI processing to approved geographic regions (some healthcare contracts require US-only)
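The region-restriction item above can be enforced in code rather than by convention. A minimal sketch (the region names, function name, and error type are assumptions for illustration, not part of any vendor API):

```python
# Illustrative guard: refuse to route PHI outside contractually approved regions.
# The approved set here assumes a US-only healthcare contract.
ALLOWED_PHI_REGIONS = {"us-east-1", "us-west-2"}

def assert_approved_region(region: str) -> None:
    """Raise before any PHI leaves the application if the target region
    is not in the approved set."""
    if region.lower() not in ALLOWED_PHI_REGIONS:
        raise PermissionError(
            f"PHI processing blocked: region '{region}' is not in the approved US-only set"
        )

assert_approved_region("US-EAST-1")  # approved region: no exception
```

Calling this guard in the request path, before the inference call, turns a contractual obligation into a hard failure instead of a silent violation.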
Technical Safeguards (45 CFR 164.312)
The technical safeguards -- access control, audit controls, integrity, person or entity authentication, and transmission security -- map to the LLM-specific controls covered in the remainder of this guide: audit controls are treated in depth in section 05, and encryption, access control, and output handling recur throughout.
05. Audit Logging Requirements for Healthcare LLMs
HIPAA Security Rule 164.312(b) requires audit controls -- not optional, not "if practical." Activity in every information system that contains or uses electronic PHI must be recorded and examinable. For LLM-powered healthcare applications, this means logging every inference call with sufficient detail to reconstruct what happened and who had access to what PHI.
What Must Be Logged on Every LLM Call
```json
// HIPAA audit event -- SentinelGate captures this on every call
{
  // Identity and timing
  "event_id": "evt_01JXM...",
  "occurred_at": "2026-05-08T14:33:21.847Z",
  "api_key_id": "key_...abc",
  "user_session": "sess_...xyz",

  // PHI classification
  "phi_detected": true,
  "phi_categories": ["diagnosis", "medication"],
  "phi_action": "allowed_with_redaction",

  // Policy and compliance
  "policy_version": "hipaa-v2.1",
  "baa_vendor": "openai",
  "baa_active": true,
  "model": "gpt-4o",

  // Output scan
  "output_phi_scan": "clean",
  "latency_ms": 412,
  "tokens_in": 847,
  "tokens_out": 294,

  // Retention classification
  "hipaa_retention": "6yr_minimum",
  "data_region": "US-EAST"
}
```
Retention and Access for HIPAA Audit Logs
HIPAA requires that documentation (including audit logs) be retained for 6 years from creation or last effective date. Some states impose longer retention requirements for underlying medical records, particularly for records of minor patients. Logs must be accessible for OCR inspection upon request.
Access controls for audit logs: read access must be restricted to authorized personnel (compliance, security, clinical informatics). Write access must be append-only -- no modification or deletion of existing audit records. Implement cryptographic integrity verification (hash chaining or signed records) to demonstrate tamper-evidence.
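The hash-chaining approach mentioned above can be sketched in a few lines. This is an illustration of the technique, not a production audit store (function names are ours; a real system would also persist records durably and sign the chain head):

```python
import hashlib
import json

def append_audit_event(chain: list, event: dict) -> dict:
    """Append an audit event with a hash chained to the previous record.

    Each record embeds the SHA-256 of the previous record, so modifying
    any earlier entry invalidates every later hash (tamper evidence).
    """
    prev_hash = chain[-1]["record_hash"] if chain else "0" * 64
    body = json.dumps(event, sort_keys=True)
    record_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    record = {"event": event, "prev_hash": prev_hash, "record_hash": record_hash}
    chain.append(record)
    return record

def verify_chain(chain: list) -> bool:
    """Recompute every hash from the stored events; False on any tampering."""
    prev = "0" * 64
    for record in chain:
        body = json.dumps(record["event"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if record["prev_hash"] != prev or record["record_hash"] != expected:
            return False
        prev = expected
    return True
```

Append-only semantics come from the storage layer (e.g. write-once object storage); the chain gives auditors a cheap way to prove no record was altered after the fact.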
06. Breach Notification When LLMs Process PHI
HIPAA Breach Notification Rule (45 CFR Part 164, Subpart D) requires covered entities to notify affected individuals, HHS, and potentially media when unsecured PHI is breached. For LLM-powered applications, specific scenarios trigger breach notification obligations.
LLM-Specific Breach Scenarios
Breach Notification Timeline
Once a breach is discovered:
- Without unreasonable delay, and no later than 60 days: Notify affected individuals and the HHS Secretary
- If 500+ individuals in a state: Also notify prominent media outlets in that state
- If fewer than 500 individuals: Can aggregate into an annual report to HHS, due within 60 days of the end of the calendar year (but individual notification is still due within 60 days of discovery)
- Business Associate breaches: BA must notify covered entity within 60 days of discovery -- your BAA should specify this timeline
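The timeline above can be turned into concrete dates at discovery time. A minimal sketch (the function name and return shape are illustrative; the annual-report deadline assumes the "within 60 days of calendar year end" rule, which lands on March 1 in non-leap years):

```python
from datetime import date, timedelta

def notification_deadlines(discovery: date, affected: int, state_counts: dict) -> dict:
    """Derive Breach Notification Rule deadlines from the discovery date.

    Thresholds follow 45 CFR 164.404-164.408: individual notice within 60 days;
    HHS notice within 60 days for 500+ individuals, else in the annual log due
    60 days after calendar year end; media notice per state with 500+ affected.
    """
    sixty_days = discovery + timedelta(days=60)
    return {
        "individual_notice": sixty_days,
        "hhs_notice": (
            sixty_days
            if affected >= 500
            else date(discovery.year, 12, 31) + timedelta(days=60)
        ),
        "media_notice_states": [s for s, n in state_counts.items() if n >= 500],
    }
```

Computing these dates the moment a breach is declared, and attaching them to the incident ticket, removes the most common failure mode: deadlines tracked informally and missed.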
Internal "anonymization" that does not satisfy HIPAA Safe Harbor (all 18 identifiers removed) or Expert Determination (qualified expert certifies re-identification risk is very small) does not exempt you from breach notification. Partial anonymization is not de-identification.
07. SentinelGate: HIPAA Technical Safeguards in One Layer
Building HIPAA-compliant LLM infrastructure from scratch means implementing PHI detection, minimum necessary enforcement, audit logging, access controls, and policy management -- for every LLM call, in real time, before inference. SentinelGate handles all of this in a single proxy layer that sits between your application and any LLM API.
How SentinelGate Covers HIPAA Technical Safeguards
One base URL change routes all LLM traffic through SentinelGate. Every call is scanned for PHI, evaluated against your configured policies, logged with the complete HIPAA audit schema, and governed in real time -- with zero changes to your application code.
Integration: Change One Line
Add SentinelGate to any healthcare LLM application without touching your clinical code. Change the base URL -- every other configuration stays identical.
```bash
# Before: direct to LLM vendor (PHI unprotected)
OPENAI_API_BASE=https://api.openai.com/v1

# After: SentinelGate policy + audit layer (HIPAA safeguards applied)
OPENAI_API_BASE=https://gateway.sentinelgate.polsia.app/v1
OPENAI_API_KEY=your_sentinel_gate_key

# SentinelGate signs BAA for enterprise plans
# PHI detection, audit logging, minimum necessary enforcement -- automatic
```
```python
# Python -- clinical note summarization with HIPAA safeguards
from openai import OpenAI

client = OpenAI(
    api_key="your_sentinel_gate_key",
    base_url="https://gateway.sentinelgate.polsia.app/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Summarize the following SOAP note: [note content]",
    }],
)

# PHI detection, policy enforcement, and audit log applied automatically
# No code changes. No missed calls. No coverage gaps.
```
Get your free API key -- HIPAA-compliant in 5 minutes
No credit card required. No application code changes. Route your healthcare LLM traffic through SentinelGate and have PHI protection and a complete audit trail before your next sprint ends.
08. Full HIPAA Compliance Checklist for LLM Healthcare Applications
Use this checklist to assess your current compliance posture. Every unchecked item is a gap between your current state and HIPAA compliance.
PHI Classification and Scope
Business Associate Agreements
Minimum Necessary and PHI Minimization
Technical Safeguards (Security Rule 164.312)
Audit Logging (Security Rule 164.312(b))
Breach Preparedness
Inventory every LLM vendor, vector database, and infrastructure provider your application uses. Check whether a BAA is signed and whether it covers your current tier and use case. Unsigned BAAs are the most common and most immediately actionable HIPAA gap in healthcare AI systems.