01. PHI Risk Classification by LLM Use Case

Not all healthcare LLM use cases carry equal HIPAA exposure. The risk level depends on whether PHI enters the LLM prompt directly, whether output contains derivable PHI, and what downstream systems consume the response. Classify your use case before building.

Use Case | PHI Exposure | Risk Level | Key Requirements
Clinical note summarization / transcription | Direct PHI in prompt: patient name, DOB, diagnoses, medications, provider notes | Critical | BAA with LLM vendor, encryption in transit, minimum necessary scrubbing, full audit log
Patient intake and triage chatbots | Symptoms, insurance IDs, demographics collected in real time | Critical | BAA required, access controls, automatic session termination, PHI not stored in prompt history
Diagnostic decision support | Lab results, imaging findings, clinical history may be provided | High | BAA, de-identification before inference where possible, audit trail on every call
Prior authorization and billing assistance | Diagnosis codes, procedure codes, payer IDs, patient insurance data | High | BAA with LLM vendor and RCM platform, encryption, access controls
Administrative assistants (scheduling, referrals) | Patient name, contact info, appointment type | Medium | BAA with scheduling platform and LLM vendor, limited PHI scope in prompts
De-identified population analytics | No direct PHI if properly de-identified per HIPAA Safe Harbor (18 identifiers removed) | Low | Verify de-identification meets the HIPAA standard before routing to the LLM; no BAA required if truly de-identified
! RAG pipelines that retrieve from clinical databases are Critical by default.

If your LLM uses retrieval-augmented generation against an EHR, clinical data warehouse, or claims database, every retrieval query may contain or derive PHI. Treat the entire RAG pipeline as a HIPAA-covered system, not just the final LLM call.

The 18 HIPAA Safe Harbor identifiers that must be removed before data is considered de-identified include: names, geographic data smaller than state, dates (except year), phone/fax, email, SSN, medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, URLs, IP addresses, biometric identifiers, full-face photos, and any other unique identifying numbers.
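As a rough illustration of how a de-identification check might be wired in before routing data to an LLM, the sketch below flags a few of the 18 identifiers with regexes. This is an assumption-laden sketch: `residual_identifiers` and the patterns are illustrative, and regexes alone cannot certify Safe Harbor compliance — names, dates, and geographic data need NER-based detection and expert review.

```python
import re

# Illustrative checks for a few of the 18 Safe Harbor identifiers.
# Real de-identification also needs NER for names, dates, and geography.
SAFE_HARBOR_PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "mrn":   re.compile(r"\bMRN[:#]?\s*\d{6,}\b", re.IGNORECASE),
}

def residual_identifiers(text: str) -> list:
    """Return identifier categories still present in supposedly de-identified text."""
    return [name for name, pat in SAFE_HARBOR_PATTERNS.items() if pat.search(text)]

found = residual_identifiers("Pt follow-up. Contact: jdoe@example.com, MRN# 8841002")
# Any hit means the record is NOT de-identified -- do not route it to the LLM.
```

Use a scan like this as a tripwire, not a certification: an empty result means the regexes found nothing, not that the record meets the Safe Harbor standard.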


02. BAA Requirements for LLM Vendors

A Business Associate Agreement is a contract that requires your LLM vendor -- as a Business Associate -- to protect PHI they receive on your behalf. Under HIPAA, you cannot transmit PHI to any third-party vendor without a signed BAA. This includes LLM API providers, vector database vendors, embedding services, and cloud infrastructure providers.

Who Requires a BAA

Vendors Requiring BAA Before PHI Transmission (45 CFR 164.308(b))

  • LLM API providers: OpenAI, Anthropic, Google, Cohere, and any other model API that receives prompts containing PHI. Must sign BAA before first PHI call. Check vendor BAA availability -- some providers only offer BAAs on enterprise plans.
  • Vector database and embedding services: If your RAG pipeline embeds clinical text and stores vectors in Pinecone, Weaviate, Chroma, pgvector, or similar -- the embedding service and the database are processing PHI. BAA required for each.
  • Cloud infrastructure providers: AWS, GCP, Azure, Render, or any hosting provider where PHI is stored or processed. Major cloud providers offer standard BAAs -- execute before deploying PHI workloads.
  • LLM proxy / API gateway providers: If you route LLM calls through a third-party gateway or compliance layer, that provider is a Business Associate. Obtain a BAA before routing PHI-containing calls through the service.
  • Logging and observability platforms: Datadog, Splunk, Sumo Logic, or any platform receiving application logs that may contain PHI from LLM inputs/outputs. BAA required if PHI can appear in log data.
! BAA status changes by pricing tier -- verify before sending PHI.

Several LLM providers only offer HIPAA BAAs on enterprise contracts. If you are using a standard API key on a developer or team plan, PHI transmission may not be covered even if a BAA template exists on the vendor's website. Confirm tier eligibility in writing.

What a Valid BAA Must Include

A compliant BAA must address:

  • Permitted uses and disclosures of PHI by the Business Associate
  • Prohibition on using PHI for the BA's own purposes beyond the contracted service
  • Requirements to implement HIPAA Security Rule safeguards
  • Obligation to report breaches within 60 days of discovery
  • Return or destruction of PHI upon contract termination
  • Flow-down requirements to any subcontractors (subprocessors)

Vendor terms of service are not a BAA. Privacy policies, data processing addenda, and acceptable use policies do not satisfy the BAA requirement. You need a signed agreement that uses the statutory language from 45 CFR 164.504(e).
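Because BAA coverage can vary by pricing tier, it helps to encode the inventory as data and gate PHI transmission on it. The sketch below is illustrative: `VendorBAA`, the vendor keys, and the tier values are placeholders, not statements about any real vendor's terms.

```python
from dataclasses import dataclass

# Sketch of a BAA inventory gate: refuse PHI transmission to any vendor
# without an executed BAA covering the current pricing tier.
@dataclass(frozen=True)
class VendorBAA:
    vendor: str
    baa_signed: bool
    covered_tiers: frozenset  # tiers the executed BAA actually covers

BAA_INVENTORY = {
    "llm_api":      VendorBAA("llm_api",      True,  frozenset({"enterprise"})),
    "vector_db":    VendorBAA("vector_db",    True,  frozenset({"standard", "enterprise"})),
    "log_platform": VendorBAA("log_platform", False, frozenset()),
}

def phi_transmission_allowed(vendor_key: str, current_tier: str) -> bool:
    v = BAA_INVENTORY.get(vendor_key)
    return bool(v and v.baa_signed and current_tier in v.covered_tiers)

# A developer-tier key against an enterprise-only BAA is NOT covered:
allowed = phi_transmission_allowed("llm_api", "developer")  # False
```

Checking the gate in code makes the "BAA covers this tier" question answerable at call time instead of during a breach investigation.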


03. Minimum Necessary Standard for LLM Prompts

HIPAA's minimum necessary standard (45 CFR 164.502(b)) requires that covered entities and business associates limit PHI use and disclosure to the minimum amount necessary to accomplish the intended purpose. Applied to LLMs: every prompt sent to an LLM inference endpoint must contain only the PHI required for that specific clinical function -- nothing more.

The Problem: LLMs Receive More PHI Than They Need

In practice, healthcare applications often pass full patient context to LLMs -- complete clinical notes, full EHR export, entire conversation histories -- because it is simpler to build and produces better outputs. This violates the minimum necessary standard even when the information is technically relevant. The question is not "could this help the LLM?" but "is all of this necessary for the specific function?"

Implementing PHI Minimization Before Inference

Minimum necessary enforcement requires a step between your application and the LLM API call:

# Minimum necessary enforcement pipeline
# Step 1: Identify what the LLM function actually needs
REQUIRED_PHI_FIELDS = {
    "clinical_note_summary": ["chief_complaint", "assessment", "plan"],
    "prior_auth":            ["diagnosis_code", "procedure_code", "clinical_justification"],
    "patient_intake":        ["chief_complaint", "symptoms", "duration"],
}

# Step 2: Strip identifiers not needed for the function.
# Patient name, DOB, MRN, and contact info are NOT needed for most clinical summaries.
IDENTIFIER_TOKENS = {
    "patient_name":  "[PATIENT]",   # substitute token
    "dob":           "[DOB]",
    "mrn":           "[MRN]",
    "provider_name": "[PROVIDER]",
}

def build_prompt(record, use_case):
    # Keep only the fields this function needs, then tokenize any identifier
    # value that still appears in the surviving text
    fields = {k: record[k] for k in REQUIRED_PHI_FIELDS[use_case] if k in record}
    text = "\n".join(f"{k}: {v}" for k, v in fields.items())
    for field, token in IDENTIFIER_TOKENS.items():
        if field in record:
            text = text.replace(str(record[field]), token)
    return text

prompt = build_prompt(patient_record, use_case)  # patient_record: EHR fields for this encounter

# Step 3: Scan for residual PHI before sending
# A policy layer catches what field-stripping misses
result = sentinelgate.chat_completions(prompt, policy="hipaa-minimum-necessary")

Pseudo-Anonymization vs. True De-identification

Substituting tokens (replacing a patient name with "[PATIENT]") is pseudo-anonymization -- it reduces the information content of the prompt but does not produce a de-identified record under HIPAA. PHI is still present in your system at the substitution mapping layer, and the output may still allow re-identification if the LLM response references the substituted context.

True HIPAA de-identification removes all 18 identifiers such that the probability of identifying the individual is very small. For most LLM use cases in active patient care, true de-identification is not achievable -- you need the clinical context. Use pseudo-anonymization to minimize identifiers in the prompt and rely on technical safeguards (BAA, encryption, audit trail) to protect the remainder.
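One way to sketch pseudo-anonymization with re-identification kept inside the covered system is a two-way token map. The `PseudonymMap` class below is illustrative, not a prescribed design: the key point is that the reverse mapping is itself PHI and needs the same safeguards (encryption, access control, audit) as the source record.

```python
import secrets

# Pseudo-anonymization sketch: replace identifiers with stable tokens and keep
# the re-identification mapping OUTSIDE the prompt path.
class PseudonymMap:
    def __init__(self):
        self._forward = {}   # real value -> token
        self._reverse = {}   # token -> real value (PHI -- protect this store)

    def tokenize(self, category: str, value: str) -> str:
        # Same value always maps to the same token within a session
        if value not in self._forward:
            token = f"[{category.upper()}-{secrets.token_hex(3)}]"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def rehydrate(self, text: str) -> str:
        # Restore identifiers in the LLM output, inside the covered system only
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text

pmap = PseudonymMap()
note = f"Patient {pmap.tokenize('patient', 'Jane Doe')} reports chest pain."
# note now reads "Patient [PATIENT-......] reports chest pain." (token is random)
```

Because the token is stable per value, the LLM can reason about "the patient" across a conversation without ever seeing the name; `rehydrate` runs only after the response returns to your covered infrastructure.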

i Minimum necessary applies to LLM outputs too.

If an LLM response contains PHI -- synthesized, echoed, or derived -- and that response is stored, displayed to a user, or passed to another system, the minimum necessary standard applies to how you handle the output. Log PHI in outputs with appropriate access controls; do not store LLM responses in application logs without PHI scrubbing.
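Applying that to logging might look like the sketch below: scrub obvious PHI patterns from LLM responses before anything reaches the application log stream. The patterns and `scrub`/`log_llm_response` names are illustrative; in practice you would pair this with the same PHI detector used on prompts so input and output coverage match.

```python
import json
import logging
import re

# Redact obvious PHI patterns from LLM output before it can reach logs.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\bMRN[:#]?\s*\d{6,}\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def scrub(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

def log_llm_response(logger: logging.Logger, response_text: str) -> None:
    # Only the scrubbed form ever touches the log stream
    logger.info(json.dumps({"llm_response": scrub(response_text)}))

scrubbed = scrub("Summary for MRN# 4471902: follow up at jdoe@example.com")
# scrubbed == "Summary for [MRN]: follow up at [EMAIL]"
```

The design choice worth noting: scrubbing happens in the logging path itself, so a developer adding a new log statement cannot accidentally bypass it.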


04. HIPAA Security Rule: Technical Safeguards for LLM Architecture

The HIPAA Security Rule (45 CFR Part 164, Subpart C) requires covered entities and business associates to implement technical safeguards for electronic PHI. For LLM-powered applications, these requirements map to specific architectural controls.

Administrative Safeguards (45 CFR 164.308)

Requirement | LLM Architecture Implementation | Type
Security Management Process | Risk analysis covering LLM vendor BAA coverage, prompt injection vectors, PHI in training/fine-tuning data, and model output leakage risks | Required
Workforce Training | Document what PHI may be sent to LLMs, which vendors have BAAs, and what constitutes a reportable incident involving LLM + PHI | Required
Access Management | API key access controls -- each role or service gets its own key with least-privilege scoping. No shared keys across environments | Required
Contingency Plan | What happens if the LLM vendor is unavailable? Document fallback procedures that do not rely on LLM for critical patient care functions | Required

Physical Safeguards (45 CFR 164.310)

For cloud-deployed LLM applications, physical safeguards are largely handled by your cloud provider (under your BAA). Your obligations:

  • Document the physical location of servers processing PHI (region, data center)
  • Ensure your BAA specifies the vendor's physical security controls
  • Restrict PHI processing to approved geographic regions (some healthcare contracts require US-only)

Technical Safeguards (45 CFR 164.312)

Technical Safeguards for LLM Systems (164.312)

Access Controls (164.312(a)(1)) -- Required
Unique user identification for every person who can access PHI-processing LLM systems. Role-based access -- clinicians, engineers, and compliance officers have different access scopes. Implement automatic logoff after inactivity.

Audit Controls (164.312(b)) -- Required
Hardware, software, and procedural mechanisms to record and examine activity on systems processing PHI. For LLM systems: every call logged with timestamp, user/service identity, policy result, and whether PHI was present. Logs must be tamper-evident and retained per policy.

Integrity Controls (164.312(c)(1)) -- Required
Verify that PHI has not been altered or destroyed in an unauthorized manner. For LLM systems: audit log integrity (append-only or cryptographic), output validation to detect unexpected PHI modification, and signed audit records.

Transmission Security (164.312(e)(1)) -- Required
Implement technical measures to guard against unauthorized access to PHI transmitted over a network. All LLM API calls must use TLS 1.2+ (TLS 1.3 preferred). No PHI over HTTP. Certificate validation enforced -- no SSL verification disabled in production code.

Encryption at Rest (164.312(a)(2)(iv)) -- Addressable
Addressable requirement: encryption at rest for PHI stored in databases, vector stores, and log archives. AES-256 is the standard. "Addressable" means you must implement it or document why an equivalent alternative provides equal protection -- in practice, regulators expect encryption.
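The transmission-security requirement can be enforced centrally rather than hoping every call site gets it right. A minimal sketch using the Python standard library, assuming outbound LLM calls share one TLS configuration:

```python
import ssl

# Enforce TLS 1.2+ with certificate validation for outbound LLM API calls,
# per 164.312(e)(1). Never ship code that sets verify=False or CERT_NONE --
# that disables the transmission-security safeguard entirely.
def hipaa_tls_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()           # verifies certs and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    assert ctx.verify_mode == ssl.CERT_REQUIRED  # fail loudly if weakened
    return ctx

# Usage (endpoint illustrative):
#   urllib.request.urlopen("https://api.example.com/v1", context=hipaa_tls_context())
```

Constructing the context in one audited function means a code review of that function covers every PHI-carrying connection.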

05. Audit Logging Requirements for Healthcare LLMs

HIPAA Security Rule 164.312(b) requires audit controls -- not optional, not "if practical." Every action on a system that stores or processes PHI must be recorded. For LLM-powered healthcare applications, this means logging every inference call with sufficient detail to reconstruct what happened and who had access to what PHI.

What Must Be Logged on Every LLM Call

HIPAA LLM Audit Event Schema (164.312(b))

  • Timestamp and identity: UTC timestamp, user or service account ID, session ID, IP address or network origin, and the LLM function invoked (e.g., "clinical_note_summary", "prior_auth_assist")
  • PHI presence indicator: Whether PHI was detected in the prompt, what PHI categories were present (name, diagnosis, MRN, etc.), and what policy action was taken (allow with redaction, block, flag for review)
  • Policy evaluation result: Policy version applied, rules triggered, action taken, any manual override with override reason and authorizing user ID. Every block or redaction must be traceable to a policy rule.
  • Vendor and model identity: LLM vendor name, model version, API endpoint, and whether the vendor's BAA was in effect at time of call. Critical for breach attribution -- "was this call covered by a BAA?" must be answerable from the log.
  • Response metadata: Response received (yes/no), output PHI scan result, latency, tokens in/out. Output PHI scanning is required -- LLMs can echo, derive, or hallucinate PHI in responses.
// HIPAA audit event -- SentinelGate captures this on every call
{
  // Identity and timing
  "event_id":       "evt_01JXM...",
  "occurred_at":    "2026-05-08T14:33:21.847Z",
  "api_key_id":     "key_...abc",
  "user_session":   "sess_...xyz",

  // PHI classification
  "phi_detected":   true,
  "phi_categories": ["diagnosis", "medication"],
  "phi_action":     "allowed_with_redaction",

  // Policy and compliance
  "policy_version": "hipaa-v2.1",
  "baa_vendor":     "openai",
  "baa_active":     true,
  "model":          "gpt-4o",

  // Output scan
  "output_phi_scan": "clean",
  "latency_ms":      412,
  "tokens_in":       847,
  "tokens_out":      294,

  // Retention classification
  "hipaa_retention": "6yr_minimum",
  "data_region":     "US-EAST"
}

Retention and Access for HIPAA Audit Logs

HIPAA requires that documentation (including audit logs) be retained for 6 years from creation or last effective date. Some states impose longer requirements (California: 10 years for minor patients). Logs must be accessible for OCR inspection upon request.

Access controls for audit logs: read access must be restricted to authorized personnel (compliance, security, clinical informatics). Write access must be append-only -- no modification or deletion of existing audit records. Implement cryptographic integrity verification (hash chaining or signed records) to demonstrate tamper-evidence.
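One way to get the tamper-evidence described above is hash chaining: each record's hash covers its content plus the previous record's hash, so any modification or deletion breaks every subsequent link. The `HashChainedLog` class below is a sketch of the technique, not a production store -- real deployments would persist records and anchor the chain head externally.

```python
import hashlib
import json

class HashChainedLog:
    """Append-only audit log with hash chaining for tamper-evidence."""

    GENESIS = "0" * 64

    def __init__(self):
        self.records = []
        self._last_hash = self.GENESIS

    def append(self, event: dict) -> str:
        # Hash covers the canonical event payload plus the previous hash
        payload = json.dumps(event, sort_keys=True) + self._last_hash
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.records.append({"event": event, "prev": self._last_hash, "hash": digest})
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        # Recompute every link; any edit to any record breaks the chain
        prev = self.GENESIS
        for rec in self.records:
            expected = hashlib.sha256(
                (json.dumps(rec["event"], sort_keys=True) + prev).encode()
            ).hexdigest()
            if rec["prev"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True

log = HashChainedLog()
log.append({"event_id": "evt_1", "phi_detected": True})
log.append({"event_id": "evt_2", "phi_detected": False})
intact = log.verify()  # True while the chain is unbroken
```

Signed records (e.g., HMAC or asymmetric signatures over each hash) add the further guarantee that an insider with write access to storage cannot silently rebuild the whole chain.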


06. Breach Notification When LLMs Process PHI

HIPAA Breach Notification Rule (45 CFR Part 164, Subpart D) requires covered entities to notify affected individuals, HHS, and potentially media when unsecured PHI is breached. For LLM-powered applications, specific scenarios trigger breach notification obligations.

LLM-Specific Breach Scenarios

Scenarios Requiring Breach Assessment (Breach Notification Rule)

PHI sent to an LLM vendor without a signed BAA -- Likely Reportable
This is an impermissible disclosure by definition. A breach assessment is required. Unless you can demonstrate low probability of PHI compromise, notification is likely required. This is the most common LLM-related HIPAA incident.

LLM vendor data breach affecting PHI-containing prompts -- Assessment Required
If your LLM vendor experiences a security incident and you transmitted PHI to them (with or without a BAA), you must assess whether your PHI was involved and notify accordingly.

PHI echoed or leaked in LLM responses to the wrong user -- Likely Reportable
If the LLM returns PHI belonging to one patient in a session or response intended for another user, this is a disclosure to an unauthorized person and requires breach assessment.

Prompt injection attack that exfiltrates PHI -- Likely Reportable
A successful prompt injection that causes the LLM to return PHI to an attacker or unauthorized system is an impermissible disclosure. Incident response and breach notification procedures apply.

Breach Notification Timeline

Once a breach is discovered:

  • Within 60 days: Notify affected individuals and HHS Secretary
  • If 500+ individuals in a state: Also notify prominent media outlets in that state
  • If fewer than 500 individuals: Can aggregate into an annual report to HHS (but individual notification is still due within 60 days)
  • Business Associate breaches: BA must notify covered entity within 60 days of discovery -- your BAA should specify this timeline
! "We use anonymized data" is not a safe harbor unless it meets HIPAA's de-identification standard.

Internal "anonymization" that does not satisfy HIPAA Safe Harbor (all 18 identifiers removed) or Expert Determination (qualified expert certifies re-identification risk is very small) does not exempt you from breach notification. Partial anonymization is not de-identification.


07. SentinelGate: HIPAA Technical Safeguards in One Layer

Building HIPAA-compliant LLM infrastructure from scratch means implementing PHI detection, minimum necessary enforcement, audit logging, access controls, and policy management -- for every LLM call, in real time, before inference. SentinelGate handles all of this in a single proxy layer that sits between your application and any LLM API.

How SentinelGate Covers HIPAA Technical Safeguards

One base URL change routes all LLM traffic through SentinelGate. Every call is scanned for PHI, evaluated against your configured policies, logged with the complete HIPAA audit schema, and governed in real time -- with zero changes to your application code.

  • Real-time PHI detection in prompts before inference -- 164.312(b)
  • Minimum necessary enforcement -- strip/redact non-essential PHI
  • Complete HIPAA audit log on every call -- tamper-evident, 6-year retention
  • BAA-status tracking per API key -- know which calls were covered
  • Output PHI scanning -- catch LLM responses that echo or derive PHI
  • Jailbreak and prompt injection detection -- prevent PHI exfiltration attacks

Integration: Change One Line

Add SentinelGate to any healthcare LLM application without touching your clinical code. Change the base URL -- every other configuration stays identical.

# Before: direct to LLM vendor (PHI unprotected)
OPENAI_API_BASE=https://api.openai.com/v1

# After: SentinelGate policy + audit layer (HIPAA safeguards applied)
OPENAI_API_BASE=https://gateway.sentinelgate.polsia.app/v1
OPENAI_API_KEY=your_sentinel_gate_key

# SentinelGate signs BAA for enterprise plans
# PHI detection, audit logging, minimum necessary enforcement -- automatic

# Python -- clinical note summarization with HIPAA safeguards
from openai import OpenAI

client = OpenAI(
    api_key="your_sentinel_gate_key",
    base_url="https://gateway.sentinelgate.polsia.app/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": "Summarize the following SOAP note: [note content]"
    }]
)
# PHI detection, policy enforcement, and audit log applied automatically
# No code changes. No missed calls. No coverage gaps.

Get your free API key -- HIPAA-compliant in 5 minutes

No credit card required. No application code changes. Route your healthcare LLM traffic through SentinelGate and have PHI protection and a complete audit trail before your next sprint ends.


08. Full HIPAA Compliance Checklist for LLM Healthcare Applications

Use this checklist to assess your current compliance posture. Every unchecked item is a gap between your current state and HIPAA compliance.

PHI Classification and Scope

  • Classify PHI risk for each LLM use case [Required -- Step 1]
    Document which LLM functions process PHI, what PHI categories are involved, and what risk level each use case carries.
  • Audit RAG pipeline for PHI exposure [Required]
    If using retrieval-augmented generation, document whether clinical databases are sources. Treat the full retrieval-to-inference pipeline as a PHI-processing system.
  • Verify de-identification meets HIPAA Safe Harbor [Documentation]
    If claiming the de-identified data exemption, confirm all 18 identifiers are removed. Internal anonymization does not qualify unless it meets HIPAA's standard.

Business Associate Agreements

  • Sign BAA with every LLM API vendor [Required]
    OpenAI, Anthropic, Google, Cohere, or any LLM provider receiving PHI-containing prompts. Confirm BAA eligibility for your current pricing tier.
  • Sign BAA with vector database and embedding service vendors [Required]
    Pinecone, Weaviate, Chroma, pgvector host/service, or any platform storing clinical embeddings derived from PHI.
  • Sign BAA with cloud infrastructure providers [Required]
    AWS, GCP, Azure, Render, or any hosting platform where PHI workloads run. Major clouds offer standard BAAs -- execute before first PHI deployment.
  • Sign BAA with logging and observability platforms [Required]
    Any platform receiving application logs that may contain PHI from LLM inputs or outputs. Review log content -- PHI in logs is PHI in scope.

Minimum Necessary and PHI Minimization

  • Define minimum necessary PHI fields per LLM function [Required]
    Document which PHI fields are required for each clinical LLM function. Build allowlists, not blocklists -- start from nothing and add only what is needed.
  • Implement pre-inference PHI detection and stripping [Technical Safeguard]
    Apply PHI scanning and identifier substitution before every LLM API call. A policy proxy that intercepts calls is the most reliable implementation -- no coverage gaps from missed call sites.
  • Scan LLM outputs for PHI before storage or display [Technical Safeguard]
    LLM responses can contain PHI derived from or echoed from the prompt. Scan outputs and apply redaction before logging or passing to downstream systems.

Technical Safeguards (Security Rule 164.312)

  • Implement unique user identification and access controls [Required]
    Every person and service with access to PHI-processing LLM systems has a unique identifier. Role-based access enforced. No shared API keys across users or environments.
  • Implement automatic logoff [Required]
    Sessions that can access PHI must terminate after a defined period of inactivity. Define and enforce the timeout period for clinical-facing LLM applications.
  • Encrypt PHI in transit (TLS 1.2+) [Required]
    All LLM API calls carrying PHI use TLS 1.2 or higher. No exceptions for internal network calls. Certificate validation enforced -- do not disable SSL verification.
  • Encrypt PHI at rest (AES-256) [Addressable]
    Databases, vector stores, and log archives containing PHI encrypted at rest. Document encryption method and key management procedures.

Audit Logging (Security Rule 164.312(b))

  • Log every LLM call that processes or may process PHI [Required]
    Timestamp, user/service identity, PHI categories detected, policy result, vendor/model, BAA status. Every call -- no sampling, no gaps.
  • Implement tamper-evident audit log storage [Required]
    Append-only audit records with cryptographic integrity verification. Audit logs cannot be modified or deleted -- only read by authorized personnel.
  • Retain audit logs for minimum 6 years [Required]
    HIPAA requires 6-year minimum retention. State law may require longer. Archive logs with access controls and a documented retention schedule.

Breach Preparedness

  • Document breach assessment process for LLM incidents [Required]
    Written procedure for evaluating potential LLM-related breaches: unauthorized PHI disclosure to vendor, wrong-user response, prompt injection. Include the 4-factor test (probability, PHI type, unauthorized person, mitigation).
  • Establish 60-day notification timeline with BAA vendors [Required]
    Your BAAs with LLM vendors must require them to notify you of breaches within 60 days of discovery. Verify this language exists in each executed BAA.
! Start with the BAA audit -- it is the fastest compliance gap to close.

Inventory every LLM vendor, vector database, and infrastructure provider your application uses. Check whether a BAA is signed and whether it covers your current tier and use case. Unsigned BAAs are the most common and most immediately actionable HIPAA gap in healthcare AI systems.

Continue Learning

  • Guide: How to Prevent PII Leaks in LLM APIs -- Regex vs NER vs policy proxy, with GDPR, HIPAA, SOC 2, and CCPA compliance mapping.
  • Guide: LLM Audit Trails for SOC 2 -- Full audit event schema, compliance report generation, and logging approaches.