LLM PII detection and redaction isn’t about finding every piece of sensitive data; it’s about finding the most damaging pieces just before they leave the system.

Let’s see it in action. Imagine a user asks our LLM: "Can you book me a flight from London Heathrow to JFK for next Tuesday, May 14th? My booking reference is XYZ789. My email is john.doe@example.com."

Here’s how we’d intercept and redact that:

{
  "original_input": "Can you book me a flight from London Heathrow to JFK for next Tuesday, May 14th? My booking reference is XYZ789. My email is john.doe@example.com.",
  "redacted_input": "Can you book me a flight from London Heathrow to JFK for next Tuesday, [DATE]? My booking reference is [REFERENCE]. My email is [EMAIL].",
  "redacted_output": "I can help with that! To confirm, you're looking for a flight from London Heathrow to JFK for [DATE]. What is the exact date you'd like to travel? Please also provide a contact email address for the booking confirmation."
}

Notice that the LLM’s output is redacted too, even though the prompt never asked for a response containing PII. This is crucial: the model can echo, infer, or hallucinate sensitive details on its own, so the response needs the same check as the request.

The core problem this solves is preventing data leakage. LLMs are trained on vast, often public datasets. When they process user inputs, they can inadvertently echo, infer, or even generate PII that wasn’t explicitly provided but is linked to the context. This could be an employee’s internal ID, a customer’s specific order number, or even a medical condition mentioned in passing. Redaction acts as a gatekeeper, stripping out this sensitive information before it’s stored, logged, or passed to downstream systems that shouldn’t see it.
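The gatekeeper idea can be sketched as a thin wrapper around the model call. This is a minimal illustration, not a production design: `guarded_call`, `redact_emails`, and the list-based log are hypothetical names, the redactor handles only email addresses, and the sketch assumes the model itself receives the redacted input.

```python
import re

# Hypothetical, deliberately narrow redactor: email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def redact_emails(text):
    return EMAIL.sub("[EMAIL]", text)

def guarded_call(user_input, model, redact, log):
    """Redact on the way in and on the way out; log only redacted text."""
    safe_input = redact(user_input)
    log.append({"stage": "input", "text": safe_input})
    raw_output = model(safe_input)    # assumption: the model sees redacted text
    safe_output = redact(raw_output)  # the model may still emit PII on its own
    log.append({"stage": "output", "text": safe_output})
    return safe_output
```

Passing `redact` and `model` in as callables keeps the wrapper independent of any particular detector or LLM client, so either can be swapped without touching the logging path.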

Internally, this typically involves a multi-stage process. First, a Named Entity Recognition (NER) model, often a specialized fine-tuned transformer, scans the text. It’s trained to identify categories like PERSON, EMAIL, PHONE_NUMBER, CREDIT_CARD, DATE, LOCATION, and custom entities like BOOKING_REFERENCE or USER_ID. Once identified, these entities are mapped to a predefined set of redaction patterns. For instance, any entity tagged as EMAIL is replaced with [EMAIL], PHONE_NUMBER with [PHONE_NUMBER], and so on. The exact patterns can be configured per system or per data sensitivity level.
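The detect-then-map flow might look roughly like the sketch below. To keep it self-contained, simple regular expressions stand in for the NER model, which is a swapped-in technique rather than the transformer described above. The pattern definitions, the `detect` and `redact` function names, and the choice to map BOOKING_REFERENCE to `[REFERENCE]` are all assumptions made for illustration.

```python
import re

# Hypothetical regex patterns standing in for a trained NER model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "BOOKING_REFERENCE": re.compile(r"\b[A-Z]{3}\d{3}\b"),
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2}(?:st|nd|rd|th)?\b"
    ),
}

# Redaction mapping: entity label -> replacement token (configurable).
PLACEHOLDERS = {
    "EMAIL": "[EMAIL]",
    "BOOKING_REFERENCE": "[REFERENCE]",
    "DATE": "[DATE]",
}

def detect(text):
    """Return (start, end, label) spans, sorted by start offset."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)

def redact(text):
    """Replace each detected span with its placeholder token."""
    pieces, cursor = [], 0
    for start, end, label in detect(text):
        if start < cursor:  # skip spans that overlap one already replaced
            continue
        pieces.append(text[cursor:start])
        pieces.append(PLACEHOLDERS[label])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Run on the flight-booking prompt from earlier, `redact` replaces the date, booking reference, and email while leaving the rest of the sentence, including its punctuation, untouched. Swapping in a real NER model would only change `detect`; the mapping step stays the same.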

The levers you control are primarily the configuration of the NER model and the redaction mapping. You can fine-tune the NER model to recognize new entity types relevant to your domain. For example, if your application deals with healthcare, you might train it to identify PATIENT_ID or MEDICAL_RECORD_NUMBER. You also define the replacement tokens – whether you want [EMAIL] or something more generic like [SENSITIVE_DATA]. Crucially, you can also define confidence thresholds for the NER model. A higher threshold means only entities detected with very high certainty will be redacted, reducing false positives but increasing the risk of missing some PII. Conversely, a lower threshold catches more but might redact non-PII.
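The threshold trade-off is easier to see with concrete numbers. The sketch below assumes the detector returns character spans with a confidence score; the `Detection` type, the helper names, and the scores themselves are invented for illustration and do not come from any real model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Detection:
    start: int
    end: int
    label: str
    score: float  # detector confidence in [0, 1]

def keep_confident(detections, threshold):
    """Drop detections whose score falls below the threshold."""
    return [d for d in detections if d.score >= threshold]

def redact_spans(text, detections, placeholders, default="[SENSITIVE_DATA]"):
    """Replace spans right to left so earlier offsets stay valid.

    Assumes the spans do not overlap.
    """
    for d in sorted(detections, key=lambda d: d.start, reverse=True):
        text = text[:d.start] + placeholders.get(d.label, default) + text[d.end:]
    return text

def span(text, needle, label, score):
    """Test helper: build a Detection from the first occurrence of needle."""
    start = text.index(needle)
    return Detection(start, start + len(needle), label, score)

text = "Call Ana on 555-0100 about the May release."
detections = [
    span(text, "Ana", "PERSON", 0.97),
    span(text, "555-0100", "PHONE_NUMBER", 0.91),
    span(text, "May", "DATE", 0.42),  # made-up low score: likely a false positive
]
placeholders = {"PERSON": "[PERSON]", "PHONE_NUMBER": "[PHONE_NUMBER]"}

strict = redact_spans(text, keep_confident(detections, 0.9), placeholders)
lenient = redact_spans(text, keep_confident(detections, 0.4), placeholders)
```

With the 0.9 threshold, "May" survives; with 0.4 it is replaced by the generic `[SENSITIVE_DATA]` token because no label-specific placeholder is configured for DATE. That is the false-positive cost of the lower threshold in miniature.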

The most surprising part for many is that the LLM itself can be a tool for detecting PII, not just a source of it. By prompting an LLM with a specific instruction like "Identify any Personally Identifiable Information in the following text and list it by category," you can leverage its understanding of context and common PII formats. This can be used as a pre-processing step or a secondary validation layer, often catching entities that a rule-based or standard NER model might miss due to unusual formatting or context-dependent meaning.
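One way to wire this in as a secondary layer is sketched below. The `complete` argument is a placeholder for whatever function sends a prompt to your model and returns its text; no specific vendor API is assumed. The request for JSON output and the `category`/`text` keys are choices made for this example, not part of the instruction quoted above.

```python
import json
from typing import Callable

PROMPT_TEMPLATE = (
    "Identify any Personally Identifiable Information in the following text "
    "and list it by category. Respond with JSON only, as a list of objects "
    'with "category" and "text" keys.\n\nText:\n{text}'
)

def detect_with_llm(text: str, complete: Callable[[str], str]) -> list:
    """Ask an LLM for PII candidates and keep only those found in the text."""
    raw = complete(PROMPT_TEMPLATE.format(text=text))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable reply: rely on the primary detector instead
    if not isinstance(items, list):
        return []
    findings = []
    for item in items:
        if not isinstance(item, dict):
            continue
        category, value = item.get("category"), item.get("text")
        # Discard anything the model invented that is not literally present.
        if isinstance(category, str) and isinstance(value, str) and value and value in text:
            findings.append({"category": category, "text": value})
    return findings

# Stand-in for a real model call, used only to exercise the parsing logic.
def fake_complete(prompt: str) -> str:
    return json.dumps([
        {"category": "EMAIL", "text": "john.doe@example.com"},
        {"category": "PHONE_NUMBER", "text": "555-0100"},  # not in the input
    ])

sample = "Reach me at john.doe@example.com, ref XYZ789."
found = detect_with_llm(sample, fake_complete)
```

Checking each returned value against the original text guards against the detector hallucinating entities. Returning an empty list on a malformed reply is only reasonable because this layer is assumed to run alongside a primary detector, not instead of one.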

The next logical step after implementing basic redaction is to handle context-aware PII. This involves understanding that "May 14th" is only sensitive when linked to a specific event or individual, and not when discussing general calendar dates.
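A very rough first approximation of that idea is to treat a date as sensitive only when the same message also contains a direct identifier. The heuristic, the patterns, and the function name below are all assumptions for illustration; a real context-aware system would need far more than co-occurrence, and this function decides about dates only, leaving other entities to the main redaction pass.

```python
import re

DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2}(?:st|nd|rd|th)?\b"
)

# Hypothetical "direct identifier" patterns that link a date to a person.
IDENTIFIERS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # email address
    re.compile(r"\b[A-Z]{3}\d{3}\b"),             # booking reference
]

def redact_dates_in_context(text):
    """Redact dates only when the message also carries a direct identifier."""
    linked = any(pattern.search(text) for pattern in IDENTIFIERS)
    return DATE.sub("[DATE]", text) if linked else text
```

A sentence about a general calendar date passes through unchanged, while the same date next to an email address or booking reference is replaced.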
