OpenAI has released Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. The 1.5B-parameter model (with 50M active parameters) uses a bidirectional token-classification architecture with span decoding and supports up to 128K tokens of context. It detects eight PII categories, including names, addresses, emails, phone numbers, dates, account numbers, and secrets such as API keys. The model achieves a 97.43% F1 score on the PII-Masking-300k benchmark and can run locally, keeping sensitive data on-device. It is released under Apache 2.0 on Hugging Face and GitHub and supports fine-tuning for domain-specific use cases. OpenAI notes that it is not a compliance tool and recommends human review in high-stakes settings such as legal, medical, and financial workflows.
Table of contents
A small model with frontier personal data detection capability
Model overview
Example input text
Text after masking personal identifiers
How we built it
How Privacy Filter performs
Limitations
Availability
Looking ahead
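
The summary describes the model as a token classifier followed by span decoding. The Python sketch below illustrates that general idea only: per-token labels are merged into character spans, and each span is replaced with a placeholder. It is not Privacy Filter's actual decoder. The BIO label scheme, the category names `NAME` and `EMAIL`, the whitespace tokenizer, and the hand-written labels standing in for model output are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, category)


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Whitespace tokenizer (an assumption; a real model uses its own tokenizer)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def decode_spans(offsets: List[Tuple[int, int]], labels: List[str]) -> List[Span]:
    """Merge per-token BIO labels into contiguous character spans."""
    spans: List[Span] = []
    current = None  # [start, end, category] of the span being built
    for (start, end), label in zip(offsets, labels):
        if label == "O":
            if current is not None:
                spans.append((current[0], current[1], current[2]))
                current = None
            continue
        prefix, category = label.split("-", 1)
        if prefix == "I" and current is not None and current[2] == category:
            current[1] = end  # extend the open span
        else:
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [start, end, category]  # start a new span
    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with a bracketed category placeholder."""
    pieces = []
    cursor = 0
    for start, end, category in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"[{category}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Email Jane Doe at jane@example.com today."
# Hypothetical labels standing in for the classifier's per-token predictions.
labels = ["O", "B-NAME", "I-NAME", "O", "B-EMAIL", "O"]
spans = decode_spans(token_offsets(text), labels)
masked = mask(text, spans)
print(masked)
```

Decoding to character spans rather than masking token by token keeps a multi-token entity such as a full name as a single placeholder, which is one plausible reason to pair token classification with a span-decoding step.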