Introducing OpenAI Privacy Filter

OpenAI has released Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. The 1.5B-parameter model (50M active parameters) uses a bidirectional token-classification architecture with span decoding and supports up to 128K tokens of context. It detects eight PII categories, including names, addresses, emails, phone numbers, dates, account numbers, and secrets such as API keys. The model achieves a 97.43% F1 score on the PII-Masking-300k benchmark and can run locally, so sensitive data stays on-device. It is released under Apache 2.0 on Hugging Face and GitHub and supports fine-tuning for domain-specific use cases. OpenAI notes that it is not a compliance tool and recommends human review in high-stakes settings such as legal, medical, and financial workflows.
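To make "token classification with span decoding" concrete, here is a minimal sketch of the general technique, not Privacy Filter's actual decoder. It assumes a BIO labeling scheme and per-token character offsets; the category names `NAME` and `EMAIL` and the helper functions `decode_spans` and `mask` are hypothetical placeholders, and the labels are hard-coded rather than produced by the model.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, category)


def decode_spans(offsets: List[Tuple[int, int]], labels: List[str]) -> List[Span]:
    """Merge per-token BIO labels into character-level spans (assumed scheme)."""
    spans: List[Span] = []
    current = None  # [start, end, category] of the span being built
    for (start, end), label in zip(offsets, labels):
        prefix, _, category = label.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[2] != category)
        )
        if starts_new:
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [start, end, category]
        elif prefix == "I":
            current[1] = end  # extend the open span to cover this token
        else:  # "O": token is outside any PII span
            if current is not None:
                spans.append((current[0], current[1], current[2]))
                current = None
    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with a bracketed category placeholder."""
    pieces = []
    cursor = 0
    for start, end, category in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"[{category}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical classifier output for one sentence.
text = "Email Ada Lovelace at ada@example.com today."
offsets = [(0, 5), (6, 9), (10, 18), (19, 21), (22, 37), (38, 43), (43, 44)]
labels = ["O", "B-NAME", "I-NAME", "O", "B-EMAIL", "O", "O"]

spans = decode_spans(offsets, labels)
print(mask(text, spans))  # Email [NAME] at [EMAIL] today.
```

Decoding to spans before masking means a multi-token entity such as a full name is replaced once, rather than token by token.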

#machine-learning

#open-source

#privacy

#openai

Apr 26 • 8m read time • From openai.com

Table of contents

A small model with frontier personal data detection capability
Model overview
Example input text
Text after masking personal identifiers
How we built it
How Privacy Filter performs
Limitations
Availability
Looking ahead
