OpenAI’s new Privacy Filter runs on your laptop so PII never hits the cloud

OpenAI has released Privacy Filter, a bidirectional token-classification model for detecting and redacting PII in text. It runs locally in a browser or on a laptop (1.5B total parameters, 50M active), processes up to 128K tokens in a single pass, and achieves a 96% F1 score on the PII-Masking-300k benchmark. It covers eight PII categories: names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets. Unlike regex or rule-based NLP tools, it uses context-aware language understanding to handle nuanced cases, such as distinguishing public business addresses from private home addresses. Available on Hugging Face and GitHub under Apache 2.0, it can be fine-tuned with as little as 10% of a dataset. OpenAI positions it as one component in a broader privacy-by-design system, not a full anonymization solution, and recommends human review for high-sensitivity domains.
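To make the "detect, then redact" step concrete, here is a minimal sketch of how labeled character spans from a token classifier could be turned into redacted text. It does not call Privacy Filter itself: the `Span` type, the `redact` helper, and the `NAME` and `EMAIL` label strings are hypothetical names chosen for illustration, and the spans are hard-coded where a real model's output would go.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A detected PII span (hypothetical shape, not the model's real output)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # illustrative label such as "NAME" or "EMAIL"


def redact(text: str, spans: list[Span]) -> str:
    """Replace each span with a bracketed label, skipping overlapping spans."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            # This span overlaps one already redacted; keep the earlier one.
            continue
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Contact Jane Doe at jane@example.com."
# Hard-coded spans stand in for what a classifier might return.
spans = [Span(8, 16, "NAME"), Span(20, 36, "EMAIL")]
print(redact(text, spans))  # Contact [NAME] at [EMAIL].
```

Working on character offsets rather than re-matching the detected strings keeps the redaction tied to the exact positions the classifier flagged, which matters when the same string appears in both sensitive and non-sensitive contexts.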

#llm

#nlp

#privacy

#openai

Apr 23 • 5m read time • From thenewstack.io

Table of contents

Scanning text in a single pass for emails, numbers, and more

Greater context-awareness, run locally

How it compares to the competition

What this means for developers

One more piece in OpenAI’s stack
