#OpenAI releases #PrivacyFilter, an open-weight #AI model for detecting & redacting #PII in text. It runs fully locally, so no data leaves your machine. Apache 2.0 licensed. #opensource
Detects 8 PII categories in a single forward pass: names, email addresses, phone numbers, physical addresses, URLs, dates, account numbers & secrets (passwords, API keys). Together these cover the most common kinds of sensitive data.
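A token classifier emits one label per token in that single pass, and adjacent labels still have to be merged into spans before anything is redacted. A minimal sketch in plain Python, assuming a BIO tagging scheme and pre-split tokens; the category names (`PERSON`, `EMAIL`) and the helpers `bio_to_spans` / `redact` are hypothetical, not Privacy Filter's actual label set or API:

```python
from typing import List, Tuple

def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Merge per-token BIO labels into (start, end_exclusive, category) spans."""
    spans = []
    start, category = None, None
    for i, label in enumerate(labels):
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != category):
            # A new entity begins: close any open one first.
            if category is not None:
                spans.append((start, i, category))
            start, category = i, label[2:]
        elif label == "O":
            if category is not None:
                spans.append((start, i, category))
            start, category = None, None
        # Otherwise an "I-" label continues the current entity.
    if category is not None:
        spans.append((start, len(labels), category))
    return spans

def redact(tokens: List[str], labels: List[str]) -> str:
    """Replace each detected span with a [CATEGORY] placeholder."""
    out, i = [], 0
    for start, end, category in bio_to_spans(labels):
        out.extend(tokens[i:start])
        out.append(f"[{category}]")
        i = end
    out.extend(tokens[i:])
    return " ".join(out)

tokens = ["Contact", "Jane", "Doe", "at", "jane@example.com", "today"]
labels = ["O", "B-PERSON", "I-PERSON", "O", "B-EMAIL", "O"]
print(redact(tokens, labels))  # Contact [PERSON] at [EMAIL] today
```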
🧠 Bidirectional token classification: unlike autoregressive LLMs, which only see the tokens before the current one, #PrivacyFilter labels each token using context on both sides, so it can catch subtle #PII that simple pattern-matching or regex rules miss
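For contrast, here is the kind of rule-based baseline this point is aimed at. A minimal regex redactor (a hypothetical illustration, not part of Privacy Filter) handles rigidly formatted values like email addresses and phone numbers, but it has no notion of context, so a person's name passes straight through:

```python
import re

# Hypothetical rule set: each pattern only matches a fixed surface format.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def regex_redact(text: str) -> str:
    """Replace every pattern match with a [CATEGORY] placeholder, in rule order."""
    for category, pattern in PATTERNS.items():
        text = pattern.sub(f"[{category}]", text)
    return text

sample = "Ask Jane Doe (jane.doe@example.com, +1 555 010 0199) about it."
print(regex_redact(sample))  # Ask Jane Doe ([EMAIL], [PHONE]) about it.
```

The name survives because nothing about "Jane Doe" matches a fixed format; recognizing it requires reading the surrounding words, which is the gap a context-aware token classifier is meant to close.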
1.5B-parameter model with only ~50M active parameters per token (#MoE): lightweight enough to run on a standard laptop or in a browser, yet it achieves ~96–97% F1 on standard #PII benchmarks #MachineLearning #AI
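The gap between total and active parameters comes from sparse Mixture-of-Experts routing: a small gating network picks a few experts per token and only those run. Per the post's figures, ~50M / 1.5B ≈ 3.3% of the weights are used for each token. A toy top-k router in plain Python; the eight scalar "experts", the gate logits, and `k=2` are hypothetical and say nothing about Privacy Filter's actual configuration:

```python
import math
from typing import Callable, List, Tuple

def top_k_moe(x: float, gate_logits: List[float],
              experts: List[Callable[[float], float]],
              k: int = 2) -> Tuple[float, List[int]]:
    """Route one token through only the k highest-scoring experts."""
    # Pick the indices of the k largest gate logits.
    chosen = sorted(range(len(experts)), key=lambda i: gate_logits[i],
                    reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    m = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - m) for i in chosen]
    total = sum(exps)
    # Only the chosen experts are evaluated; the others do no work for this token.
    out = sum((e / total) * experts[i](x) for e, i in zip(exps, chosen))
    return out, chosen

# Eight toy "experts", each scaling its input by a different factor (1..8).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 1.5, 0.0, 0.5, -0.3, 0.2]
out, used = top_k_moe(1.0, logits, experts, k=2)
print(used, round(out, 3))
```

All eight experts' weights exist in memory, but each token pays the compute cost of only two of them, which is why total and active parameter counts differ so much.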
128,000-token context window: processes entire legal documents, long email threads or large codebases in a single pass, so any text that fits in the window needs no chunking before filtering. #privacy #DataEngineering
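The "no chunking" point holds only while the input fits in the window. A minimal sketch of the fallback, assuming the text is already tokenized into a list; `windows` is a hypothetical helper (not part of the tool) and the 256-token overlap is an arbitrary illustration:

```python
from typing import List, Sequence

MAX_TOKENS = 128_000  # context size quoted in the post

def windows(tokens: Sequence, max_len: int = MAX_TOKENS,
            overlap: int = 256) -> List[Sequence]:
    """Split tokens into windows of at most max_len, overlapping by `overlap`.

    Inputs that fit are returned as a single window (no chunking needed).
    The overlap lets an entity shorter than `overlap` that is cut at one
    boundary appear whole in the next window.
    """
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    if len(tokens) <= max_len:
        return [tokens]
    step = max_len - overlap
    out = []
    for start in range(0, len(tokens), step):
        out.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end
    return out

print(len(windows(["tok"] * 100_000)))  # 1: fits, processed in one pass
print(len(windows(["tok"] * 300_000)))  # 3
```

Detections from overlapping windows would still need de-duplicating by offset before redaction.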
Built for high-throughput workflows: a CLI tool (opf), GPU & CPU support, an interactive mode, and structured JSON output with ANSI color-coded previews. Runs on-premises, so data is never sent to external servers #DevOps
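Structured output is what makes the tool scriptable. The sketch below assumes a hypothetical JSON shape (a `text` field plus a list of spans with `start`, `end` and `label` character offsets); this is not opf's documented schema, so check the real output format before relying on it. It applies the spans right-to-left so earlier offsets stay valid:

```python
import json

# Hypothetical detector output; field names are assumptions, not opf's schema.
raw = '''{"text": "Call Bob at 555-0100.",
          "spans": [{"start": 5, "end": 8, "label": "PERSON"},
                    {"start": 12, "end": 20, "label": "PHONE"}]}'''

def apply_spans(text: str, spans: list) -> str:
    """Replace each [start, end) character span with a [LABEL] placeholder."""
    # Work from the end of the string so earlier offsets remain correct.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[:span["start"]] + f"[{span['label']}]" + text[span["end"]:]
    return text

result = json.loads(raw)
print(apply_spans(result["text"], result["spans"]))  # Call [PERSON] at [PHONE].
```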
Fine-tunable on domain-specific data — adapts to medical, legal or enterprise environments where generic rules fail. Based on the open #gptoss model family. Available on #HuggingFace under Apache 2.0
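Fine-tuning a token classifier needs a label per token, while domain annotations usually arrive as character spans. A minimal data-prep sketch, assuming whitespace tokenization and a BIO scheme; a real pipeline would use the model's own tokenizer and its offset mapping, and `spans_to_bio` and the label names here are hypothetical:

```python
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label) in character offsets

def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Give each whitespace token a BIO label from the annotated character spans."""
    pairs: List[Tuple[str, str]] = []
    prev: Optional[Span] = None
    for m in re.finditer(r"\S+", text):
        # First annotated span that overlaps this token, if any.
        hit = next((s for s in spans if m.start() < s[1] and m.end() > s[0]), None)
        if hit is None:
            label = "O"
        else:
            # Continue the entity only if the previous token was in the same span.
            label = ("I-" if hit == prev else "B-") + hit[2]
        pairs.append((m.group(), label))
        prev = hit
    return pairs

text = "Patient 00123 seen by Dr. Ana Ruiz"
spans = [(8, 13, "ACCOUNT_NUMBER"), (26, 34, "PERSON")]
for token, label in spans_to_bio(text, spans):
    print(f"{token}\t{label}")
```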
Caveat: #PrivacyFilter is a redaction & data minimization aid — NOT a compliance guarantee. It should be one layer in a holistic #privacybydesign approach. Always combine with human review for high-stakes use cases
https://openai.com/index/introducing-openai-privacy-filter/
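One way to act on the human-review caveat is to auto-apply only high-confidence detections and queue the rest for a person. A minimal sketch; it assumes each detection carries a confidence `score`, which is a hypothetical field, and the 0.9 threshold is arbitrary and would need tuning per use case:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    start: int
    end: int
    label: str
    score: float  # hypothetical model confidence in [0, 1]

def triage(detections: List[Detection], threshold: float = 0.9
           ) -> Tuple[List[Detection], List[Detection]]:
    """Split detections into auto-redact and needs-human-review lists."""
    auto = [d for d in detections if d.score >= threshold]
    review = [d for d in detections if d.score < threshold]
    return auto, review

dets = [Detection(0, 4, "PERSON", 0.98), Detection(10, 20, "DATE", 0.62)]
auto, review = triage(dets)
print([d.label for d in auto], [d.label for d in review])  # ['PERSON'] ['DATE']
```

Thresholding only helps with uncertain detections: PII the model misses entirely never reaches the queue, which is why spot-checking a sample of outputs by hand still matters for high-stakes data.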
@michabbb Finally, a use for local LLMs.