#OpenAI releases #PrivacyFilter, an open-weight #AI model for detecting & redacting #PII in text.
-
#OpenAI releases #PrivacyFilter, an open-weight #AI model for detecting & redacting #PII in text. Runs fully locally, no data ever leaves your machine. Apache 2.0 licensed. #opensource
🧵
#privacy
Detects 8 PII categories in a single forward pass: names, email addresses, phone numbers, physical addresses, URLs, dates, account numbers & secrets (passwords, API keys), covering most common sensitive data types.
-
🧠 Bidirectional token classification: unlike autoregressive LLMs, #PrivacyFilter reads input from both directions simultaneously for deeper context awareness, catching subtle #PII that simple pattern-matching or regex rules miss
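For contrast, here is a minimal sketch of the pattern-matching approach this post compares against. It is not Privacy Filter's code or API; the `PATTERNS` table and `regex_redact` helper are hypothetical names made up for illustration, and the two patterns are deliberately simple.

```python
import re

# Hypothetical regex baseline, NOT part of Privacy Filter.
# Only strings with a fixed surface shape can be matched this way.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def regex_redact(text: str) -> str:
    """Replace every pattern match with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Ask Maria at maria@example.com or +1 555 010 9999 about the key."
redacted = regex_redact(sample)
print(redacted)
# prints: Ask Maria at [EMAIL] or [PHONE] about the key.
```

The email and phone number are caught because they have a predictable shape, but the name "Maria" passes through untouched. Deciding that a word is a person's name depends on the surrounding context, which is the gap a token classifier is meant to close.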
1.5B-parameter model with only ~50M active parameters (#MoE): lightweight enough to run on a standard laptop or in a browser, yet achieves ~96-97% F1 score on standard #PII benchmarks #MachineLearning #AI
-
128,000-token context window: processes entire legal documents, long email threads or large codebases in a single pass. No need to chunk text before filtering. #privacy #DataEngineering
Built for high-throughput workflows: CLI tool (opf), GPU & CPU support, interactive mode, structured JSON output with ANSI color-coded previews. Runs on-premises, so data is never sent to external servers #DevOps
-
Fine-tunable on domain-specific data: adapts to medical, legal or enterprise environments where generic rules fail. Based on the open #gptoss model family. Available on #HuggingFace under Apache 2.0
Caveat: #PrivacyFilter is a redaction & data minimization aid, NOT a compliance guarantee. It should be one layer in a holistic #privacybydesign approach. Always combine with human review for high-stakes use cases.
https://openai.com/index/introducing-openai-privacy-filter/
-
@michabbb Finally, a use for local LLMs.