sanitize

Detect and redact PII from text files. Supports 15 categories, including credit cards, SSNs, emails, API keys, and addresses, with zero dependencies.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "sanitize" with this command: npx skills add agentward-ai/sanitize

AgentWard Sanitize

Detect and redact personally identifiable information (PII) from text files.

IMPORTANT — PII Safety Rules

  • Do NOT read the input file directly. It may contain sensitive PII.
  • ALWAYS use --output FILE to write sanitized output to a file.
  • Only read the OUTPUT file, never the raw input.
  • Only show the user the redacted output, never the raw input.
  • --json and --preview are safe — they do NOT print raw PII values to stdout.
  • The entity map (raw PII → placeholder mapping) is written to a separate sidecar file (*.entity-map.json) only when --output is used. Do NOT read the entity map file.
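An agent can follow the rules above mechanically with a small wrapper. What follows is a minimal sketch, not part of the skill itself. The helper names build_sanitize_command and run_sanitize are hypothetical, and the sketch assumes the script lives at scripts/sanitize.py and accepts the --output and --categories flags shown under Usage.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: script path and flags match the Usage section of this listing.
SCRIPT = "scripts/sanitize.py"


def build_sanitize_command(input_path, output_path, categories=None):
    """Build the argv list; --output is always included so results go to a file."""
    cmd = [sys.executable, SCRIPT, str(input_path), "--output", str(output_path)]
    if categories:
        cmd += ["--categories", ",".join(categories)]
    return cmd


def run_sanitize(input_path, output_path, categories=None):
    """Run the script and return only the sanitized text.

    The raw input file and the *.entity-map.json sidecar are never opened here.
    """
    cmd = build_sanitize_command(input_path, output_path, categories)
    # stdout is discarded; this wrapper reads only the sanitized output file.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return Path(output_path).read_text(encoding="utf-8")
```

Keeping command construction separate from execution makes it easy to check that --output is always present before anything runs.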

What it does

Scans files for PII and replaces each instance with a numbered placeholder such as [CREDIT_CARD_1]. The 15 supported categories are credit cards, CVVs, card expiry dates, SSNs, emails, phone numbers, API keys, IP addresses, mailing addresses, dates of birth, passport numbers, driver's license numbers, bank routing numbers, medical license numbers, and insurance member IDs.
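The numbered-placeholder idea can be illustrated with a short sketch. This is not the skill's actual code: the redact function and the email pattern are hypothetical. It also assumes every occurrence gets a fresh number, since the listing does not say whether repeated values share a placeholder.

```python
import re

# Illustrative pattern only; the skill's real detectors are documented in
# references/SUPPORTED_PII.md.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text, label, pattern):
    """Replace each match with [LABEL_n]; return (redacted_text, entity_map)."""
    entity_map = {}  # placeholder -> raw value

    def substitute(match):
        placeholder = f"[{label}_{len(entity_map) + 1}]"
        entity_map[placeholder] = match.group(0)
        return placeholder

    return pattern.sub(substitute, text), entity_map


redacted, mapping = redact("Mail a@example.com or b@example.org", "EMAIL", EMAIL_RE)
print(redacted)  # Mail [EMAIL_1] or [EMAIL_2]
```

The mapping returned here plays the role of the entity map: it holds the raw values, so it should be stored separately from the redacted text and never shown to the user.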

Usage

Sanitize a file (RECOMMENDED — always use --output)

python scripts/sanitize.py patient-notes.txt --output clean.txt

Preview mode (detect PII categories/offsets without showing raw values)

python scripts/sanitize.py notes.md --preview

JSON output (safe — no raw PII in stdout)

python scripts/sanitize.py report.txt --json --output clean.txt

Filter to specific categories

python scripts/sanitize.py log.txt --categories ssn,credit_card,email --output clean.txt

Supported PII categories

See references/SUPPORTED_PII.md for the full list with detection methods and false positive mitigation.

Category          Pattern type                        Example
credit_card       Luhn-validated 13-19 digits         4111 1111 1111 1111
ssn               3-2-4 digit groups                  123-45-6789
cvv               Keyword-anchored 3-4 digits         CVV: 123
expiry_date       Keyword-anchored MM/YY              expiry 01/30
api_key           Provider prefix patterns            sk-abc..., ghp_..., AKIA...
email             Standard email format               user@example.com
phone             US/intl phone numbers               +1 (555) 123-4567
ip_address        IPv4 addresses                      192.168.1.100
date_of_birth     Keyword-anchored dates              DOB: 03/15/1985
passport          Keyword-anchored alphanumeric       Passport: AB1234567
drivers_license   Keyword-anchored alphanumeric       DL: D12345678
bank_routing      Keyword-anchored 9 digits           routing: 021000021
address           Street + city/state/zip             742 Evergreen Terrace Dr, Springfield, IL 62704
medical_license   Keyword-anchored license ID         License: CA-MD-8827341
insurance_id      Keyword-anchored member/policy ID   Member ID: BCB-2847193
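Two of the pattern types in the table, Luhn validation and keyword anchoring, can be sketched as follows. This is an illustration under stated assumptions rather than the skill's implementation: the regular expressions and the function names luhn_valid and find_cards are hypothetical, and the real patterns may differ.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Keyword-anchored: digits only count when a "CVV" label precedes them.
CVV_RE = re.compile(r"\bCVV\b\s*[:=]?\s*(\d{3,4})\b", re.IGNORECASE)


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


def find_cards(text):
    """Return card-like substrings that also pass the Luhn check."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]
```

The checksum filters out most random 16-digit strings, and the keyword anchor keeps a bare three-digit number such as a room number from being flagged as a CVV.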

Security and Privacy

  • All processing is local. The script makes zero network calls. No data leaves your machine.
  • Zero dependencies. Uses only Python standard library — no third-party packages to audit.
  • PII never reaches stdout. The --json and --preview modes strip raw PII values from output. The entity map (containing raw PII to placeholder mappings) is only written to a sidecar file on disk when --output is used.
  • Designed for agent safety. The skill instructions above tell the agent to never read the raw input file or the entity map file — only the sanitized output.

Requirements

  • Python 3.11+
  • No external dependencies (stdlib only)

About

Built by AgentWard — the open-source permission control plane for AI agents.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
