Sanitize Git Repository
This skill provides guidance for systematically identifying and replacing sensitive information in git repositories, including API keys, tokens, passwords, and other credentials.
When to Use This Skill
-
Sanitizing a repository before sharing or open-sourcing
-
Removing accidentally committed secrets from a codebase
-
Replacing hardcoded credentials with placeholders
-
Auditing a repository for sensitive information
-
Preparing code for security review
Recommended Approach
Phase 1: Comprehensive Discovery
Build a complete inventory of all sensitive values before making any changes. This prevents the common mistake of discovering additional secrets mid-process.
Common Secret Patterns to Search:
Type Pattern/Prefix Example
AWS Access Keys AKIA[A-Z0-9]{16}
AKIAIOSFODNN7EXAMPLE
AWS Secret Keys 40-char base64 strings Often near access keys
GitHub Tokens ghp_ , gho_ , ghs_ , ghr_
ghp_xxxxxxxxxxxx
Huggingface Tokens hf_
hf_xxxxxxxxxxxx
Generic API Keys api[_-]?key , apikey
Varies
Bearer Tokens bearer , token
Varies
Private Keys -----BEGIN.*PRIVATE KEY-----
RSA/EC keys
Database URLs postgres:// , mysql:// , mongodb://
Connection strings
Password Fields password , passwd , pwd
Varies
Discovery Strategy:
-
Search for known prefixes and patterns using regex
-
Search for common variable names (API_KEY , SECRET , TOKEN , PASSWORD , CREDENTIAL )
-
Check configuration files (.env , config.* , settings.* , *.json , *.yaml , *.yml )
-
Examine test fixtures and mock data files
-
Check embedded data within other formats (JSON encoded in strings, base64 encoded content)
Important Locations to Check:
-
Configuration files and environment templates
-
Test fixtures and sample data
-
Documentation and README files
-
CI/CD configuration files
-
Docker and deployment configurations
-
JSON/YAML data files (secrets may be embedded in structured data)
-
Git diff outputs or patch files stored in the repository
Phase 2: Inventory Documentation
Before making changes, create a documented list of:
-
Each unique sensitive value found
-
All file locations where each value appears
-
The exact string to match (including surrounding context if needed for uniqueness)
-
The placeholder to use for replacement
Placeholder Conventions:
-
Use descriptive placeholders that indicate the type: <AWS_ACCESS_KEY> , <GITHUB_TOKEN> , <DATABASE_PASSWORD>
-
Maintain consistency across all replacements
-
Match the format specified by the user if provided
Phase 3: Systematic Replacement
Execute replacements methodically:
-
Work through the inventory one secret at a time
-
For each secret, replace ALL occurrences across ALL files
-
Verify each replacement immediately after making it
-
Use exact string matching to avoid unintended modifications
Replacement Best Practices:
-
Read files before editing to ensure exact string matching
-
Handle whitespace and formatting precisely
-
Consider secrets embedded in JSON or other structured formats (may require escaping)
-
Use batch replacements when the same value appears multiple times in one file
Phase 4: Verification
After all replacements:
-
Re-run all original discovery searches to confirm no secrets remain
-
Search for partial matches of sensitive values
-
Run any provided test suites to validate the sanitization
-
Check that placeholders are properly formatted
Common Pitfalls
- Incomplete Initial Discovery
Problem: Secrets discovered incrementally during the replacement process, requiring backtracking.
Prevention: Invest time upfront in comprehensive searching using multiple patterns and checking all file types, including data files and embedded content.
- Secrets in Unexpected Locations
Problem: Secrets appear in JSON files, embedded strings, test fixtures, or encoded data that aren't found by simple searches.
Prevention: Search recursively through all file types. Check JSON and YAML files specifically. Look for base64-encoded content that might contain secrets.
- Exact String Matching Failures
Problem: Edit operations fail due to whitespace differences or character encoding issues.
Prevention: Always read the target file first to understand exact formatting. Copy strings exactly as they appear in the file, including whitespace.
- Missing Occurrences
Problem: Multiple occurrences of the same secret exist, but only some are replaced.
Prevention: After each replacement, immediately verify by searching for the original value again. Use replace-all functionality when available.
- Git History Considerations
Problem: Secrets remain in git history even after being removed from current files.
Prevention: Clarify with the user whether git history sanitization is required. If so, tools like git filter-branch or BFG Repo-Cleaner may be needed (this is a separate, more complex operation).
- Tool Availability Assumptions
Problem: Specialized search tools (like ripgrep) may not be available in all environments.
Prevention: Have fallback approaches ready. Standard grep -r works in most environments. Test tool availability before relying on it.
Verification Checklist
Before considering the task complete:
-
All known secret patterns have been searched for
-
Configuration files have been examined
-
JSON/YAML data files have been checked
-
Test fixtures and mock data have been reviewed
-
Each identified secret has been replaced in all locations
-
Verification searches confirm no original secrets remain
-
Placeholders follow a consistent format
-
Any provided tests pass successfully
Search Commands Reference
Standard grep (widely available):
Search for AWS access keys
grep -rn "AKIA[A-Z0-9]{16}" . --exclude-dir=.git
Search for common token prefixes
grep -rn "ghp_|gho_|ghs_|hf_" . --exclude-dir=.git
Search for password-related strings
grep -rn -i "password|passwd|pwd" . --exclude-dir=.git
Search for API key patterns
grep -rn -i "api[_-]?key|apikey" . --exclude-dir=.git
Always exclude the .git directory from searches to avoid noise from git internals.