repo-sentinel

Full security audit and enforcement for public repositories across 13 attack surfaces: git history, source code, docs, config, .gitignore recon, CI/CD, containers, dependencies, binaries, metadata, platform-specific (GitHub/GitLab), license compliance, and community surface. Provides fast-path and full 20-check audits, pre-commit hooks, CI gates, .gitignore generation, and history scrubbing. Use whenever pushing to a public remote, open-sourcing a repo, writing README/docs, configuring CI/CD or Dockerfiles, adding dependencies, or checking license compliance. Trigger on: push to GitHub, make repo public, open source this, set up the repo, write README, add CI/CD, create Dockerfile, set up pre-commit, add license, write SECURITY.md, secret leaks, credential rotation, .claude/ tracking, repo hygiene, security scanning, "is this safe to push", pre-oss, open source readiness, release audit, or open source audit. This is the gatekeeper between internal and public.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "repo-sentinel" with this command: npx skills add mathews-tom/praxis-skills/mathews-tom-praxis-skills-repo-sentinel

Repo Sentinel

Everything in a public repo is permanent attacker surface. This skill defines what belongs in a public repo, what does not, how to detect violations across 13 attack surfaces (numbered 0 to 12), how to remediate when the boundary is violated, and how to enforce continuously.

Reference files

This skill uses bundled reference files for detailed patterns and templates. Read them as needed:

| File | When to read |
| --- | --- |
| references/scan-patterns.md | When running any audit (fast-path or full); contains all detection commands |
| references/templates.md | When setting up enforcement, generating .gitignore, or creating CI gates |
| references/remediation.md | When fixing findings or scrubbing history; contains all fix procedures |

Prerequisites

  • gh CLI installed and authenticated (gh auth status must pass) — required for GitHub-specific surface checks (Surface 10)
  • Active git repository context — the skill operates on git objects; non-git directories are out of scope
  • trufflehog or gitleaks — optional but strongly recommended for Surface 0 (git history) secret detection with entropy analysis; without them, fall back to git log -p grep patterns from references/scan-patterns.md
  • Read access to the full git object store — shallow clones (--depth N) will miss history secrets; warn the user if a shallow clone is detected
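The shallow-clone warning in the last prerequisite can be sketched as a small helper. This is a minimal illustration, not part of the skill's bundled scripts: the function name check_clone_depth is hypothetical, and it assumes Git 2.15 or later, where git rev-parse --is-shallow-repository is available.

```shell
# check_clone_depth [repo_dir]  (hypothetical helper name)
# Prints "shallow", "full", or "not-a-repo" for the given directory.
check_clone_depth() {
    repo_dir="${1:-.}"
    # Bail out early when the path is not inside a git work tree.
    if ! git -C "$repo_dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "not-a-repo"
        return 2
    fi
    # --is-shallow-repository prints "true" for clones made with --depth N.
    if [ "$(git -C "$repo_dir" rev-parse --is-shallow-repository)" = "true" ]; then
        echo "shallow"
    else
        echo "full"
    fi
}
```

If the helper reports shallow, running git fetch --unshallow before the Surface 0 scan restores the full history, provided the remote allows it.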

Calibration Rules

  • Public vs. private visibility: Apply stricter severity ratings for public repos — findings classified MEDIUM in a private repo (e.g., internal URL in a comment) escalate to HIGH in a public repo. Confirm repo visibility before scoring.
  • Stack-scoped surfaces: Scope the audit to attack surfaces relevant to the detected tech stack. A static HTML repo has no meaningful Surface 6 (containers) or Surface 7 (lock files) exposure — mark those surfaces N/A rather than penalizing.
  • N/A handling: Surfaces scored N/A are not penalized and do not lower the overall risk posture. Document N/A surfaces explicitly so the user understands what was skipped.
  • Tool availability: If trufflehog/gitleaks are unavailable, note this in the audit header and describe the reduced confidence in Surface 0 coverage.
  • False positive discipline: Flag a finding only when there is evidence of actual exposure, not just pattern proximity. A variable named api_key with a placeholder value is LOW, not CRITICAL.
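The visibility rule above can be expressed as a tiny calibration function. This is a sketch under stated assumptions: the name calibrate_severity is hypothetical, and because the rules only spell out the MEDIUM to HIGH escalation for public repos, every other severity is passed through unchanged here.

```shell
# calibrate_severity <base_severity> <visibility>  (hypothetical helper name)
# Escalates MEDIUM to HIGH for public repos; other severities pass through.
calibrate_severity() {
    base="$1"
    # Lowercase the visibility so "PUBLIC" and "public" behave the same.
    visibility=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    if [ "$visibility" = "public" ] && [ "$base" = "MEDIUM" ]; then
        echo "HIGH"
    else
        echo "$base"
    fi
}
```

With the gh CLI from the prerequisites, the visibility argument could come from gh repo view --json visibility -q .visibility; the exact casing of that value is treated as an assumption, which is why the function normalizes it.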

Foundational Principle

The public/private boundary is a one-way valve. Once a byte reaches a public remote (via push, PR, issue, wiki, release asset, or GitHub Pages), assume it is indexed, cached, mirrored, and archived permanently. git push --force, PR deletion, issue edits, and release removal do NOT guarantee erasure. Scraping infrastructure (GitHub Archive, GHTorrent, Software Heritage, Google Cache, Wayback Machine, and dozens of proprietary security scanners) operates continuously with sub-hour latency.

Decision framework for every artifact:

| Question | If YES → | If NO → |
| --- | --- | --- |
| Could this help an attacker who has no other access? | EXCLUDE | Continue |
| Does this reveal internal topology not inferable from public signals? | EXCLUDE | Continue |
| Does this contain values that grant access to anything? | EXCLUDE | Continue |
| Does this violate a license obligation or expose legal risk? | EXCLUDE | Continue |
| Would removing this reduce the repo's utility to legitimate users? | INCLUDE (if above = all NO) | EXCLUDE |

When in doubt, exclude. False negatives (leaked secrets) are catastrophic and irreversible. False positives (over-redaction) are trivially correctable.
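The decision framework reads naturally as a short-circuit check. The sketch below is illustrative only: classify_artifact is a hypothetical name, and the five arguments are the reviewer's yes/no answers to the five questions in order. Any answer other than an explicit "no" on the first four questions is treated as a reason to exclude, which mirrors the "when in doubt, exclude" rule.

```shell
# classify_artifact <helps_attacker> <reveals_topology> <grants_access> <legal_risk> <useful_to_users>
# Each argument is "yes" or "no"; anything else counts as doubt.
classify_artifact() {
    # Questions 1-4: only an explicit "no" lets the artifact continue.
    for answer in "$1" "$2" "$3" "$4"; do
        if [ "$answer" != "no" ]; then
            echo "EXCLUDE"
            return 0
        fi
    done
    # Question 5: include only when the artifact is clearly useful.
    if [ "$5" = "yes" ]; then
        echo "INCLUDE"
    else
        echo "EXCLUDE"
    fi
}
```

For example, classify_artifact no no no no yes prints INCLUDE, while an uncertain answer such as classify_artifact no unsure no no yes prints EXCLUDE.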


The 13 Attack Surfaces

Each surface defines what belongs, what doesn't, why it leaks, and how to detect it. Scan commands are in references/scan-patterns.md; remediation procedures in references/remediation.md.

Surface 0 — Git Object Store (History)

The most dangerous and most commonly missed surface. By default, git grep searches only the tracked files in the working tree, not history. An attacker with clone access gets the entire commit history. A file deleted in commit N remains in the object store until it is explicitly scrubbed.

What leaks: Any secret, credential, internal URL, PII, or sensitive file that was ever committed — even if removed in a subsequent commit. Squash merges don't help; the original commits persist in reflog and may exist in forks.

Audit approach: Run history scans BEFORE working-tree scans. Use trufflehog or gitleaks for verified secret detection with entropy analysis. Fall back to git log -p grep if tools are unavailable. See references/scan-patterns.md § Surface 0.

Surface 1 — Source Code

Belongs: Application logic, algorithms, public API contracts, type definitions, tests with synthetic data, utility libraries, schema-only migrations.

Does NOT belong:

| Category | Examples | Why |
| --- | --- | --- |
| Hardcoded credentials | API_KEY = "sk-..." | Direct access grant |
| Internal URLs/IPs | 10.0.x.x, *.internal, *.corp | Network topology |
| Cloud resource IDs | AWS account IDs, GCP project IDs, ARNs, S3 bucket names | Resource targeting |
| PII / seed data | Real emails, names, phone numbers in fixtures | Privacy violation |
| Cryptographic material | Private keys, certs, JWTs, signing secrets | Auth bypass |
| Business logic comments | // HACK: bypass rate limit for enterprise | Reveals security gaps |
| Licensing/billing logic | Entitlement checks, license key validation | Revenue loss |
| Debug/admin endpoints | /admin/reset-all, /__debug/dump-state | Privileged access |
| Vendor workarounds | // Workaround for Stripe API bug #4521 | Stack disclosure |

Surface 2 — Documentation

Belongs: Setup instructions with placeholders, architecture overviews (external-appropriate abstraction), public API reference, contributing guidelines, license, feature-level changelog.

Does NOT belong: Internal URLs, private tracker references (JIRA-xxx, Linear ENG-xxx), team/individual names, deployment runbooks, unredacted postmortems, security architecture details, environment-specific configs.

CLAUDE.md and .claude/ — unconditional exclusion. Both contain comprehensive reconnaissance payloads. Always in .gitignore. No exceptions. No conditional logic.

Surface 3 — Configuration Files

Belongs: .env.example with placeholder values only, toolchain config (tsconfig, eslint, prettier), deployment configs with parameterized values, IaC with variable-only resource names.

Does NOT belong: .env and all .env.* (non-example), configs with embedded secrets, IaC with hardcoded identifiers, SSH config, cloud CLI config, editor config with paths, private registry references in .npmrc.

Surface 4 — .gitignore as Reconnaissance Vector

The .gitignore itself is a public file that leaks information.

Rules: Zero comments (comments are attacker documentation). Extension globs over filenames (*.credentials not oauth-credentials.json). No environment names in paths. No internal doc names. Directory patterns absorb children. Always verify with git ls-files -ci --exclude-standard (Git 2.32 and later reject -i unless -c or -o is also given).

.claude/ and CLAUDE.md — always in .gitignore, unconditional.
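The zero-comments rule is easy to check mechanically. The helper below is a minimal sketch with a hypothetical name, lint_gitignore. It treats only lines that begin with # as comments, since gitignore does not strip leading whitespace, and it does not attempt the other rules (environment names, literal filenames).

```shell
# lint_gitignore [path]  (hypothetical helper name)
# Prints "line:text" for every comment line; returns 1 if any were found,
# 0 if the file is comment-free, 2 if the file does not exist.
lint_gitignore() {
    file="${1:-.gitignore}"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 2
    fi
    # grep -n prefixes each match with its line number.
    if grep -nE '^#' "$file"; then
        return 1
    fi
    return 0
}
```

A non-zero exit from lint_gitignore could be wired into a pre-commit hook so that comments never reach the public copy of the file.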

Surface 5 — CI/CD Pipeline Definitions

Belongs: Workflow definitions, build/test commands, matrix strategies, caching configs.

Does NOT belong: Inline secrets, internal runner labels, private artifact registries, deployment target IPs/hostnames, hardcoded cloud identifiers. All secrets via platform secret store (${{ secrets.X }} for GitHub Actions).

Surface 6 — Container & IaC Definitions

Dockerfiles — safe: Public base images, build steps, EXPOSE ports, multi-stage patterns, non-secret ARG/ENV.

Dockerfiles — exclude: ARG/ENV with credentials, COPY of secret files, internal base images, infrastructure-revealing comments.

Docker Compose: All secrets via env_file or external secret management. Service names are public — don't reveal non-public capabilities. Volume mounts must not reference secret paths.

Terraform/IaC: All identifiers via variables with no real defaults. State files (*.tfstate) ALWAYS excluded. Variable files (*.tfvars) excluded with example templates.

Surface 7 — Dependencies & Lock Files

Often overlooked. Lock files and manifests leak internal infrastructure.

What leaks:

| Category | Examples | Why |
| --- | --- | --- |
| Private registry URLs | registry.internal.corp in lock files | Internal infra |
| Internal package names | @corp-internal/auth-sdk in package.json | Org structure |
| Git+SSH dependencies | git+ssh://...private-org/internal-lib.git | Private repo exposure |
| Pinned internal forks | Version pins revealing upstream vuln workarounds | Patch intelligence |

Surface 8 — Binary & Large File Artifacts

What leaks:

| Category | Examples | Why |
| --- | --- | --- |
| Compiled binaries | May embed paths, credentials at compile time | Credential extraction |
| Database dumps | .sql, .sqlite, .db with real data | Data exposure |
| Jupyter notebook outputs | API responses, tokens, internal URLs in cell output | Credential + topology |
| Image/PDF metadata | EXIF data, PDF author fields, internal paths | Author/org enumeration |
| Archive files | .zip, .tar.gz bundling secrets | Nested secret exposure |

Surface 9 — Metadata & Git History

Commit messages: Don't reference what was vulnerable (Fix auth bypass in /admin/reset), only what changed. Don't paste error messages with credentials or internal stack traces.

PR descriptions / issue templates: Don't prompt users to paste credentials. PR templates should not reference internal processes. Bug reports: sanitized repro steps, not raw logs.

Branch names: Avoid names revealing unannounced features or internal codenames.

Release assets: Must not bundle config files, .env, or credentials.

Surface 10 — Platform-Specific Metadata (GitHub/GitLab)

| Artifact | Risk | Mitigation |
| --- | --- | --- |
| CODEOWNERS | Leaks team structure and responsibility mapping | Use team handles, not individuals |
| .github/FUNDING.yml | Exposes financial platform accounts | Verify intentional disclosure |
| GitHub Actions @main refs | Supply chain attack vector | Pin to full SHA, not tag |
| Workflow permissions: write-all | Over-privilege | Use minimum required permissions |
| Wiki pages | Separately cloneable, often contain sensitive runbooks | Audit or disable |
| GitHub Discussions | Accidental leak surface | Monitor or disable |
| dependabot.yml | Private registry references | Parameterize registries |
| Repository topics/description | Internal project codenames | Review before public |
| GitHub Pages config | Reveals deployment targets | Verify intentional |
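The "pin to full SHA, not tag" mitigation can be checked with a pair of small helpers. This is a sketch, not the skill's own scanner: is_sha_pinned and list_unpinned_actions are hypothetical names, and the parsing assumes plain unquoted uses: lines, so quoted values and docker:// references are not handled.

```shell
# is_sha_pinned "owner/repo@ref": succeeds only when ref is a 40-char hex SHA.
is_sha_pinned() {
    ref="${1##*@}"   # everything after the last "@"
    printf '%s\n' "$ref" | grep -qE '^[0-9a-f]{40}$'
}

# list_unpinned_actions <workflow.yml>: print every action not pinned to a SHA.
list_unpinned_actions() {
    grep -E '^[[:space:]]*(-[[:space:]]*)?uses:' "$1" |
        sed -E 's/^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*//; s/[[:space:]]*(#.*)?$//' |
        while IFS= read -r action; do
            case "$action" in
                ./*) continue ;;   # local actions carry no ref to pin
            esac
            is_sha_pinned "$action" || printf '%s\n' "$action"
        done
}
```

Running list_unpinned_actions over each file in .github/workflows/ gives a list of references such as actions/checkout@v4 or some-action@main that still float with a tag or branch.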

Surface 11 — License & Legal Compliance

| Check | Risk | Fix |
| --- | --- | --- |
| Missing LICENSE file | Defaults to "all rights reserved" | Add explicit license |
| License incompatibility | GPL dep in MIT project | Audit with license-checker/pip-licenses |
| Internal copyright headers | Reveals parent company/acquisition | Genericize or remove |
| Missing NOTICE file | Required by Apache 2.0 | Generate from dependencies |
| CLA/DCO requirements | Legal risk for external contributions | Add if accepting PRs |
| Third-party attribution | License violation | Audit dependency licenses |

Dependency license audit commands:

# Node
npx license-checker --summary 2>/dev/null
# Python
pip-licenses 2>/dev/null
# Rust
cargo license 2>/dev/null

Flag GPL/AGPL contamination if the target license is permissive (MIT, BSD, Apache).

Private registry search patterns — grep lock files and configs:

Files: package-lock.json, poetry.lock, Cargo.lock, pip.conf, pyproject.toml, .npmrc, .yarnrc
Grep for: @company, internal-registry, private-pypi, artifactory, nexus, verdaccio

Copyright header check: If the license requires file-level headers (Apache 2.0: recommended; MIT: not required), verify presence in source files and genericize internal copyright notices that reveal parent company or acquisition history.

Surface 12 — Community Surface

Required for credible open-source projects accepting contributions:

| Artifact | Purpose | Risk if missing/wrong |
| --- | --- | --- |
| SECURITY.md | Responsible disclosure policy | Signals immaturity to attackers |
| Issue templates | Guide reporters away from pasting secrets | Accidental credential leaks |
| PR templates | Warn contributors about sensitive data | Topology leaks in diffs |
| CONTRIBUTING.md | Set expectations without revealing internals | Internal tooling exposure |
| Bot configs | .github/stale.yml, Probot | Internal policy leakage |

Severity Classification

All findings are classified by severity. The classification drives action priority:

| Severity | Criteria | Action |
| --- | --- | --- |
| CRITICAL | Active credential exposure, private key, auth token | Block push. Fix immediately. |
| HIGH | Infrastructure/topology enabling targeted attack | Resolve before push. |
| MEDIUM | Information leakage aiding reconnaissance | Fix in next commit. |
| LOW | Hygiene, style, redundancy issues | Fix at convenience. |

CRITICAL and HIGH in git history → full history scrub + credential rotation required.
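The classification above can drive a simple push gate. The sketch below uses hypothetical helper names (severity_action, push_allowed, needs_history_scrub) and short action labels that are invented here for illustration; only the severity names and their meaning come from the table.

```shell
# severity_action <severity>: map a severity to a short action label (labels are illustrative).
severity_action() {
    case "$1" in
        CRITICAL) echo "block-push" ;;
        HIGH)     echo "resolve-before-push" ;;
        MEDIUM)   echo "fix-next-commit" ;;
        LOW)      echo "fix-at-convenience" ;;
        *)        echo "unknown"; return 1 ;;
    esac
}

# push_allowed: reads one severity per line on stdin; fails if any is CRITICAL or HIGH.
push_allowed() {
    ! grep -qE '^(CRITICAL|HIGH)$'
}

# needs_history_scrub <severity> <in_history: yes|no>
needs_history_scrub() {
    [ "$2" = "yes" ] && { [ "$1" = "CRITICAL" ] || [ "$1" = "HIGH" ]; }
}
```

A pre-push hook could pipe the severities of all findings into push_allowed and abort the push on a non-zero exit.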


Operations

Fast-Path Audit (Staged Changes Only)

Use when pushing a single file or small changeset. Scans only staged changes, not the full repo. Read references/scan-patterns.md § Fast-Path for the commands.

Full Repo Audit (20+ checks)

Run before making any repo public or before first push to a public remote. Read references/scan-patterns.md § Full Audit for the complete 20-check sequence.

Quick-Reference Scan Commands

The most critical inline checks. Full pattern set is in references/scan-patterns.md.

# Patterns use POSIX ERE classes ([0-9], [[:space:]]): git grep -E has no \d or \xNN escapes.

# 1. Secrets in code
git grep -rnE '(api[_-]?key|api[_-]?secret|access[_-]?token|auth[_-]?token|secret[_-]?key|private[_-]?key|password|passwd|credential)[[:space:]]*[:=][[:space:]]*["'\''][^[:space:]"'\'']{8,}' -- ':!*.lock' ':!node_modules' ':!vendor'

# 2. Internal URLs
git grep -rnE 'https?://[^[:space:])>"]*\.(internal|corp|local|intranet|private)' -- ':!*.lock'

# 3. Private IPs
git grep -rnE '(10\.[0-9]+\.[0-9]+\.[0-9]+|172\.(1[6-9]|2[0-9]|3[01])\.[0-9]+\.[0-9]+|192\.168\.[0-9]+\.[0-9]+)' -- ':!*.lock' ':!node_modules'

# 4. Cloud resource identifiers
git grep -rnE '(arn:aws:|projects/[a-z][A-Za-z0-9_-]+/locations|/subscriptions/[0-9a-f-]{36})' -- ':!*.lock'

# 5. Connection strings
git grep -rnE '(mongodb|postgres|mysql|redis|amqp|mssql)(\+[A-Za-z0-9_]+)?://[^${[:space:]]+@' -- ':!*.lock'

# 6. .env files tracked
git ls-files | grep -iE '\.env(\.|$)' | grep -vE '\.(example|template)$'

# 7. Credential files tracked
git ls-files | grep -iE '\.(pem|key|p12|pfx|keystore|jks|credentials)$'

# 8. .gitignore leakage
grep -nE '^#|secret|credential|oauth|service.account|password|token' .gitignore 2>/dev/null

# 9. .claude/ or CLAUDE.md tracked
git ls-files | grep -E '(^|/)(\.claude/|CLAUDE\.md$)'

# 10. Tracked files contradicting .gitignore (-c is required alongside -i on Git 2.32+)
git ls-files -ci --exclude-standard 2>/dev/null

# 11. Sensitive TODO/FIXME/HACK comments
git grep -rnE '(TODO|FIXME|HACK|XXX)\b.*\b(security|auth|bypass|vulnerability|exploit|hack|password|credential|secret|token|admin)' -- ':!*.lock'

# 12. CI/CD secrets inline
git grep -rnE '(password|token|key|secret)[[:space:]]*[:=][[:space:]]*[^[:space:]${[]' -- '.github/workflows/' '.gitlab-ci.yml' 'Jenkinsfile' '.circleci/'

# 13. Internal URLs in docs
git grep -nE 'https?://[^[:space:])>]*\.(internal|corp|local|intranet|private)' -- '*.md' '*.rst' '*.txt' '*.adoc'

# 14. Private tracker references in docs
git grep -nE '(JIRA|LINEAR|ASANA|SHORTCUT|CLUBHOUSE|NOTION)-?[[:space:]]*[A-Z]*-?[0-9]+' -- '*.md' '*.rst' '*.txt'

# 15. Person names in docs
git grep -nE '(@[a-zA-Z][A-Za-z0-9_-]+|(ask|contact|ping|reach out to)[[:space:]]+[A-Z][a-z]+)' -- '*.md' '*.rst' '*.txt'

# 16. CI hardcoded IPs
git grep -nE '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' -- '.github/workflows/*.yml' '.gitlab-ci.yml'

# 17. .env.example real values
grep -E '=' .env.example 2>/dev/null | grep -vE '=(your-|placeholder|changeme|xxx|example|TODO|REPLACE|""|'"''"'|$)'

# 18. AWS account IDs
git grep -nE '\b[0-9]{12}\b' -- '*.ts' '*.js' '*.py' '*.yaml' '*.yml' '*.json' '*.tf' | grep -iE '(account|arn|aws)'

Output format:

REPO SENTINEL AUDIT — <repo> — <date>

[CRITICAL — Direct credential exposure]
  src/config.ts:14 — API_KEY = "sk-live-..." → parameterize
  .env.production — tracked, contains real values → git rm --cached + history scrub

[HIGH — Infrastructure disclosure]
  docker-compose.yml:8 — redis://admin:pass@10.0.3.42:6379 → parameterize
  package-lock.json:892 — resolved: "https://registry.internal.corp/..." → remove internal dep

[MEDIUM — Information leakage]
  .gitignore:24 — oauth-credentials.json → replace with *.credentials.json
  README.md:45 — "See https://wiki.internal.corp/auth-design" → remove
  CODEOWNERS:3 — @john-smith → replace with @team-handle

[LOW — Hygiene]
  .gitignore:1-8 — verbose comment header → remove all comments
  LICENSE — missing → add appropriate license file

[TRACKED-BUT-IGNORED CONTRADICTIONS]
  .env.local — in .gitignore but tracked → git rm --cached

[MISSING FROM .gitignore]
  .claude/ — directory exists, not ignored
  *.sqlite — database files present, not ignored

[LICENSE COMPLIANCE]
  GPL-3.0 dependency in MIT-licensed project: package-x → evaluate compatibility

[ENFORCEMENT STATUS]
  Pre-commit hooks: NOT CONFIGURED → see references/templates.md
  CI secret scanning: NOT CONFIGURED → see references/templates.md
  GitHub secret scanning: UNKNOWN → enable in repo settings

Pre-Release Audit Mode (4-Stage DAG)

When preparing a repo for open-source release, run this 4-stage pre-release audit instead of the surface-based audit. Each stage emits PASS / WARN / FAIL with actionable remediation. Hard blockers in stages 1–3 halt the pipeline. Stage 4 produces advisory output.

Stage 1: Sensitive Assets        [HARD BLOCKER] → Surfaces 0–4, 8–9
Stage 2: Legal & Compliance      [HARD BLOCKER] → Surface 11
Stage 3: Public Surface Hygiene  [HARD BLOCKER] → Surfaces 4–7, 9–10
Stage 4: Contribution & Release  [SOFT BLOCKER] → Surface 12 + Pre-Release Checklist

Run stages sequentially. Report results in a structured audit table at the end.
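The halt-on-hard-blocker behavior can be sketched as a loop over stage results. This is an illustration only: run_prerelease is a hypothetical name, and the PASS / WARN / FAIL values are passed in as arguments here, whereas a real run would derive them from the scans for each stage.

```shell
# run_prerelease <stage1> <stage2> <stage3> <stage4>
# Each argument is PASS, WARN, or FAIL. Stages 1-3 are hard blockers.
run_prerelease() {
    stage=1
    for result in "$@"; do
        echo "Stage $stage: $result"
        # A FAIL in stages 1-3 halts the pipeline; stage 4 is advisory.
        if [ "$result" = "FAIL" ] && [ "$stage" -le 3 ]; then
            echo "HALT at stage $stage (hard blocker)"
            return 1
        fi
        stage=$((stage + 1))
    done
    echo "DONE"
}
```

For example, run_prerelease PASS FAIL PASS PASS stops after reporting stage 2, while a FAIL in the fourth position still runs to completion because that stage is a soft blocker.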

Continuous Enforcement Setup

Shift-left prevention is the highest-leverage action. Read references/templates.md for ready-to-use pre-commit config, GitHub Actions workflow, and .gitignore generator.

Pre-Release Readiness Checklist

Run during Stage 4 of the Pre-Release Audit Mode, or standalone before any public release. All items are soft blockers — failures produce advisory output, not hard halts.

§4.1 Documentation Completeness

| File | Required | Check |
| --- | --- | --- |
| README.md | YES | Has install + quickstart sections |
| CONTRIBUTING.md | YES | Fork/branch strategy, dev setup |
| CODE_OF_CONDUCT.md | YES | Adopted standard (Contributor Covenant) |
| CHANGELOG.md | RECOMMENDED | Keep-a-changelog format |
| LICENSE | YES | Verified in Surface 11 |
| SECURITY.md | RECOMMENDED | Disclosure process + contact |
| ARCHITECTURE.md or docs/ | RECOMMENDED | Module overview |
| .github/ISSUE_TEMPLATE/ | RECOMMENDED | Bug + feature templates |
| .github/PULL_REQUEST_TEMPLATE.md | RECOMMENDED | PR checklist |
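A presence check for the files in this table might look like the sketch below. The name check_docs is hypothetical, the check covers file existence only (not section content such as install or quickstart), and it always returns 0 because the checklist items are soft blockers that produce advisory output.

```shell
# check_docs [repo_root]  (hypothetical helper name)
# Prints one advisory line per missing file; never fails the pipeline.
check_docs() {
    root="${1:-.}"
    for f in README.md CONTRIBUTING.md CODE_OF_CONDUCT.md LICENSE; do
        if [ ! -f "$root/$f" ]; then
            echo "MISSING (required): $f"
        fi
    done
    for f in CHANGELOG.md SECURITY.md .github/PULL_REQUEST_TEMPLATE.md; do
        if [ ! -f "$root/$f" ]; then
            echo "MISSING (recommended): $f"
        fi
    done
    # ARCHITECTURE.md or a docs/ directory satisfies the module overview item.
    if [ ! -f "$root/ARCHITECTURE.md" ] && [ ! -d "$root/docs" ]; then
        echo "MISSING (recommended): ARCHITECTURE.md or docs/"
    fi
    return 0
}
```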

§4.2 Code Quality Gates

  • Linter config: .eslintrc*, ruff.toml, pyproject.toml [tool.ruff], .clippy.toml
  • Formatter config: .prettierrc*, pyproject.toml [tool.black], rustfmt.toml
  • Pre-commit: .pre-commit-config.yaml
  • Type checking: tsconfig.json (strict), py.typed marker, mypy/pyright config

§4.3 Test Infrastructure

  • Test runner configured and documented
  • CI pipeline exists (.github/workflows/, .gitlab-ci.yml)
  • Test data is synthetic (not production-derived)
  • Smoke test or single-command verify path documented

§4.4 API Surface

  • Public API explicitly demarcated (__all__, exports, pub)
  • No internal implementation leaked across module boundaries
  • Configuration via env vars / config files, not hardcoded constants

§4.5 Package Metadata

Check manifest completeness across: package.json, pyproject.toml, Cargo.toml, *.csproj

Required fields: name, version, description, repository, homepage, keywords, author, license

§4.6 Reproducible Builds

  • Lock files committed
  • Toolchain versions documented: .tool-versions, .python-version, .nvmrc, rust-toolchain.toml
  • CI runner images pinned

§4.7 Binary Asset Policy

  • No files >1MB without Git LFS
  • No build artifacts committed
  • .gitattributes for LFS if needed

§4.8 Community Setup

  • Issue labels defined: good-first-issue, help-wanted, bug, enhancement
  • Discussions or external channel linked
  • Maintainer expectations documented

History Contamination Remediation

When secrets have already been committed. Read references/remediation.md for the full triage decision tree, git filter-repo commands, and post-scrub protocol.

Quick-Reference Remediation

Triage decision table:

| Pushed to public remote? | Contains real credentials? | Action |
| --- | --- | --- |
| No | Any | git rm --cached + fix .gitignore |
| Yes | No (placeholder) | git rm --cached + fix .gitignore. Scrub optional. |
| Yes | Yes | Full history scrub + credential rotation. Assume compromise. |
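The triage table reduces to two questions, which the sketch below encodes. The function name triage and its output labels are hypothetical. Any answer other than an explicit "no" falls through to the worst case, in keeping with "assume compromise".

```shell
# triage <pushed_to_public: yes|no> <real_credentials: yes|no>
triage() {
    pushed="$1"
    real_creds="$2"
    if [ "$pushed" = "no" ]; then
        echo "untrack"                  # git rm --cached + fix .gitignore
    elif [ "$real_creds" = "no" ]; then
        echo "untrack-scrub-optional"   # placeholder values only
    else
        echo "scrub-and-rotate"         # full history scrub + credential rotation
    fi
}
```

For example, triage yes yes prints scrub-and-rotate, and so does triage unknown yes, since an unknown push state is treated as pushed.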

git filter-repo (preferred):

cp -r .git .git-backup

# By path
git filter-repo --invert-paths --path <file> --force

# By glob
git filter-repo --invert-paths --path-glob '*.pem' --force

# By regex
git filter-repo --invert-paths --path-regex '.*secret.*' --force

# Re-add remote (filter-repo strips it)
git remote add origin <url>
git push --force --all && git push --force --tags

BFG Repo-Cleaner (fallback):

java -jar bfg.jar --delete-files <filename> .git
git reflog expire --expire=now --all && git gc --prune=now --aggressive

Post-scrub protocol (non-optional):

  1. Rotate every exposed credential — scrubbing does not un-expose. GitHub caches objects ~90 days. Mirrors and forks retain indefinitely.
  2. Verify: git log --all --full-history -- <path> must return empty.
  3. Update all ignore/exclude rules before next commit.
  4. For severe exposure: consider repo deletion + recreation. Contact GitHub support for cache invalidation.
  5. Rotate CI/CD secrets independently — pipeline stores are unaffected by git history operations.
  6. Document incident internally: what was exposed, how long, which remotes, what was rotated.
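Step 2 of the protocol can be wrapped in a helper that fails loudly when any commit still touches the scrubbed path. This is a minimal sketch: verify_scrubbed is a hypothetical name, and it checks only the local clone, so it says nothing about forks, mirrors, or cached copies on the hosting platform.

```shell
# verify_scrubbed <path> [repo_dir]  (hypothetical helper name)
# Succeeds only when no commit reachable from any ref touches <path>.
verify_scrubbed() {
    target="$1"
    repo_dir="${2:-.}"
    # --all walks every ref; --full-history disables history simplification.
    remaining=$(git -C "$repo_dir" log --all --full-history --oneline -- "$target") || return 2
    [ -z "$remaining" ]
}
```

A typical use after git filter-repo would be verify_scrubbed .env.production || echo "history still contains the file", run before the force-push.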

.gitignore Generation

Generate a complete, opinionated .gitignore tailored to detected project type with all hygiene rules baked in. Read references/templates.md § .gitignore Generator.


Limitations

  • History scrubbing does not guarantee removal of exposure. Force-push is required, and external mirrors (forks, GitHub Archive, Software Heritage) retain history indefinitely regardless of local operations.
  • External mirrors, caches, and search engine indexes cannot be verified as de-indexed after content removal.
  • Single-repo scope only — not designed for monorepo audits without adaptation. Cross-package secret propagation requires separate analysis per package root.
  • GitHub-specific checks (branch protection, secret scanning alerts, security advisories) require the gh CLI with authenticated access. Without it, Surface 10 coverage is reduced.
  • Secret scanning depth depends on available tooling. trufflehog and gitleaks provide verified detection with entropy analysis; manual regex patterns used as fallback have higher false-positive rates and miss obfuscated credentials.
  • Artifact decisions for package registry publishing (npm, PyPI, crates) have ecosystem-specific norms that differ from source repo inclusion rules — apply ecosystem conventions when auditing published artifacts.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
