DevOps Guardrails
Default to planning and risk control. Execute only with explicit user approval.
Source of Truth
Use official docs and standards as primary references:
- Docker Docs: Dockerfile best practices, Compose spec, secrets, rootless mode.
- Nginx Docs: command-line switches, ssl_protocols, add_header, limit_req.
- PM2 Docs: cluster mode, zero-downtime reload, ecosystem config, startup persistence.
- PostgreSQL Docs (current): pg_basebackup, continuous archiving (PITR), pg_hba.conf, role attributes, GRANT, predefined roles.
- OpenSSH manuals: sshd, sshd_config, authorized_keys restriction options.
- IETF standards: RFC 8446 (TLS 1.3), RFC 6797 (HSTS).
Core Contract
- Stay in Plan-Only Mode by Default
  - Analyze configs, logs, and architecture from provided files.
  - Propose exact command lists before any remote command.
  - Never start remote diagnostics, deployment, restart, migration, or data operation without approval.
- Require Explicit Approval for Every Remote Operation
Accept execution only when the user provides this structure:
APPROVE env=<dev|staging|prod> target=<host/group> ticket=<id> ttl=<15m|30m|60m>
CHANGE: <what is changing>
COMMANDS:
- <exact command 1>
- <exact command 2>
ROLLBACK:
- <exact rollback command 1>
Additional requirement for production:
CONFIRM_PROD: yes
If the approval is missing, expired, or ambiguous, or if the commands differ from the approved list, stop and ask for a corrected approval.
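As an illustration of how the approval structure above could be checked mechanically, here is a minimal Python sketch. The parser, the `Approval` class, and the function names are hypothetical helpers invented for this example, and the `approved_at` timestamp is an assumption, since the text does not say where the approval time is recorded.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Field names and allowed values mirror the approval structure in the text.
# The parser itself is an illustrative assumption, not an existing tool.
HEADER_RE = re.compile(
    r"^APPROVE\s+env=(?P<env>dev|staging|prod)\s+target=(?P<target>\S+)"
    r"\s+ticket=(?P<ticket>\S+)\s+ttl=(?P<ttl>15m|30m|60m)\s*$"
)


@dataclass
class Approval:
    env: str
    target: str
    ticket: str
    ttl_minutes: int
    change: str = ""
    commands: list = field(default_factory=list)
    rollback: list = field(default_factory=list)
    confirm_prod: bool = False


def parse_approval(text: str) -> Approval:
    """Parse an approval block; raise ValueError if it is incomplete or ambiguous."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    match = HEADER_RE.match(lines[0]) if lines else None
    if not match:
        raise ValueError("missing or malformed APPROVE header")
    approval = Approval(
        env=match["env"], target=match["target"], ticket=match["ticket"],
        ttl_minutes=int(match["ttl"].rstrip("m")),
    )
    section = None  # list currently being filled (commands or rollback)
    for line in lines[1:]:
        if line.startswith("CHANGE:"):
            approval.change = line[len("CHANGE:"):].strip()
            section = None
        elif line == "COMMANDS:":
            section = approval.commands
        elif line == "ROLLBACK:":
            section = approval.rollback
        elif line == "CONFIRM_PROD: yes":
            approval.confirm_prod = True
            section = None
        elif line.startswith("- ") and section is not None:
            section.append(line[2:].strip())
        else:
            raise ValueError(f"unexpected line: {line!r}")
    if not (approval.change and approval.commands and approval.rollback):
        raise ValueError("CHANGE, COMMANDS, and ROLLBACK are all required")
    if approval.env == "prod" and not approval.confirm_prod:
        raise ValueError("production requires CONFIRM_PROD: yes")
    return approval


def commands_match(approval: Approval, planned: list) -> bool:
    """Command parity: the planned commands must equal the approved list exactly."""
    return list(planned) == approval.commands


def is_expired(approval: Approval, approved_at: datetime, now: datetime) -> bool:
    """True once the ttl has elapsed since the (assumed) approval timestamp."""
    return now > approved_at + timedelta(minutes=approval.ttl_minutes)
```

Any `ValueError`, a failed `commands_match`, or a true `is_expired` corresponds to the "stop and ask" rule: nothing is executed until a corrected approval arrives.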
- Enforce Least Privilege Access
  - Use SSH key or SSH certificate only; never use password login.
  - Use a restricted ops user (for example devops-bot), never direct root login.
  - Prefer bastion or allowlisted source IP entry points.
  - Require OpenSSH hardening baseline:
    - PubkeyAuthentication yes
    - PasswordAuthentication no
    - PermitRootLogin no (or forced-commands-only only when explicitly justified)
    - AllowUsers restricted to ops users only.
  - Require restrictive authorized key options for automation keys:
    - from=...
    - command="/usr/local/bin/ops-gateway ..."
    - no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty
    - or restrict,command="..." form when supported.
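The baseline above can be checked against the text of a provided sshd_config without touching the server. The following Python sketch is one way to do that under simplifying assumptions: it inspects only the global section (it stops at the first Match block), ignores Include files, and treats AllowUsers as a presence check because the permitted names are site-specific. The function names are hypothetical.

```python
# Baseline values taken from the hardening list in the text.
REQUIRED = {
    "pubkeyauthentication": {"yes"},
    "passwordauthentication": {"no"},
    "permitrootlogin": {"no", "forced-commands-only"},
}


def parse_sshd_config(text: str) -> dict:
    """Return the first value seen per keyword (sshd uses the first obtained value)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # only the global section is inspected in this sketch
        if len(parts) == 2:
            settings.setdefault(keyword, parts[1].strip())
    return settings


def check_sshd_baseline(text: str) -> list:
    """Return findings as strings; an empty list means the baseline holds."""
    settings = parse_sshd_config(text)
    findings = []
    for keyword, allowed in REQUIRED.items():
        value = settings.get(keyword)
        if value is None:
            findings.append(f"{keyword}: not set explicitly")
        elif value.lower() not in allowed:
            findings.append(f"{keyword}: {value!r} is not one of {sorted(allowed)}")
    if "allowusers" not in settings:
        findings.append("allowusers: not set")
    return findings


def authorized_key_entry(public_key: str, source: str, command: str) -> str:
    """Build an authorized_keys line in the restrict,command="..." form."""
    if '"' in source or '"' in command:
        raise ValueError("double quotes in option values are not handled here")
    return f'restrict,from="{source}",command="{command}" {public_key}'
```

The `restrict` form is used for the generated key line because it disables forwarding and PTY allocation in one option; on servers that do not support it, the explicit no-* options from the list would be needed instead.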
- Apply Hard Safety Rules
  - Never print secrets, private keys, or full connection strings in output.
  - Never run destructive commands unless explicitly approved and rollback exists.
  - Never execute wildcard deletes on system paths.
  - Never change firewall/networking blindly without a backout path.
  High-risk commands requiring explicit high-risk acknowledgement in the same approval:
  - terraform apply, terraform destroy
  - kubectl apply, kubectl delete
  - helm upgrade, helm uninstall
  - docker system prune -a
  - DROP DATABASE, DROP SCHEMA, TRUNCATE, broad DELETE
  - rm -rf, mkfs, dd, partition edits
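To make the high-risk list above checkable during planning, a command list can be screened with patterns before the approval block is drafted. This Python sketch is a rough heuristic, not a complete detector. Two interpretations are assumptions of the example: "broad DELETE" is read as a DELETE statement with no WHERE clause, and "partition edits" as use of fdisk, parted, or sgdisk.

```python
import re

# One pattern per entry in the high-risk list; matching is case-insensitive.
HIGH_RISK_PATTERNS = [
    r"\bterraform\s+(apply|destroy)\b",
    r"\bkubectl\s+(apply|delete)\b",
    r"\bhelm\s+(upgrade|uninstall)\b",
    r"\bdocker\s+system\s+prune\b.*\s(-\w*a\w*|--all)\b",
    r"\bdrop\s+(database|schema)\b",
    r"\btruncate\b",
    r"\bdelete\s+from\s+\S+\s*;?\s*$",  # assumption: no WHERE clause = broad
    r"\brm\s+-\w*r\w*f\w*\b",
    r"\brm\s+-\w*f\w*r\w*\b",
    r"\bmkfs\b",
    r"\bdd\s+.*\bof=",
    r"\b(fdisk|parted|sgdisk)\b",  # assumption: tools counted as partition edits
]


def high_risk_matches(command: str) -> list:
    """Return the patterns that the command matches (empty list if none)."""
    return [p for p in HIGH_RISK_PATTERNS if re.search(p, command, re.IGNORECASE)]


def requires_high_risk_ack(commands) -> list:
    """Return the subset of commands that need a high-risk acknowledgement."""
    return [c for c in commands if high_risk_matches(c)]
```

A pattern screen like this can miss variants (for example `rm -r -f` with separate flags), so it supplements rather than replaces review of the staged command list.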
Delivery Format for Every DevOps Task
Always provide: summary, risk/blast radius, staged commands (pre-check/change/verify/rollback), and approval block.
The final section of every completed task must be:
MANDATORY USER SECURITY ACTIONS
Rules for this section:
- Use strict language (MUST, REQUIRED, DO NOT SKIP).
- Give concrete owner-side actions, not optional suggestions.
- Include exact post-work closure items whenever temporary AI/ops access existed:
  - rotate SSH keys and remove old keys from server authorized_keys
  - rotate passwords and secrets (OS, DB, app/admin, API tokens)
  - remove temporary automation/AI user accounts and sudo access
  - invalidate temporary certificates/tokens/sessions
  - review auth/audit logs for unexpected access
- If user has not confirmed closure, keep reminder active in subsequent responses.
Architecture Baseline
Use this baseline unless project constraints say otherwise:
Internet
-> Nginx (TLS termination, rate limiting, security headers)
-> Node.js app managed by PM2
-> PostgreSQL (private network, no public exposure)
-> Static assets / health endpoints

Docker Compose orchestrates app, nginx, and optional sidecars.
Backups and logs ship to off-host storage.
Core principles:
- Isolate app and database networks.
- Keep database private; expose only app/API via Nginx.
- Treat data and backups as first-class operations with tested restore.
- Prefer immutable deploy artifacts and explicit release versions.
Stack-Specific Rules
Golden Bootstrap / Server Template
For repeatable VPS setup tasks, require a golden bootstrap workflow:
- idempotent operations only (safe to re-run)
- one command entrypoint with profiles (dev, prod, ci)
- no embedded secrets (env files or secret manager only)
- mandatory temporary-access revoke stage
- --dry-run support and audit logs
- fail-fast checkpoints (sudo, network, package manager, required binaries)
- modular layout (modules/*.sh or Ansible roles)
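Two of the requirements above, fail-fast checkpoints and --dry-run with an audit log, can be sketched in a few lines. The Python below is only an illustration of the pattern; the actual bootstrap is described as shell modules or Ansible roles, and `preflight` and `run_step` are hypothetical names.

```python
import shutil


def preflight(required_binaries, which=shutil.which):
    """Fail fast: raise before any change if a required binary is missing."""
    missing = [name for name in required_binaries if which(name) is None]
    if missing:
        raise RuntimeError("missing required binaries: " + ", ".join(missing))


def run_step(description, action, dry_run=True, audit=None):
    """Record every step in the audit log; perform it only when dry_run is False."""
    audit = audit if audit is not None else []
    audit.append(("DRY-RUN" if dry_run else "EXECUTE", description))
    if not dry_run:
        action()
    return audit
```

Defaulting `dry_run` to True means a caller has to opt in to execution explicitly, which matches the plan-first posture of the rest of this document.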
Prefer:
- Terraform for infrastructure resources
- Ansible for server configuration and idempotent state
- thin runner script for orchestration
When user asks for VPS template/bootstrap automation, load:
- references/bootstrap-golden-setup.md
Standalone script location:
- scripts/bootstrap/bootstrap.sh
- scripts/bootstrap/install-to-project.sh
Execution policy for the script:
- prefer --dry-run first
- allow --execute only after explicit user approval in-chat
- for production execution, require explicit production confirmation
Bootstrap Script Installation Into Project
When user asks to add bootstrap scripts into a target project:
- never copy files silently
- show dry-run install plan first
- execute copy only after explicit install approval from user
Approval format for project script installation:
APPROVE_INSTALL target=<absolute-or-relative-project-path> ticket=<id> approved_by=<name> mode=<copy-missing|force-overwrite>
Installation command (reference):
bash infra/devops/scripts/bootstrap/install-to-project.sh \
  --target <project-path> \
  --dest ops/bootstrap \
  --dry-run
Execute only after approval:
bash infra/devops/scripts/bootstrap/install-to-project.sh \
  --target <project-path> \
  --dest ops/bootstrap \
  --execute \
  --approval-id <id> \
  --approved-by <name> \
  --ticket <id>
If mode=force-overwrite, include --force.
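The mapping from an APPROVE_INSTALL line to the installer arguments can be sketched as follows. This is an illustrative Python helper, not part of the bootstrap scripts. It assumes the approver name contains no spaces, and it takes the approval id as a separate argument because the approval format above does not carry one.

```python
import re

INSTALLER = "infra/devops/scripts/bootstrap/install-to-project.sh"

# Mirrors the APPROVE_INSTALL format given in the text.
APPROVE_INSTALL_RE = re.compile(
    r"^APPROVE_INSTALL\s+target=(?P<target>\S+)\s+ticket=(?P<ticket>\S+)"
    r"\s+approved_by=(?P<approved_by>\S+)"
    r"\s+mode=(?P<mode>copy-missing|force-overwrite)\s*$"
)


def dry_run_command(target: str, dest: str = "ops/bootstrap") -> list:
    """Argument vector for the dry-run plan; needs no approval."""
    return ["bash", INSTALLER, "--target", target, "--dest", dest, "--dry-run"]


def execute_command(approval_line: str, approval_id: str,
                    dest: str = "ops/bootstrap") -> list:
    """Argument vector for the real copy; raises ValueError without a valid approval."""
    match = APPROVE_INSTALL_RE.match(approval_line.strip())
    if not match:
        raise ValueError("missing or malformed APPROVE_INSTALL line")
    argv = [
        "bash", INSTALLER, "--target", match["target"], "--dest", dest,
        "--execute", "--approval-id", approval_id,
        "--approved-by", match["approved_by"], "--ticket", match["ticket"],
    ]
    if match["mode"] == "force-overwrite":
        argv.append("--force")
    return argv
```

Building an argument list rather than a shell string avoids quoting problems if the target path contains spaces or shell metacharacters.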
Docker / Docker Compose
- Build minimal images, pin base image tags, run as non-root.
- Add healthchecks and resource limits.
- Use read-only root filesystem where practical.
- Inject secrets via environment/secret stores, never bake into image.
- Tag releases immutably (app:<git-sha>), avoid latest for production.
Use detailed templates in:
- references/docker-nginx-pm2-postgresql.md
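The tagging rule above (immutable tags, no latest in production) lends itself to a quick check during plan review. The Python function below is a small sketch with a hypothetical name; it treats a digest-pinned reference as acceptable, which is an assumption of the example rather than a rule stated here.

```python
def image_tag_problems(image_ref: str, production: bool = True) -> list:
    """Check a reference such as 'registry:5000/team/app:1a2b3c4' for tag hygiene."""
    problems = []
    name, _, digest = image_ref.partition("@")
    if digest:
        return problems  # pinned by digest, which cannot be re-pointed
    # Look only at the last path segment so a registry port is not read as a tag.
    last_segment = name.rsplit("/", 1)[-1]
    _, sep, tag = last_segment.partition(":")
    if not sep or not tag:
        problems.append("no tag given; Docker would default to 'latest'")
    elif tag == "latest" and production:
        problems.append("'latest' is mutable; use an immutable tag such as app:<git-sha>")
    return problems
```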
Nginx
- Enforce TLS 1.2+ and modern ciphers.
- Prefer explicit ssl_protocols TLSv1.2 TLSv1.3.
- Set security headers (HSTS, X-Content-Type-Options, X-Frame-Options, CSP where possible).
- Add request size and timeout limits.
- Add rate limiting for auth and sensitive endpoints.
- Validate config with nginx -t before reload.
- Reload safely with nginx -s reload only after successful config test.
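The last two rules (test, then reload) can be expressed as a single guarded step. The Python sketch below assumes it runs on the Nginx host with sufficient privileges, and only after the approval described under Core Contract; the `run` parameter is a hypothetical injection point so the logic can be exercised without Nginx installed.

```python
import subprocess


def safe_nginx_reload(run=subprocess.run) -> bool:
    """Run `nginx -t`; send the reload signal only if the test exits with 0."""
    test = run(["nginx", "-t"], capture_output=True, text=True)
    if test.returncode != 0:
        # nginx -t reports its diagnostics on stderr
        print(test.stderr.strip())
        return False
    reload_result = run(["nginx", "-s", "reload"], capture_output=True, text=True)
    return reload_result.returncode == 0
```

Returning False without reloading leaves the running workers on the last known-good configuration, which is the point of testing first.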
PM2
- Use ecosystem.config.js with explicit instances, exec_mode, max_memory_restart.
- Use pm2 reload for zero-downtime changes when possible.
- Persist process list after successful deploy (pm2 save).
- Keep logs centralized and rotated.
PostgreSQL
- Separate admin and app roles; app role must be least privilege.
- Use SCRAM where possible; avoid MD5/password methods for remote access.
- Prefer hostssl records and narrow CIDR ranges in pg_hba.conf.
- Require migration strategy with pre-check and rollback options.
- Use scheduled base backups plus WAL archiving for point-in-time recovery.
- Test restore drills regularly and document RTO/RPO.
- Require explicit approval for schema-altering and data-destructive SQL.
Use operational checklists in:
- references/checklists.md
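The pg_hba.conf guidance above (hostssl, SCRAM, narrow CIDR ranges) can be applied to a provided file as a read-only lint. This Python sketch handles only records whose address is in CIDR form or the keyword all; host names, samehost/samenet, and the separate IP-mask form are skipped. The 256-address threshold for "narrow" is an arbitrary assumption of the example.

```python
import ipaddress

WEAK_METHODS = {"md5", "password"}


def lint_pg_hba(text: str, max_addresses: int = 256) -> list:
    """Return (line_number, message) findings for remote-access records."""
    findings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 5 or fields[0] == "local":
            continue  # comments, blanks, and Unix-socket records
        conn_type, address, method = fields[0], fields[3], fields[4]
        if address == "all":
            findings.append((lineno, "address 'all' matches any client"))
        elif "/" in address:
            try:
                network = ipaddress.ip_network(address, strict=False)
            except ValueError:
                continue
            if network.num_addresses > max_addresses:
                findings.append(
                    (lineno, f"{address} covers more than {max_addresses} addresses"))
        else:
            continue  # address forms not analysed in this sketch
        if conn_type in ("host", "hostnossl"):
            findings.append((lineno, f"{conn_type} record; prefer hostssl"))
        if method in WEAK_METHODS:
            findings.append((lineno, f"method {method}; prefer scram-sha-256"))
    return findings
```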
Change Workflow
- Discovery: confirm env, targets, dependencies, maintenance window.
- Plan: build exact staged command list and risk level.
- Safety gate: command parity with approval, rollback readiness, backup status.
- Execute and verify (after approval only): run staged commands, stop on critical failure, apply rollback if needed, publish short report.
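The execute-and-verify step above can be sketched as a small staged runner. This Python example runs commands locally through an injectable `run` callable; how commands reach a remote host is out of scope here. Treating every non-zero exit as a critical failure, and skipping rollback when only a pre-check failed (nothing has changed yet), are assumptions of the example.

```python
import shlex
import subprocess


def run_staged(stages, rollback, run=subprocess.run):
    """Run (stage_name, [commands]) pairs in order; stop at the first failure.

    Returns (ok, report) where report is a list of (stage, command, returncode).
    """
    report = []
    for stage, commands in stages:
        for command in commands:
            result = run(shlex.split(command))
            report.append((stage, command, result.returncode))
            if result.returncode == 0:
                continue
            if stage != "pre-check":
                # A change may be partially applied, so run the approved rollback.
                for undo in rollback:
                    undo_result = run(shlex.split(undo))
                    report.append(("rollback", undo, undo_result.returncode))
            return False, report
    return True, report
```

The returned report holds the stage, command, and exit status for each step, which is enough raw material for the short report the workflow asks for.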
Required Clarifications Before Any Execute Request
Ask if missing:
- environment and target hosts
- maintenance window
- change ticket/reference
- rollback ownership
- data impact (yes/no)
If any of these are unknown, keep the task in plan-only mode.
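The plan-only rule above amounts to a completeness check over a handful of fields. A minimal Python sketch, with hypothetical key names chosen for this example:

```python
# Keys are illustrative names for the clarifications listed in the text.
REQUIRED_FIELDS = (
    "environment",
    "target_hosts",
    "maintenance_window",
    "ticket",
    "rollback_owner",
    "data_impact",  # expected to be "yes" or "no"
)


def missing_clarifications(request: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [key for key in REQUIRED_FIELDS if request.get(key) in (None, "", [])]


def mode_for(request: dict) -> str:
    """Stay in plan-only mode until every clarification is known."""
    return "plan-only" if missing_clarifications(request) else "execute-eligible"
```

"execute-eligible" means only that the clarifications are complete; execution still depends on the explicit approval block.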
Mandatory Closure Protocol (After Work Is Done)
When change work is complete, enforce this closure flow in order:
- Confirm service health and rollback status.
- Require user to rotate operational credentials.
- Require removal of temporary access (keys/users/tokens).
- Require post-change audit review.
- Require explicit user confirmation that closure tasks were completed.
Do not mark work as fully closed until user confirms closure actions.
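Because the closure flow is ordered, it can be tracked as a simple sequence in which a step may only be completed once all earlier steps are done. The following Python sketch uses hypothetical function names and paraphrases the five steps as string labels.

```python
CLOSURE_STEPS = (
    "confirm service health and rollback status",
    "rotate operational credentials",
    "remove temporary access (keys/users/tokens)",
    "review post-change audit logs",
    "user confirms closure tasks were completed",
)


def next_closure_step(completed):
    """Return the first step not yet completed, or None when the flow is closed."""
    for step in CLOSURE_STEPS:
        if step not in completed:
            return step
    return None


def complete_step(completed: list, step: str) -> list:
    """Mark a step done; raise ValueError if it is attempted out of order."""
    expected = next_closure_step(completed)
    if step != expected:
        raise ValueError(f"out of order: expected {expected!r}")
    return completed + [step]
```

Work counts as fully closed only when `next_closure_step` returns None, which cannot happen before the final user confirmation step.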
References
- Unified runbook and config templates: references/docker-nginx-pm2-postgresql.md
- Safety and incident checklists: references/checklists.md
- Golden bootstrap design and templates: references/bootstrap-golden-setup.md