Canary Agent Safety Tripwire System
Safety monitoring and tripwire detection for AI agents. Protects against unauthorized file access, dangerous commands, and excessive activity. Auto-halts on critical violations. Honeypot tripwires detect snooping.
What It Does
Canary provides three layers of agent safety:
- Action Monitoring - Checks file paths and commands before execution
- Tripwire Files - Honeypot files that should never be accessed
- Audit Trail - Complete logs and pattern detection
Core Features
Protected Paths:
- Block access to sensitive directories (/etc/, ~/.ssh/, etc.)
- Customizable protection list
- Granular operation control (read, write, delete)
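The following is a minimal sketch of how a protected-path check could work; it is not Canary's actual implementation. The function name `is_protected` and the `PROTECTED` list are hypothetical, and the sketch assumes a POSIX filesystem. Paths are normalized first so that `..` segments and `~` cannot be used to sidestep the list.

```python
from pathlib import Path

# Hypothetical protection list; Canary reads its own from config.json.
PROTECTED = ["/etc/", "~/.ssh/"]


def is_protected(path: str, protected=PROTECTED) -> bool:
    """Return True if `path` is a protected directory or lies inside one."""
    target = Path(path).expanduser().resolve()
    for entry in protected:
        root = Path(entry).expanduser().resolve()
        # Protected if the path equals the root or has it as an ancestor.
        if target == root or root in target.parents:
            return True
    return False


print(is_protected("/etc/passwd"))
print(is_protected("/etc/../etc/passwd"))
print(is_protected("/tmp/scratch.txt"))
```

Comparing resolved paths rather than raw strings is a design choice: a plain prefix test on the string would miss `/etc/../etc/passwd` and would wrongly match a sibling such as `/etcetera`.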
Forbidden Patterns:
- Regular expression matching for dangerous commands
- Detects rm -rf /, chmod 777, curl | sh, etc.
- Extensible pattern library
Rate Limiting:
- Limit file operations, network requests, command executions
- Configurable windows and thresholds
- Prevents runaway agents
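One common way to implement a limit over a configurable window is a sliding-window counter. The sketch below illustrates the idea only; the class name `SlidingWindowLimiter` is hypothetical and Canary's internal approach may differ.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` events within any `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._events = deque()  # timestamps of accepted events

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self._events and now - self._events[0] >= self.window:
            self._events.popleft()
        if len(self._events) >= self.limit:
            return False
        self._events.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window=60)
# Explicit timestamps make the behavior reproducible.
results = [limiter.allow(now=t) for t in (0, 1, 2, 3, 61)]
print(results)  # [True, True, True, False, True]
```

Because the timestamps live only in memory, a limiter like this forgets its history when the process restarts.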
Auto-Halt:
- Automatically stops agent after violation threshold
- Prevents cascading failures
- Requires manual review to restart
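The halt behavior can be pictured as a latch: once the violation count reaches the threshold, every later action is refused until someone resets it. This is an illustrative sketch only. `HaltingCounter` is a hypothetical name, and whether Canary halts at or above the threshold is an assumption here; immediate halts on critical violations are not modeled.

```python
class HaltingCounter:
    """Count violations and latch into a halted state at a threshold."""

    def __init__(self, halt_threshold: int):
        self.halt_threshold = halt_threshold
        self.violation_count = 0
        self.halted = False

    def record_violation(self) -> None:
        self.violation_count += 1
        # Assumption: the halt triggers when the count reaches the threshold.
        if self.violation_count >= self.halt_threshold:
            self.halted = True  # stays set until reset() is called

    def check(self) -> None:
        """Call before every agent action; raises once halted."""
        if self.halted:
            raise RuntimeError("agent halted: manual review required")

    def reset(self) -> None:
        """Manual restart after review."""
        self.violation_count = 0
        self.halted = False


counter = HaltingCounter(halt_threshold=2)
counter.record_violation()
counter.check()  # still running after one violation
counter.record_violation()
print(counter.halted)  # True
```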
Tripwire Files:
- Create honeypot files that should never be accessed
- Detect modifications, deletions, or access
- Hash verification for file integrity
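Hash verification can be sketched with the standard library alone: record a digest when the file is created, then compare later. The helper names `sha256_of` and `check_file` are hypothetical, and the choice of SHA-256 is an assumption rather than something the source states. Note that a content hash reveals modification and deletion but not a plain read, which needs a separate mechanism.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_file(path: Path, expected_hash: str) -> str:
    """Return 'ok', 'modified', or 'deleted' for a registered file."""
    if not path.exists():
        return "deleted"
    return "ok" if sha256_of(path) == expected_hash else "modified"


with tempfile.TemporaryDirectory() as tmp:
    bait = Path(tmp) / "fake-api-key.txt"
    bait.write_text("API_KEY=not-a-real-key\n")
    baseline = sha256_of(bait)  # stored at creation time

    state_initial = check_file(bait, baseline)
    bait.write_text("tampered\n")
    state_after_write = check_file(bait, baseline)
    bait.unlink()
    state_after_delete = check_file(bait, baseline)

print(state_initial, state_after_write, state_after_delete)  # ok modified deleted
```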
Audit Trail:
- Complete action logs
- Violation history
- Pattern detection (rapid violations, repeated targets, time clusters)
- Export to JSON or Markdown
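As an illustration of what "rapid violations" detection might involve, the sketch below flags any run of violations packed into a short time span. The function name `has_rapid_violations` and the thresholds (3 events in 10 seconds) are hypothetical, not taken from Canary.

```python
from datetime import datetime, timedelta


def has_rapid_violations(timestamps, max_count=3, window=timedelta(seconds=10)):
    """Return True if `max_count` or more timestamps fall within `window`."""
    ordered = sorted(timestamps)
    start = 0
    for end in range(len(ordered)):
        # Advance the left edge until the span fits inside `window`.
        while ordered[end] - ordered[start] > window:
            start += 1
        if end - start + 1 >= max_count:
            return True
    return False


base = datetime(2024, 1, 1, 12, 0, 0)
burst = [base + timedelta(seconds=s) for s in (0, 2, 4)]
spread = [base + timedelta(minutes=m) for m in (0, 5, 10)]
print(has_rapid_violations(burst), has_rapid_violations(spread))  # True False
```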
Quick Start
Install
No dependencies! Python 3.7+ stdlib only.
# Copy config example
cp config_example.json config.json
# Edit config with your protected paths
nano config.json
Basic Usage
from canary import CanaryMonitor
# Initialize monitor
canary = CanaryMonitor('config.json')
# Check path before access
is_safe, reason = canary.check_path('/etc/passwd', 'read')
if not is_safe:
    print(f"Blocked: {reason}")
    exit(1)
# Check command before execution
is_safe, reason = canary.check_command('rm -rf /')
if not is_safe:
    print(f"Blocked: {reason}")
    exit(1)
# Get status
status = canary.get_status()
print(f"Violations: {status['violation_count']}/{status['halt_threshold']}")
CLI Usage
# Check status
python3 canary.py status
# Check if path is safe
python3 canary.py check-path --path /etc/passwd --operation read
# Check if command is safe
python3 canary.py check-command --command "rm -rf /"
# Reset monitoring (clears violations)
python3 canary.py reset
Tripwire Files
Create honeypot files that should never be accessed:
# Create tripwire
python3 canary_tripwire.py create \
--path ~/.secrets/fake-api-key.txt \
--severity critical \
--description "Honeypot to detect credential snooping"
# List all tripwires
python3 canary_tripwire.py list
# Check for triggered tripwires
python3 canary_tripwire.py check
# View alert history
python3 canary_tripwire.py alerts --limit 10
# Remove tripwire
python3 canary_tripwire.py remove --path ~/.secrets/fake-api-key.txt
Python API
from canary_tripwire import TripwireManager
manager = TripwireManager()
# Create tripwire
manager.create_tripwire(
    path='~/.aws/fake-credentials',
    severity='critical',
    description='Detects AWS credential access'
)
# Check all tripwires
triggered = manager.check_tripwires()
if triggered:
    print(f"⚠️ {len(triggered)} tripwire(s) triggered!")
    for alert in triggered:
        print(f" - {alert['path']}: {alert['event']}")
Audit Reports
Analyze logs and generate safety reports:
# Summary report
python3 canary_audit.py summary
# View violations by severity
python3 canary_audit.py violations --severity critical
# Timeline of recent events
python3 canary_audit.py timeline --hours 24
# Detect suspicious patterns
python3 canary_audit.py patterns
# Export full report
python3 canary_audit.py export --output report.json --format json
python3 canary_audit.py export --output report.md --format markdown
Python API
from canary_audit import CanaryAuditor
auditor = CanaryAuditor('canary.log')
# Generate summary
summary = auditor.generate_summary_report()
print(f"Total violations: {summary['total_violations']}")
# Get critical violations
critical = auditor.get_violations_by_severity('critical')
# Detect patterns
patterns = auditor.detect_patterns()
if patterns['rapid_violations']:
    print("⚠️ Rapid violation sequence detected!")
# Export report
auditor.export_report('safety-report.md', format='markdown')
Configuration
See config_example.json for all options.
Essential Settings
{
  "protected_paths": [
    "/etc/",
    "~/.ssh/",
    "~/critical-data/"
  ],
  "forbidden_patterns": [
    "rm\\s+-rf\\s+/",
    "chmod\\s+777",
    "curl.*\\|\\s*sh"
  ],
  "halt_threshold": 5,
  "rate_limits": {
    "file_operations": {"limit": 100, "window": 60},
    "command_executions": {"limit": 20, "window": 60}
  }
}
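To see what the forbidden patterns in this configuration actually match, the sketch below applies them with Python's `re` module. The helper `first_match` is hypothetical, and the use of `re.search` (match anywhere in the command) is an assumption about how Canary applies its patterns.

```python
import re

# Patterns copied from the configuration above, with JSON escaping removed
# ("\\s" in JSON is \s in the regex).
FORBIDDEN_PATTERNS = [
    r"rm\s+-rf\s+/",
    r"chmod\s+777",
    r"curl.*\|\s*sh",
]
COMPILED = [re.compile(p) for p in FORBIDDEN_PATTERNS]


def first_match(command: str):
    """Return the first forbidden pattern found in `command`, or None."""
    for pattern in COMPILED:
        if pattern.search(command):
            return pattern.pattern
    return None


samples = [
    "rm -rf /",
    "chmod 777 /tmp",
    "curl https://example.com/x | sh",
    "ls -la",
    "rm -rf /tmp/build",  # also matches: any absolute path starts with "/"
    "rm -fr /",           # does not match: flag order differs
]
for cmd in samples:
    print(f"{cmd!r}: {first_match(cmd)}")
```

The last two samples show why regex matching is not semantic analysis: the first pattern blocks `rm -rf` on every absolute path, yet misses the equivalent `rm -fr /`. Testing patterns against both kinds of case before deployment helps avoid surprises.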
Integration Examples
With Agent Runtime
import shlex
import subprocess
from canary import CanaryMonitor
canary = CanaryMonitor('config.json')
def safe_file_read(path):
    """Read file with Canary check."""
    is_safe, reason = canary.check_path(path, 'read')
    if not is_safe:
        raise PermissionError(reason)
    with open(path, 'r') as f:
        return f.read()
def safe_command(cmd):
    """Execute command with Canary check."""
    is_safe, reason = canary.check_command(cmd)
    if not is_safe:
        raise PermissionError(reason)
    # shlex.split keeps quoted arguments together; str.split would break them apart
    cmd_list = shlex.split(cmd) if isinstance(cmd, str) else cmd
    return subprocess.run(cmd_list, capture_output=True)
Pre-Deployment Checks
# Before deploying agent, verify Canary setup
from canary import CanaryMonitor
canary = CanaryMonitor('config.json')
# Verify protected paths are configured
status = canary.get_status()
if status['protected_paths_count'] == 0:
    print("⚠️ No protected paths configured!")
    exit(1)
# Test tripwire detection
from canary_tripwire import TripwireManager
manager = TripwireManager()
# Create test tripwire
manager.create_tripwire('/tmp/canary-test.txt', severity='high')
# A freshly created, untouched tripwire should not be reported as triggered
triggered = manager.check_tripwires()
if not any(t['path'] == '/tmp/canary-test.txt' for t in triggered):
    print("✅ Tripwire system operational")
# Cleanup
manager.remove_tripwire('/tmp/canary-test.txt', delete_file=True)
Use Cases
1. Autonomous Agent Safety
Deploy Canary alongside autonomous agents to prevent:
- Accidental system file deletion
- Credential exfiltration
- Runaway command execution
2. Multi-Agent Systems
Each agent gets its own Canary instance with custom rules:
- Research agent: limited network access
- Coding agent: no production deployments
- Admin agent: full access but strict audit
3. Development/Testing
Use Canary during agent development:
- Catch dangerous patterns early
- Test rate limiting behavior
- Verify safety mechanisms work
4. Production Monitoring
Run Canary in production:
- Real-time violation alerts
- Audit trail for compliance
- Pattern detection for anomalies
Architecture
┌─────────────────┐
│   Your Agent    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│  CanaryMonitor  │◄────►│   config.json    │
│   (canary.py)   │      │   (your rules)   │
└────────┬────────┘      └──────────────────┘
         │
         ├─────► canary.log (action log)
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│ TripwireManager │◄────►│ .canary_tripwires│
│  (tripwire.py)  │      │ (honeypot files) │
└────────┬────────┘      └──────────────────┘
         │
         └─────► alerts.log
┌─────────────────┐
│  CanaryAuditor  │───► reports (JSON/MD)
│   (audit.py)    │
└─────────────────┘
Best Practices
Start Conservative
Begin with strict rules and relax them as needed. For example, protect the entire filesystem at first and set a low halt threshold to catch issues early:
{
  "protected_paths": ["/"],
  "halt_threshold": 3
}
Use Tripwires Strategically
Place tripwires in sensitive locations:
- Fake credential files
- Empty "secrets" directories
- Decoy config files
Review Logs Regularly
# Daily audit
python3 canary_audit.py summary
# Weekly deep dive
python3 canary_audit.py patterns
python3 canary_audit.py export --output weekly-report.md --format markdown
Test Your Configuration
# Verify Canary blocks what it should
from canary import CanaryMonitor
canary = CanaryMonitor('config.json')
# These should all be blocked
assert not canary.check_path('/etc/passwd', 'delete')[0]
assert not canary.check_command('rm -rf /')[0]
assert not canary.check_command('chmod 777 /tmp')[0]
print("✅ Canary configuration verified")
Limitations
See LIMITATIONS.md for details.
Key constraints:
- Pattern matching is regex-based (not semantic analysis)
- No built-in alerting (logs only)
- Tripwires detect access, not intent
- Rate limiting is per-session (doesn't survive restarts)
License
MIT License - See LICENSE
Author: Shadow Rose
Why This Exists
AI agents can do a lot of damage quickly:
- One bad command can delete critical files
- Runaway loops can exhaust resources
- Compromised agents can exfiltrate credentials
Canary provides defense-in-depth:
- Preventive: Block dangerous actions before they happen
- Detective: Tripwires catch snooping behavior
- Forensic: Complete audit trail for post-incident analysis
Simple, zero-dependency safety for autonomous agents.
⚠️ Security Note — Config File
Configuration is loaded from a JSON file. JSON is data only, so loading a shared config cannot execute code.
- Config path is validated for existence and size (1MB cap) before loading
- Must be a .json file: CanaryMonitor raises ValueError if given a non-JSON path
- Keep your config under version control; treat it as security policy
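The checks listed here (extension, existence, size cap) can be sketched as follows. This is an illustration under stated assumptions, not Canary's code: `load_config` and `MAX_CONFIG_BYTES` are hypothetical names, and only the `ValueError` for a non-JSON path comes from the description; the other exception types are guesses.

```python
import json
import tempfile
from pathlib import Path

MAX_CONFIG_BYTES = 1024 * 1024  # 1MB cap, as described above


def load_config(path: str) -> dict:
    """Validate extension, existence, and size before parsing JSON."""
    p = Path(path).expanduser()
    if p.suffix.lower() != ".json":
        raise ValueError(f"config must be a .json file: {p}")
    if not p.is_file():
        raise FileNotFoundError(p)  # assumption: exception type not documented
    if p.stat().st_size > MAX_CONFIG_BYTES:
        raise ValueError(f"config exceeds {MAX_CONFIG_BYTES} bytes: {p}")
    # json.loads only builds plain data structures; it never executes code.
    return json.loads(p.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text(json.dumps({"halt_threshold": 5}))
    config = load_config(str(cfg_path))

print(config["halt_threshold"])  # 5
```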
⚠️ Security Note — Tripwire Deployment
- Paths are fully resolved: ~ and relative paths are expanded via Path.expanduser().resolve() before creation and lookup. '~/.aws/fake-credentials' will be placed in your actual home directory, not a literal ~ path.
- Use decoy paths only: never point tripwires at real files containing sensitive data. Tripwires are honeypots; treat them as bait, not protection.
- create_tripwire will not overwrite existing files: it checks for pre-existing files and refuses to proceed. Use dedicated empty paths for tripwires.
- Test in a sandbox first: verify where logs, tripwires, and registry files are created before deploying. Confirm protected paths and auto-halt behavior in an isolated environment.
- Protect log and alert directories: set filesystem permissions so alert logs are not world-readable. Canary writes plaintext logs; restrict access accordingly.
- Canary only blocks when called: it is not an OS-level enforcement mechanism. Layer it with containers, filesystem permissions, and auditd for production deployments.
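For the log-permission advice, one possible approach on POSIX systems is to restrict each log file to its owner. The helper `restrict_to_owner` is a hypothetical name, and the sketch uses a temporary file rather than Canary's real log locations, which should be confirmed in your own environment.

```python
import os
import stat
import tempfile
from pathlib import Path


def restrict_to_owner(path: Path) -> None:
    """Make a file readable and writable by its owner only (POSIX)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # mode 0o600


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "alerts.log"  # stand-in for a real alert log
    log.touch()
    restrict_to_owner(log)
    mode = stat.S_IMODE(log.stat().st_mode)

print(oct(mode))  # 0o600 on POSIX systems
```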
⚠️ Disclaimer
This software is provided "AS IS", without warranty of any kind, express or implied.
USE AT YOUR OWN RISK.
- The author(s) are NOT liable for any damages, losses, or consequences arising from the use or misuse of this software — including but not limited to financial loss, data loss, security breaches, business interruption, or any indirect/consequential damages.
- This software does NOT constitute financial, legal, trading, or professional advice.
- Users are solely responsible for evaluating whether this software is suitable for their use case, environment, and risk tolerance.
- No guarantee is made regarding accuracy, reliability, completeness, or fitness for any particular purpose.
- The author(s) are not responsible for how third parties use, modify, or distribute this software after purchase.
By downloading, installing, or using this software, you acknowledge that you have read this disclaimer and agree to use the software entirely at your own risk.
SECURITY DISCLAIMER: This software provides supplementary security measures and is NOT a replacement for professional security auditing, penetration testing, or compliance frameworks. No software can guarantee complete protection against all threats. Users operating in regulated industries (healthcare, finance, legal) should consult qualified security professionals and verify compliance with applicable regulations (GDPR, HIPAA, SOC2, etc.) independently.
Support & Links
| 🐛 Bug Reports | TheShadowyRose@proton.me |
| ☕ Ko-fi | ko-fi.com/theshadowrose |
| 🛒 Gumroad | shadowyrose.gumroad.com |
| @TheShadowyRose | |
| 🐙 GitHub | github.com/TheShadowRose |
| 🧠 PromptBase | promptbase.com/profile/shadowrose |
Built with OpenClaw — thank you for making this possible.
🛠️ Need something custom? Custom OpenClaw agents & skills starting at $500. If you can describe it, I can build it. → Hire me on Fiverr
📦 Install note: The slug canary was already taken on ClawHub. Install this skill using: clawhub install canary-sr