guard

Deep AI safety guardrails workflow—policy definition, input/output filtering, monitoring, escalation, and false-positive handling. Use when reducing harmful outputs, misuse, or policy violations in LLM products.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the command below and send it to your AI assistant to install this skill

Install skill "guard" with this command: npx skills add clawkk/guard

AI Guardrails (Deep Workflow)

Guardrails turn product and legal policy into enforced behavior: blocking, rewriting, logging, and human review—with attention to false positives and latency.

When to Offer This Workflow

Trigger conditions:

  • Launching consumer-facing LLM features
  • Jailbreak attempts, policy violations, or PII leakage risks
  • Region-specific compliance (minors, regulated advice)

Initial offer:

Use six stages: (1) policy scope, (2) threat model, (3) controls stack, (4) implementation patterns, (5) monitoring & review, (6) iteration & appeals. Confirm the latency budget and target jurisdictions.


Stage 1: Policy Scope

Goal: Define prohibited categories (hate, sexual content, violence, self-harm, malware instructions, etc.) and required disclaimers for sensitive domains (medical, legal).

Exit condition: Policy document owned by legal/product; escalation path for gray areas.
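A policy document can start as a small machine-readable map from category to action and owner. A minimal sketch, assuming hypothetical category names and an `escalate` default for gray areas (all identifiers here are illustrative, not a canonical taxonomy):

```python
# Illustrative policy-scope config: categories -> enforcement action + owner.
POLICY = {
    "hate": {"action": "block", "owner": "legal"},
    "sexual_content": {"action": "block", "owner": "legal"},
    "violence": {"action": "block", "owner": "product"},
    "self_harm": {"action": "escalate", "owner": "safety"},
    "malware_instructions": {"action": "block", "owner": "security"},
}

# Sensitive domains require a disclaimer rather than a block.
DISCLAIMERS = {
    "medical": "This is general information, not medical advice.",
    "legal": "This is general information, not legal advice.",
}

def resolve(category: str) -> str:
    """Return the configured action; unknown categories escalate to a human
    reviewer -- this is the gray-area path the exit condition requires."""
    entry = POLICY.get(category)
    return entry["action"] if entry else "escalate"
```

Keeping the owner next to each category makes the "owned by legal/product" exit condition auditable in code review.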


Stage 2: Threat Model

Goal: Identify adversaries (prompt injection, data exfiltration, tool abuse) and assets (user data, system prompts, connectors).
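The adversary/asset pairing can be captured as a simple map so each pairing gets a control in Stage 3. A sketch with hypothetical threat and asset names:

```python
# Illustrative threat model: adversary technique -> assets it targets.
THREATS = {
    "prompt_injection": ["system_prompts", "tool_calls"],
    "data_exfiltration": ["user_data", "connectors"],
    "tool_abuse": ["connectors", "tool_calls"],
}

def assets_at_risk(threat: str) -> list:
    return THREATS.get(threat, [])

def threats_against(asset: str) -> list:
    """Invert the map: which techniques target a given asset?
    Useful for checking that every asset has at least one mitigation."""
    return sorted(t for t, assets in THREATS.items() if asset in assets)
```

Inverting the map per asset is a quick coverage check before designing controls.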


Stage 3: Controls Stack

Goal: Layer defenses: input screening, model safety APIs, output classifiers, tool sandboxing, allowlists for tools and URLs.
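The layered stack can be sketched as an ordered list of predicates where the first failing layer wins. The layer functions below are trivial stand-ins for real classifiers and safety APIs; the structure, not the checks, is the point:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Verdict:
    allowed: bool
    layer: str = ""  # which layer blocked, for telemetry

# Stand-in layers; production versions would call real classifiers/safety APIs.
def input_screen(text: str) -> bool:
    return "ignore previous instructions" not in text.lower()

def url_allowlist(text: str) -> bool:
    # Hypothetical allowlist: only a known docs domain may appear in URLs.
    return "http" not in text or "https://docs.example.com" in text

LAYERS: List[Tuple[str, Callable[[str], bool]]] = [
    ("input_screen", input_screen),
    ("url_allowlist", url_allowlist),
]

def check(text: str) -> Verdict:
    """Defense in depth: run every layer in order; first failure blocks."""
    for name, layer in LAYERS:
        if not layer(text):
            return Verdict(False, name)
    return Verdict(True)
```

Recording which layer fired is what later lets Stage 5 break block rates down by control.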


Stage 4: Implementation Patterns

Goal: Structured refusal messages; telemetry on every block; distinguish block vs rewrite vs warn; avoid silent failures.
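The block/rewrite/warn distinction can be returned as a structured decision that is always logged, so nothing fails silently. A minimal sketch; the severity-to-action mapping and user-facing strings are illustrative:

```python
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guardrails")

def enforce(category: Optional[str], severity: str) -> dict:
    """Map a flagged category + severity to a structured decision.
    Every non-pass decision emits telemetry (no silent failures)."""
    if category is None:
        return {"action": "pass"}
    action = {"high": "block", "medium": "rewrite", "low": "warn"}[severity]
    decision = {
        "action": action,
        "category": category,
        "user_message": {
            "block": "I can't help with that request.",
            "rewrite": "The response was adjusted to remove disallowed content.",
            "warn": "Note: this topic is sensitive; proceed with care.",
        }[action],
    }
    log.info("guardrail_decision %s", decision)  # telemetry on every decision
    return decision
```

Returning a dict rather than raising keeps the refusal message, category, and action together for both the user-facing response and the log line.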


Stage 5: Monitoring & Review

Goal: Sample borderline cases for human review; dashboards on block rates by category; abuse spike alerts.
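From the decision telemetry above, per-category block rates and a borderline-review queue are a few lines each. A sketch over an assumed event schema (`category`, `action` keys):

```python
import random
from collections import Counter

def block_rates(events: list) -> dict:
    """Per-category block rate from decision telemetry -- the core
    dashboard number; a sudden jump in one category is an abuse-spike signal."""
    total = Counter(e["category"] for e in events)
    blocked = Counter(e["category"] for e in events if e["action"] == "block")
    return {c: blocked[c] / total[c] for c in total}

def sample_borderline(events: list, rate: float = 0.1, seed: int = 0) -> list:
    """Randomly queue a fraction of 'warn' (borderline) decisions for
    human review; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return [e for e in events if e["action"] == "warn" and rng.random() < rate]
```

Sampling warns rather than blocks focuses reviewer time on the cases the classifiers were least sure about.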


Stage 6: Iteration & Appeals

Goal: User appeals path where appropriate; version policy changes; measure false positives by locale and use case.
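False-positive measurement falls out of appeal outcomes: a block overturned on review is a false positive. A sketch, assuming a hypothetical review record with `locale` and `outcome` fields:

```python
from collections import defaultdict

def false_positive_rate(reviews: list) -> dict:
    """FP rate by locale from appeal reviews; 'overturned' means a human
    reviewer reversed the block, i.e. the guardrail was wrong."""
    seen = defaultdict(int)
    fp = defaultdict(int)
    for r in reviews:
        seen[r["locale"]] += 1
        if r["outcome"] == "overturned":
            fp[r["locale"]] += 1
    return {loc: fp[loc] / seen[loc] for loc in seen}
```

Keying by locale (and, in practice, use case) surfaces the policy versions that over-block in one market but not others.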


Final Review Checklist

  • Policy categories and owners defined
  • Threat model aligned with product
  • Layered controls with clear responsibilities
  • Telemetry and review for edge cases
  • Appeals and iteration process where applicable

Tips for Effective Guidance

  • Defense in depth—no single classifier is sufficient.
  • Pair with moderation for UGC and tool-calling for agent safety.

Handling Deviations

  • Enterprise internal bots: emphasize data-leak prevention and connector scope over public “safety” categories alone.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
