
πŸ“Š Logging and Monitoring for Agentic Workflows

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install this skill with:

npx skills add hack23/riksdagsmonitor/hack23-riksdagsmonitor-logging-and-monitoring-for-agentic-workflows


πŸ“‹ Overview

This skill provides comprehensive patterns for implementing observability in GitHub Agentic Workflows. It covers structured logging architectures, metrics collection, alerting strategies, debugging techniques, and production monitoring best practices for autonomous agent systems.

🎯 Core Concepts

Observability Architecture

```mermaid
graph TB
    subgraph "Agent Execution"
        A[Agent Start] --> B[Task Execution]
        B --> C[MCP Operations]
        C --> D[Agent Completion]
    end

    subgraph "Logging Layer"
        B --> E[Structured Logs]
        C --> E
        D --> E
        E --> F[Log Aggregation]
    end

    subgraph "Metrics Layer"
        B --> G[Performance Metrics]
        C --> G
        D --> G
        G --> H[Time Series DB]
    end

    subgraph "Alerting Layer"
        F --> I[Alert Rules]
        H --> I
        I --> J[Notifications]
    end

    subgraph "Visualization"
        F --> K[Log Explorer]
        H --> L[Dashboards]
        K --> M[Insights]
        L --> M
    end

    style E fill:#00d9ff
    style G fill:#ff006e
    style I fill:#ffbe0b
```

Three Pillars of Observability

  • Logs: Detailed event records with context

  • Metrics: Quantitative measurements over time

  • Traces: Request flow through system components
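To make the three pillars concrete, here is a minimal dependency-free sketch (field names are illustrative, not a required schema) showing how a shared correlation id lets a log line, a metric sample, and a trace span be joined later:

```javascript
// Minimal sketch: one shared correlation id ties a log, a metric, and a span together.
const correlationId = `corr-${Date.now()}-${Math.floor(Math.random() * 1e6)}`;

// Pillar 1: a structured log event
const logEvent = {
  timestamp: new Date().toISOString(),
  level: 'info',
  message: 'PR analysis started',
  correlation_id: correlationId
};

// Pillar 2: a metric sample
const metricSample = {
  name: 'analysis_duration_ms',
  value: 1234,
  correlation_id: correlationId
};

// Pillar 3: a trace span
const span = {
  trace_id: correlationId,
  span_id: 'span-1',
  name: 'pr-analysis'
};

// A log explorer or dashboard can join all three on the shared id
console.log(JSON.stringify({ logEvent, metricSample, span }));
```

Emitting the same id on all three signals is what makes cross-pillar debugging ("find the logs for this slow span") possible downstream.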

πŸ“ Structured Logging

  1. Logging Architecture

JSON Structured Logging Format

```javascript
// scripts/agents/lib/logger.js
import winston from 'winston';
import { v4 as uuidv4 } from 'uuid';

/**
 * Structured logger for agentic workflows.
 * Outputs JSON logs with a consistent schema.
 */
class AgentLogger {
  constructor(options = {}) {
    this.agentId = options.agentId || process.env.AGENT_ID || uuidv4();
    this.sessionId = options.sessionId || uuidv4();
    this.environment = process.env.NODE_ENV || 'development';

    this.logger = winston.createLogger({
      level: options.level || process.env.LOG_LEVEL || 'info',
      format: winston.format.combine(
        winston.format.timestamp({ format: 'ISO' }),
        winston.format.errors({ stack: true }),
        winston.format.json()
      ),
      defaultMeta: {
        agent_id: this.agentId,
        session_id: this.sessionId,
        environment: this.environment,
        service: 'agentic-workflow',
        version: process.env.APP_VERSION || '1.0.0'
      },
      transports: [
        // Console output (GitHub Actions)
        new winston.transports.Console({
          format: winston.format.combine(
            winston.format.colorize(),
            winston.format.printf(this.formatConsoleOutput.bind(this))
          )
        }),
        // File output (local development)
        new winston.transports.File({
          filename: 'logs/agent-errors.log',
          level: 'error',
          maxsize: 10485760, // 10MB
          maxFiles: 5,
          tailable: true
        }),
        new winston.transports.File({
          filename: 'logs/agent-combined.log',
          maxsize: 10485760,
          maxFiles: 10,
          tailable: true
        })
      ],
      exitOnError: false
    });
  }

  /**
   * Format console output for human readability.
   */
  formatConsoleOutput(info) {
    const { timestamp, level, message, ...meta } = info;
    const metaStr = Object.keys(meta).length ? JSON.stringify(meta, null, 2) : '';
    return `${timestamp} [${level}] ${message}${metaStr ? '\n' + metaStr : ''}`;
  }

  /**
   * Log agent lifecycle events.
   */
  logAgentStart(taskType, context = {}) {
    this.logger.info('Agent execution started', {
      event_type: 'agent_start',
      task_type: taskType,
      ...context
    });
  }

  logAgentComplete(taskType, result, metrics = {}) {
    this.logger.info('Agent execution completed', {
      event_type: 'agent_complete',
      task_type: taskType,
      status: result.status,
      duration_ms: metrics.duration,
      tokens_used: metrics.tokens,
      ...result
    });
  }

  logAgentError(taskType, error, context = {}) {
    this.logger.error('Agent execution failed', {
      event_type: 'agent_error',
      task_type: taskType,
      error_message: error.message,
      error_stack: error.stack,
      error_code: error.code,
      ...context
    });
  }

  /**
   * Log MCP operations.
   */
  logMCPCall(toolName, params, context = {}) {
    this.logger.debug('MCP tool call', {
      event_type: 'mcp_call',
      tool_name: toolName,
      params: this.sanitizeParams(params),
      ...context
    });
  }

  logMCPResponse(toolName, response, duration) {
    this.logger.debug('MCP tool response', {
      event_type: 'mcp_response',
      tool_name: toolName,
      duration_ms: duration,
      response_size: JSON.stringify(response).length,
      status: 'success'
    });
  }

  logMCPError(toolName, error, duration) {
    this.logger.error('MCP tool error', {
      event_type: 'mcp_error',
      tool_name: toolName,
      duration_ms: duration,
      error_message: error.message,
      error_code: error.code
    });
  }

  /**
   * Log performance metrics.
   */
  logPerformance(operation, metrics) {
    this.logger.info('Performance metrics', {
      event_type: 'performance',
      operation,
      duration_ms: metrics.duration,
      memory_mb: metrics.memory,
      cpu_percent: metrics.cpu
    });
  }

  /**
   * Log security events.
   */
  logSecurityEvent(eventType, details) {
    this.logger.warn('Security event', {
      event_type: 'security',
      security_event: eventType,
      ...details
    });
  }

  /**
   * Sanitize sensitive data from logs.
   */
  sanitizeParams(params) {
    const sensitive = ['token', 'password', 'secret', 'key', 'api_key'];
    const sanitized = { ...params };

    for (const key of Object.keys(sanitized)) {
      if (sensitive.some(s => key.toLowerCase().includes(s))) {
        sanitized[key] = '***REDACTED***';
      }
    }

    return sanitized;
  }

  /**
   * Create a child logger with additional context.
   */
  child(metadata) {
    const childLogger = Object.create(this);
    childLogger.logger = this.logger.child(metadata);
    return childLogger;
  }
}

export default AgentLogger;

// Usage example
const logger = new AgentLogger({
  agentId: 'pr-analyzer',
  sessionId: process.env.GITHUB_RUN_ID
});

logger.logAgentStart('pr-analysis', { pr_number: 123, repository: 'owner/repo' });
```
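The redaction rule in `sanitizeParams` can be exercised on its own; this standalone re-implementation mirrors the method above:

```javascript
// Standalone sketch of the substring-based redaction rule used by sanitizeParams.
const sensitive = ['token', 'password', 'secret', 'key', 'api_key'];

function sanitizeParams(params) {
  const sanitized = { ...params };
  for (const key of Object.keys(sanitized)) {
    // Any key containing a sensitive substring is redacted, e.g. "github_token"
    if (sensitive.some(s => key.toLowerCase().includes(s))) {
      sanitized[key] = '***REDACTED***';
    }
  }
  return sanitized;
}

console.log(sanitizeParams({ repository: 'owner/repo', github_token: 'ghp_abc123' }));
// { repository: 'owner/repo', github_token: '***REDACTED***' }
```

Substring matching is deliberately broad: it catches `github_token` and `API_KEY` alike, at the cost of occasionally redacting harmless keys.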

Python Structured Logging

```python
# scripts/agents/lib/logger.py
import json
import logging
import os
import sys
import uuid
from datetime import datetime
from typing import Any, Dict, Optional


class JSONFormatter(logging.Formatter):
    """JSON formatter for structured logging"""

    def __init__(self):
        super().__init__()
        self.agent_id = os.getenv('AGENT_ID', str(uuid.uuid4()))
        self.session_id = os.getenv('GITHUB_RUN_ID', str(uuid.uuid4()))
        self.environment = os.getenv('ENVIRONMENT', 'development')

    def format(self, record: logging.LogRecord) -> str:
        """Format log record as JSON"""
        log_data = {
            'timestamp': datetime.utcnow().isoformat() + 'Z',
            'level': record.levelname,
            'message': record.getMessage(),
            'agent_id': self.agent_id,
            'session_id': self.session_id,
            'environment': self.environment,
            'service': 'agentic-workflow',
            'logger': record.name,
            'module': record.module,
            'function': record.funcName,
            'line': record.lineno
        }

        # Add exception info if present
        if record.exc_info:
            log_data['exception'] = {
                'type': record.exc_info[0].__name__,
                'message': str(record.exc_info[1]),
                'traceback': self.formatException(record.exc_info)
            }

        # Add extra fields
        if hasattr(record, 'extra_fields'):
            log_data.update(record.extra_fields)

        return json.dumps(log_data)


class AgentLogger:
    """Structured logger for agentic workflows"""

    def __init__(self, name: str, level: str = 'INFO'):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(getattr(logging, level.upper()))

        # Console handler with JSON format
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setFormatter(JSONFormatter())
        self.logger.addHandler(console_handler)

        # File handler for errors
        if not os.path.exists('logs'):
            os.makedirs('logs')

        error_handler = logging.FileHandler('logs/agent-errors.log')
        error_handler.setLevel(logging.ERROR)
        error_handler.setFormatter(JSONFormatter())
        self.logger.addHandler(error_handler)

    def log_agent_start(self, task_type: str, context: Dict[str, Any] = None):
        """Log agent start event"""
        extra = {
            'extra_fields': {
                'event_type': 'agent_start',
                'task_type': task_type,
                **(context or {})
            }
        }
        self.logger.info('Agent execution started', extra=extra)

    def log_agent_complete(self, task_type: str, result: Dict[str, Any], metrics: Dict[str, Any] = None):
        """Log agent completion event"""
        extra = {
            'extra_fields': {
                'event_type': 'agent_complete',
                'task_type': task_type,
                'status': result.get('status'),
                'duration_ms': metrics.get('duration') if metrics else None,
                **result
            }
        }
        self.logger.info('Agent execution completed', extra=extra)

    def log_agent_error(self, task_type: str, error: Exception, context: Dict[str, Any] = None):
        """Log agent error event"""
        extra = {
            'extra_fields': {
                'event_type': 'agent_error',
                'task_type': task_type,
                'error_type': type(error).__name__,
                'error_message': str(error),
                **(context or {})
            }
        }
        self.logger.error('Agent execution failed', extra=extra, exc_info=error)

    def log_mcp_call(self, tool_name: str, params: Dict[str, Any], context: Dict[str, Any] = None):
        """Log MCP tool call"""
        extra = {
            'extra_fields': {
                'event_type': 'mcp_call',
                'tool_name': tool_name,
                'params': self._sanitize_params(params),
                **(context or {})
            }
        }
        self.logger.debug('MCP tool call', extra=extra)

    def _sanitize_params(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Sanitize sensitive data"""
        sensitive = ['token', 'password', 'secret', 'key', 'api_key']
        sanitized = params.copy()

        for key in sanitized.keys():
            if any(s in key.lower() for s in sensitive):
                sanitized[key] = '***REDACTED***'

        return sanitized


# Usage
logger = AgentLogger('pr-analyzer', level='INFO')
logger.log_agent_start('pr-analysis', {'pr_number': 123})
```

  2. Log Levels and When to Use Them

Log Level Guidelines

```javascript
// ERROR: System errors requiring immediate attention
logger.error('Failed to connect to MCP server', {
  error: error.message,
  mcp_server: 'github',
  retry_count: 3
});

// WARN: Unexpected conditions that don't prevent operation
logger.warn('MCP response slow', {
  duration_ms: 5000,
  threshold_ms: 3000,
  tool_name: 'github-search'
});

// INFO: Important business events and milestones
logger.info('PR analysis completed', {
  pr_number: 123,
  issues_found: 5,
  duration_ms: 15000
});

// DEBUG: Detailed information for debugging
logger.debug('Processing file', {
  file_path: 'src/index.js',
  file_size: 1024,
  line_count: 50
});

// TRACE: Very detailed diagnostic information (if supported)
logger.trace('Token consumed', {
  token_count: 1000,
  model: 'claude-3-5-sonnet',
  prompt_type: 'analysis'
});
```

  3. Log Aggregation Patterns

GitHub Actions Log Groups

```yaml
# Group related logs in GitHub Actions
steps:
  - name: Run Agent Analysis
    run: |
      echo "::group::Agent Initialization"
      node scripts/agents/pr-analyzer.js --phase=init
      echo "::endgroup::"

      echo "::group::MCP Server Connection"
      node scripts/agents/pr-analyzer.js --phase=connect
      echo "::endgroup::"

      echo "::group::Analysis Execution"
      node scripts/agents/pr-analyzer.js --phase=analyze
      echo "::endgroup::"

      echo "::group::Report Generation"
      node scripts/agents/pr-analyzer.js --phase=report
      echo "::endgroup::"
```

Centralized Log Collection

```javascript
// scripts/agents/lib/log-collector.js
import { promises as fs } from 'fs';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { gzipSync } from 'zlib';

/**
 * Collect and upload logs to centralized storage.
 */
class LogCollector {
  constructor(options = {}) {
    this.s3Client = new S3Client({ region: options.region || 'us-east-1' });
    this.bucket = options.bucket || 'agentic-workflow-logs';
    this.compression = options.compression !== false;
  }

  /**
   * Upload logs to S3.
   */
  async uploadLogs(logFile, metadata = {}) {
    const content = await fs.readFile(logFile, 'utf8');
    const logs = content.split('\n').filter(Boolean).map(JSON.parse);

    const key = this.generateLogKey(metadata);
    const body = this.compression ? gzipSync(JSON.stringify(logs)) : JSON.stringify(logs);

    await this.s3Client.send(new PutObjectCommand({
      Bucket: this.bucket,
      Key: key,
      Body: body,
      ContentType: 'application/json',
      ContentEncoding: this.compression ? 'gzip' : undefined,
      Metadata: {
        agent_id: metadata.agentId,
        session_id: metadata.sessionId,
        timestamp: new Date().toISOString()
      }
    }));

    console.log(`βœ… Logs uploaded to s3://${this.bucket}/${key}`);
  }

  /**
   * Generate an S3 key with date-based partitioning.
   */
  generateLogKey(metadata) {
    const date = new Date();
    const year = date.getUTCFullYear();
    const month = String(date.getUTCMonth() + 1).padStart(2, '0');
    const day = String(date.getUTCDate()).padStart(2, '0');

    return [
      'logs',
      `year=${year}`,
      `month=${month}`,
      `day=${day}`,
      `agent=${metadata.agentId}`,
      `${metadata.sessionId}.json${this.compression ? '.gz' : ''}`
    ].join('/');
  }

  /**
   * Query logs from S3 using S3 Select.
   */
  async queryLogs(filters = {}) {
    const params = {
      Bucket: this.bucket,
      Key: filters.key,
      ExpressionType: 'SQL',
      Expression: this.buildSQLQuery(filters),
      InputSerialization: {
        JSON: { Type: 'LINES' },
        CompressionType: this.compression ? 'GZIP' : 'NONE'
      },
      OutputSerialization: {
        JSON: { RecordDelimiter: '\n' }
      }
    };

    // Implementation details...
  }
}

export default LogCollector;
```
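The date-partitioned layout produced by `generateLogKey` can be sketched standalone (agent and session names here are illustrative), which makes the Hive-style `year=/month=/day=` prefix structure easy to see:

```javascript
// Standalone sketch of the Hive-style partitioned key built by generateLogKey.
function logKey(agentId, sessionId, date = new Date(), gz = true) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, '0');
  const d = String(date.getUTCDate()).padStart(2, '0');
  return [
    'logs',
    `year=${y}`,
    `month=${m}`,
    `day=${d}`,
    `agent=${agentId}`,
    `${sessionId}.json${gz ? '.gz' : ''}`
  ].join('/');
}

console.log(logKey('pr-analyzer', 'run-42', new Date(Date.UTC(2026, 1, 17))));
// logs/year=2026/month=02/day=17/agent=pr-analyzer/run-42.json.gz
```

This layout lets query engines (Athena, S3 Select over a prefix) prune by date and agent without scanning every object.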

πŸ“ˆ Metrics Collection

  1. Performance Metrics

Agent Execution Metrics

```javascript
// scripts/agents/lib/metrics.js
import { performance } from 'perf_hooks';
import os from 'os';

/**
 * Metrics collector for agentic workflows.
 */
class MetricsCollector {
  constructor() {
    this.metrics = new Map();
    this.startTime = performance.now();
  }

  /**
   * Record execution time for an operation.
   */
  recordDuration(operation, duration) {
    // Encode the operation name as a label so recordMetric can parse it as JSON
    this.recordMetric('duration_ms', JSON.stringify({ operation }), duration, 'histogram');
  }

  /**
   * Record a counter metric.
   */
  recordCounter(name, value = 1, labels = {}) {
    this.recordMetric(name, JSON.stringify(labels), value, 'counter');
  }

  /**
   * Record a gauge metric.
   */
  recordGauge(name, value, labels = {}) {
    this.recordMetric(name, JSON.stringify(labels), value, 'gauge');
  }

  /**
   * Store a metric value.
   */
  recordMetric(name, key, value, type) {
    const metricKey = `${name}:${key}`;

    if (!this.metrics.has(metricKey)) {
      this.metrics.set(metricKey, {
        name,
        type,
        values: [],
        labels: key !== 'default' ? JSON.parse(key) : {}
      });
    }

    this.metrics.get(metricKey).values.push({
      value,
      timestamp: Date.now()
    });
  }

  /**
   * Collect system metrics.
   */
  getSystemMetrics() {
    const used = process.memoryUsage();
    const cpus = os.cpus();

    return {
      memory: {
        heap_used_mb: Math.round(used.heapUsed / 1024 / 1024),
        heap_total_mb: Math.round(used.heapTotal / 1024 / 1024),
        rss_mb: Math.round(used.rss / 1024 / 1024),
        external_mb: Math.round(used.external / 1024 / 1024)
      },
      cpu: {
        count: cpus.length,
        model: cpus[0].model,
        load_avg: os.loadavg()
      },
      uptime_seconds: Math.round(process.uptime())
    };
  }

  /**
   * Calculate agent-specific metrics.
   */
  getAgentMetrics() {
    const duration = performance.now() - this.startTime;

    return {
      total_duration_ms: Math.round(duration),
      operations_count: this.getOperationCount(),
      errors_count: this.getErrorCount(),
      success_rate: this.calculateSuccessRate()
    };
  }

  // Simple derived counts; adapt these to your own metric naming scheme.
  getOperationCount() {
    let count = 0;
    for (const metric of this.metrics.values()) {
      count += metric.values.length;
    }
    return count;
  }

  getErrorCount() {
    let count = 0;
    for (const metric of this.metrics.values()) {
      if (metric.name.includes('error')) {
        count += metric.values.reduce((acc, v) => acc + v.value, 0);
      }
    }
    return count;
  }

  calculateSuccessRate() {
    const ops = this.getOperationCount();
    const errors = this.getErrorCount();
    return ops > 0 ? Math.round(((ops - errors) / ops) * 100) : 100;
  }

  /**
   * Export metrics in Prometheus exposition format.
   */
  exportPrometheus() {
    const lines = [];

    for (const metric of this.metrics.values()) {
      const { name, type, values, labels } = metric;

      // Metric help and type
      lines.push(`# HELP ${name} Agent workflow metric`);
      lines.push(`# TYPE ${name} ${type}`);

      // Metric labels
      const labelStr = Object.entries(labels)
        .map(([k, v]) => `${k}="${v}"`)
        .join(',');
      // Avoid a dangling comma when there are no labels
      const bucketPrefix = labelStr ? `${labelStr},` : '';

      if (type === 'histogram') {
        const sorted = values.map(v => v.value).sort((a, b) => a - b);
        lines.push(`${name}_sum{${labelStr}} ${sorted.reduce((a, b) => a + b, 0)}`);
        lines.push(`${name}_count{${labelStr}} ${sorted.length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="100"} ${sorted.filter(v => v <= 100).length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="500"} ${sorted.filter(v => v <= 500).length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="1000"} ${sorted.filter(v => v <= 1000).length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="+Inf"} ${sorted.length}`);
      } else if (type === 'counter') {
        const sum = values.reduce((acc, v) => acc + v.value, 0);
        lines.push(`${name}{${labelStr}} ${sum}`);
      } else if (type === 'gauge') {
        const latest = values[values.length - 1];
        lines.push(`${name}{${labelStr}} ${latest.value}`);
      }
    }

    return lines.join('\n');
  }

  /**
   * Export metrics as JSON.
   */
  exportJSON() {
    const metrics = {
      timestamp: new Date().toISOString(),
      agent: this.getAgentMetrics(),
      system: this.getSystemMetrics(),
      custom: {}
    };

    for (const metric of this.metrics.values()) {
      metrics.custom[metric.name] = {
        type: metric.type,
        labels: metric.labels,
        values: metric.values
      };
    }

    return metrics;
  }
}

export default MetricsCollector;

// Usage
const metrics = new MetricsCollector();

// Record operation duration
const start = performance.now();
await performOperation();
metrics.recordDuration('pr_analysis', performance.now() - start);

// Record counter
metrics.recordCounter('files_analyzed', 1, { language: 'javascript' });

// Record gauge
metrics.recordGauge('active_mcp_connections', 3);

// Export metrics
console.log(metrics.exportPrometheus());
```

  1. Custom Metrics for Agents

// Agent-specific metrics class AgentMetrics extends MetricsCollector { /**

  • Track LLM API usage */ recordLLMCall(model, tokens, duration) { this.recordCounter('llm_calls_total', 1, { model }); this.recordCounter('llm_tokens_total', tokens, { model }); this.recordDuration('llm_duration', duration); }

/**

  • Track MCP tool usage */ recordMCPTool(toolName, duration, success) { this.recordCounter('mcp_calls_total', 1, { tool: toolName, success: success.toString() }); this.recordDuration('mcp_duration', duration); }

/**

  • Track agent decisions */ recordDecision(decisionType, confidence) { this.recordCounter('agent_decisions_total', 1, { type: decisionType }); this.recordGauge('agent_confidence', confidence, { type: decisionType }); }

/**

  • Track code changes */ recordCodeChanges(filesModified, linesAdded, linesRemoved) { this.recordCounter('files_modified_total', filesModified); this.recordCounter('lines_added_total', linesAdded); this.recordCounter('lines_removed_total', linesRemoved); }

/**

  • Calculate cost metrics */ calculateCost(model, tokens) { const costs = { 'claude-3-5-sonnet': 0.003, // per 1k tokens 'gpt-4-turbo': 0.01, 'gpt-3.5-turbo': 0.0015 };
const costPerToken = costs[model] || 0;
const cost = (tokens / 1000) * costPerToken;

this.recordGauge('agent_cost_usd', cost, { model });

return cost;

} }
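As a quick sanity check on the cost formula in `calculateCost` (the rate is the example value hard-coded above, not current provider pricing):

```javascript
// cost = (tokens / 1000) * cost_per_1k_tokens
const costPer1k = 0.003;      // example 'claude-3-5-sonnet' rate from the table above
const tokens = 2000;
const cost = (tokens / 1000) * costPer1k;
console.log(cost.toFixed(4)); // "0.0060"
```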

  3. Metrics Visualization

GitHub Actions Job Summary

```javascript
// Generate a metrics summary for GitHub Actions
import * as core from '@actions/core';

async function generateMetricsSummary(metrics) {
  const agentMetrics = metrics.getAgentMetrics();
  const systemMetrics = metrics.getSystemMetrics();

  await core.summary
    .addHeading('πŸ“Š Agent Execution Metrics')
    .addTable([
      [{ data: 'Metric', header: true }, { data: 'Value', header: true }],
      ['Total Duration', `${agentMetrics.total_duration_ms}ms`],
      ['Operations', agentMetrics.operations_count.toString()],
      ['Errors', agentMetrics.errors_count.toString()],
      ['Success Rate', `${agentMetrics.success_rate}%`]
    ])
    .addHeading('πŸ’» System Metrics', 3)
    .addTable([
      [{ data: 'Resource', header: true }, { data: 'Usage', header: true }],
      ['Memory (Heap)', `${systemMetrics.memory.heap_used_mb}MB / ${systemMetrics.memory.heap_total_mb}MB`],
      ['Memory (RSS)', `${systemMetrics.memory.rss_mb}MB`],
      ['CPU Count', systemMetrics.cpu.count.toString()],
      ['Uptime', `${systemMetrics.uptime_seconds}s`]
    ])
    .write();
}
```

🚨 Alerting Strategies

  1. Alert Rules Configuration

```yaml
# .github/workflows/alert-rules.yml
# Monitor agent execution and trigger alerts
name: Agent Monitoring & Alerts

on:
  schedule:
    - cron: '*/15 * * * *'  # Every 15 minutes
  workflow_dispatch:

permissions:
  contents: read
  issues: write

jobs:
  check-metrics:
    name: Check Agent Metrics
    runs-on: ubuntu-latest

    steps:
      - name: Fetch Recent Workflow Runs
        id: fetch-runs
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const { data: runs } = await github.rest.actions.listWorkflowRunsForRepo({
              owner: context.repo.owner,
              repo: context.repo.repo,
              status: 'completed',
              per_page: 50
            });

            // Filter agent workflows
            const agentRuns = runs.workflow_runs.filter(run =>
              run.name.includes('Agent') || run.name.includes('Agentic')
            );

            return agentRuns;

      - name: Calculate Metrics
        id: metrics
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const runs = ${{ steps.fetch-runs.outputs.result }};

            const total = runs.length;
            const failed = runs.filter(r => r.conclusion === 'failure').length;
            const successRate = total > 0 ? ((total - failed) / total) * 100 : 100;

            const durations = runs.map(r => {
              const start = new Date(r.created_at);
              const end = new Date(r.updated_at);
              return (end - start) / 1000;
            });

            const avgDuration = durations.reduce((a, b) => a + b, 0) / durations.length;
            const p95Duration = durations.sort((a, b) => a - b)[Math.floor(durations.length * 0.95)];

            return {
              total,
              failed,
              successRate,
              avgDuration,
              p95Duration
            };

      - name: Check Alert Conditions
        id: check-alerts
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const metrics = ${{ steps.metrics.outputs.result }};
            const alerts = [];

            // Alert if success rate below 90%
            if (metrics.successRate < 90) {
              alerts.push({
                severity: 'high',
                title: 'Low Agent Success Rate',
                message: `Success rate: ${metrics.successRate.toFixed(1)}% (threshold: 90%)`,
                metric: 'success_rate',
                value: metrics.successRate
              });
            }

            // Alert if p95 duration above 10 minutes
            if (metrics.p95Duration > 600) {
              alerts.push({
                severity: 'medium',
                title: 'High Agent Latency',
                message: `P95 duration: ${Math.round(metrics.p95Duration)}s (threshold: 600s)`,
                metric: 'p95_duration',
                value: metrics.p95Duration
              });
            }

            // Alert if too many failures
            if (metrics.failed > 5) {
              alerts.push({
                severity: 'high',
                title: 'Multiple Agent Failures',
                message: `${metrics.failed} failures in last 50 runs`,
                metric: 'failure_count',
                value: metrics.failed
              });
            }

            return alerts;

      - name: Create Alert Issues
        if: steps.check-alerts.outputs.result != '[]'
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const alerts = ${{ steps.check-alerts.outputs.result }};

            for (const alert of alerts) {
              // Check if alert already exists
              const { data: existingIssues } = await github.rest.issues.listForRepo({
                owner: context.repo.owner,
                repo: context.repo.repo,
                state: 'open',
                labels: 'alert,monitoring',
                per_page: 100
              });

              const exists = existingIssues.some(issue =>
                issue.title === `🚨 ${alert.title}`
              );

              if (!exists) {
                await github.rest.issues.create({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  title: `🚨 ${alert.title}`,
                  body: [
                    '## Alert Details',
                    '',
                    `**Severity:** ${alert.severity.toUpperCase()}`,
                    `**Metric:** ${alert.metric}`,
                    `**Value:** ${alert.value}`,
                    '',
                    alert.message,
                    '',
                    '---',
                    '_Auto-generated alert from agent monitoring_',
                    `_Timestamp: ${new Date().toISOString()}_`
                  ].join('\n'),
                  labels: ['alert', 'monitoring', `severity:${alert.severity}`]
                });
              }
            }

      - name: Send Slack Notification
        if: steps.check-alerts.outputs.result != '[]'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          alerts='${{ steps.check-alerts.outputs.result }}'

          curl -X POST "$SLACK_WEBHOOK_URL" \
            -H 'Content-Type: application/json' \
            -d "{
              \"text\": \"🚨 Agent Monitoring Alert\",
              \"blocks\": [
                {
                  \"type\": \"header\",
                  \"text\": {
                    \"type\": \"plain_text\",
                    \"text\": \"🚨 Agent Monitoring Alert\"
                  }
                },
                {
                  \"type\": \"section\",
                  \"text\": {
                    \"type\": \"mrkdwn\",
                    \"text\": \"$alerts\"
                  }
                }
              ]
            }"
```

2. Real-Time Alerting

```javascript
// scripts/agents/lib/alerting.js
import { WebClient } from '@slack/web-api';
import nodemailer from 'nodemailer';

/**
 * Alert manager for agentic workflows.
 */
class AlertManager {
  constructor(options = {}) {
    this.slackClient = options.slackToken ? new WebClient(options.slackToken) : null;
    this.slackChannel = options.slackChannel;
    this.emailTransporter = options.emailConfig
      ? nodemailer.createTransport(options.emailConfig)
      : null;
    this.thresholds = options.thresholds || this.getDefaultThresholds();
  }

  getDefaultThresholds() {
    return {
      duration: {
        warning: 300000,  // 5 minutes
        critical: 600000  // 10 minutes
      },
      errorRate: {
        warning: 0.05,  // 5%
        critical: 0.10  // 10%
      },
      memory: {
        warning: 512,   // 512MB
        critical: 1024  // 1GB
      }
    };
  }

  /**
   * Check if a metric exceeds its threshold.
   */
  checkThreshold(metric, value) {
    const threshold = this.thresholds[metric];
    if (!threshold) return null;

    if (value >= threshold.critical) {
      return { level: 'critical', threshold: threshold.critical };
    } else if (value >= threshold.warning) {
      return { level: 'warning', threshold: threshold.warning };
    }

    return null;
  }

  /**
   * Send an alert to all configured channels.
   */
  async sendAlert(alert) {
    const promises = [];

    if (this.slackClient) {
      promises.push(this.sendSlackAlert(alert));
    }

    if (this.emailTransporter) {
      promises.push(this.sendEmailAlert(alert));
    }

    await Promise.allSettled(promises);
  }

  /**
   * Send a Slack alert.
   */
  async sendSlackAlert(alert) {
    const emoji = alert.level === 'critical' ? 'πŸ”΄' : '⚠️';
    const color = alert.level === 'critical' ? 'danger' : 'warning';

    await this.slackClient.chat.postMessage({
      channel: this.slackChannel,
      text: `${emoji} ${alert.title}`,
      blocks: [
        {
          type: 'header',
          text: {
            type: 'plain_text',
            text: `${emoji} ${alert.title}`
          }
        },
        {
          type: 'section',
          fields: [
            {
              type: 'mrkdwn',
              text: `*Severity:*\n${alert.level.toUpperCase()}`
            },
            {
              type: 'mrkdwn',
              text: `*Metric:*\n${alert.metric}`
            },
            {
              type: 'mrkdwn',
              text: `*Value:*\n${alert.value}`
            },
            {
              type: 'mrkdwn',
              text: `*Threshold:*\n${alert.threshold}`
            }
          ]
        },
        {
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: alert.message
          }
        },
        {
          type: 'context',
          elements: [
            {
              type: 'mrkdwn',
              text: `Agent: ${alert.agentId} | Session: ${alert.sessionId}`
            }
          ]
        }
      ],
      attachments: [
        {
          color: color,
          text: `View logs: ${alert.logsUrl}`
        }
      ]
    });
  }

  /**
   * Send an email alert.
   */
  async sendEmailAlert(alert) {
    await this.emailTransporter.sendMail({
      from: 'alerts@example.com',
      to: 'team@example.com',
      subject: `[${alert.level.toUpperCase()}] ${alert.title}`,
      html: `
        <h2>${alert.level === 'critical' ? 'πŸ”΄' : '⚠️'} ${alert.title}</h2>
        <table>
          <tr><th>Severity</th><td>${alert.level.toUpperCase()}</td></tr>
          <tr><th>Metric</th><td>${alert.metric}</td></tr>
          <tr><th>Value</th><td>${alert.value}</td></tr>
          <tr><th>Threshold</th><td>${alert.threshold}</td></tr>
        </table>
        <p>${alert.message}</p>
        <p><a href="${alert.logsUrl}">View Logs</a></p>
      `
    });
  }
}

export default AlertManager;
```
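The threshold logic in `checkThreshold` reduces to a small classification function; this standalone sketch uses the default duration thresholds from `getDefaultThresholds`:

```javascript
// Standalone sketch of the two-tier threshold check: a value maps to
// 'critical', 'warning', or null (no alert).
const duration = { warning: 300000, critical: 600000 };

function classify(value, threshold) {
  if (value >= threshold.critical) return 'critical';
  if (value >= threshold.warning) return 'warning';
  return null;
}

console.log(classify(120000, duration)); // null
console.log(classify(420000, duration)); // warning
console.log(classify(900000, duration)); // critical
```

Checking the critical bound first matters: a value above both thresholds must classify as critical, not warning.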

πŸ› Debugging Techniques

  1. Debug Mode

```javascript
// Enable debug mode for detailed logging
process.env.DEBUG = 'agent:*,mcp:*';
process.env.LOG_LEVEL = 'debug';

// Use the debug module
import createDebug from 'debug';

const debug = createDebug('agent:pr-analyzer');
const debugMCP = createDebug('mcp:github');

debug('Starting PR analysis for #%d', prNumber);
debugMCP('Calling tool %s with params %o', toolName, params);
```

  2. Interactive Debugging

```yaml
# Enable tmate for interactive debugging
steps:
  - name: Setup tmate session
    if: failure()  # Only on failure
    uses: mxschmitt/action-tmate@v3
    with:
      limit-access-to-actor: true
      timeout-minutes: 30
```

  3. Trace Analysis

```javascript
// scripts/agents/lib/tracer.js
import { trace, context, SpanStatusCode } from '@opentelemetry/api';

/**
 * OpenTelemetry tracer for agentic workflows.
 */
class AgentTracer {
  constructor(serviceName = 'agentic-workflow') {
    this.tracer = trace.getTracer(serviceName);
  }

  /**
   * Create a span for an operation.
   */
  async traceOperation(name, fn, attributes = {}) {
    const span = this.tracer.startSpan(name, {
      attributes: {
        'agent.service': 'agentic-workflow',
        ...attributes
      }
    });

    try {
      const result = await context.with(
        trace.setSpan(context.active(), span),
        fn
      );

      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message
      });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  }

  /**
   * Trace an MCP tool call.
   */
  async traceMCPCall(toolName, params, fn) {
    return this.traceOperation(
      `mcp.${toolName}`,
      fn,
      {
        'mcp.tool': toolName,
        'mcp.params': JSON.stringify(params)
      }
    );
  }
}

// Usage
const tracer = new AgentTracer();

await tracer.traceOperation('pr-analysis', async () => {
  // Operation code
  await tracer.traceMCPCall('github-get-pr', { prNumber }, async () => {
    // MCP call
  });
});
```

πŸ“Š Observability Best Practices

  1. Correlation IDs

```javascript
// Generate and propagate correlation IDs
import { v4 as uuidv4 } from 'uuid';

class CorrelationContext {
  constructor() {
    this.correlationId = uuidv4();
    this.parentId = null;
  }

  createChild() {
    const child = new CorrelationContext();
    child.parentId = this.correlationId;
    return child;
  }

  getHeaders() {
    return {
      'X-Correlation-ID': this.correlationId,
      'X-Parent-ID': this.parentId
    };
  }
}

// Use in all logs and API calls
const ctx = new CorrelationContext();
logger.info('Starting operation', {
  correlation_id: ctx.correlationId,
  parent_id: ctx.parentId
});
```

  2. Health Checks

```javascript
// scripts/agents/health-check.js
export async function checkHealth() {
  const health = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    checks: {}
  };

  // Check MCP server connectivity
  try {
    await fetch('http://localhost:3000/health');
    health.checks.mcp_server = { status: 'up' };
  } catch (error) {
    health.checks.mcp_server = { status: 'down', error: error.message };
    health.status = 'unhealthy';
  }

  // Check memory usage
  const used = process.memoryUsage();
  const memoryPercent = (used.heapUsed / used.heapTotal) * 100;
  health.checks.memory = {
    status: memoryPercent < 90 ? 'healthy' : 'warning',
    heap_used_mb: Math.round(used.heapUsed / 1024 / 1024),
    heap_total_mb: Math.round(used.heapTotal / 1024 / 1024),
    usage_percent: Math.round(memoryPercent)
  };

  return health;
}
```

  3. Dashboards

```javascript
// Generate an HTML dashboard
export function generateDashboard(metrics, logs) {
  return `
<!DOCTYPE html>
<html>
<head>
  <title>Agent Metrics Dashboard</title>
  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
  <h1>πŸ“Š Agentic Workflow Metrics</h1>

  <div class="metrics">
    <div class="metric-card">
      <h3>Success Rate</h3>
      <div class="value">${metrics.successRate}%</div>
    </div>

    <div class="metric-card">
      <h3>Avg Duration</h3>
      <div class="value">${metrics.avgDuration}ms</div>
    </div>

    <div class="metric-card">
      <h3>Error Count</h3>
      <div class="value">${metrics.errorCount}</div>
    </div>
  </div>

  <canvas id="durationChart"></canvas>

  <script>
    const ctx = document.getElementById('durationChart');
    new Chart(ctx, {
      type: 'line',
      data: {
        labels: ${JSON.stringify(metrics.timestamps)},
        datasets: [{
          label: 'Duration (ms)',
          data: ${JSON.stringify(metrics.durations)},
          borderColor: 'rgb(75, 192, 192)',
          tension: 0.1
        }]
      }
    });
  </script>
</body>
</html>
`;
}
```

πŸ“š Related Skills

  • gh-aw-github-actions-integration - CI/CD integration patterns

  • gh-aw-mcp-gateway - MCP Gateway configuration

  • gh-aw-safe-outputs - Safe output handling

  • gh-aw-authentication-credentials - Authentication management

  • gh-aw-containerization - Container deployment

πŸ”— References

Logging & Monitoring

  • Winston Logger

  • OpenTelemetry

  • Prometheus

  • Grafana

GitHub Actions

  • GitHub Actions Logs

  • Job Summaries

Observability Tools

  • Datadog

  • New Relic

  • Sentry

βœ… Remember Checklist

When implementing logging and monitoring for agentic workflows:

  • Use structured JSON logging format

  • Include correlation IDs in all logs

  • Sanitize sensitive data before logging

  • Set appropriate log levels

  • Implement log aggregation

  • Collect performance metrics

  • Track LLM API usage and costs

  • Define alert rules and thresholds

  • Create actionable alerts (not noise)

  • Implement health checks

  • Use distributed tracing for complex flows

  • Generate metrics dashboards

  • Monitor success rates

  • Track error patterns

  • Implement debug mode for troubleshooting

License: Apache-2.0

Version: 1.0.0

Last Updated: 2026-02-17

Maintained by: Hack23 Organization
