
πŸ“Š Logging and Monitoring for Agentic Workflows

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install this skill with:

npx skills add hack23/riksdagsmonitor/hack23-riksdagsmonitor-logging-and-monitoring-for-agentic-workflows


πŸ“‹ Overview

This skill provides comprehensive patterns for implementing observability in GitHub Agentic Workflows. It covers structured logging architectures, metrics collection, alerting strategies, debugging techniques, and production monitoring best practices for autonomous agent systems.

🎯 Core Concepts

Observability Architecture

```mermaid
graph TB
    subgraph "Agent Execution"
        A[Agent Start] --> B[Task Execution]
        B --> C[MCP Operations]
        C --> D[Agent Completion]
    end

    subgraph "Logging Layer"
        B --> E[Structured Logs]
        C --> E
        D --> E
        E --> F[Log Aggregation]
    end

    subgraph "Metrics Layer"
        B --> G[Performance Metrics]
        C --> G
        D --> G
        G --> H[Time Series DB]
    end

    subgraph "Alerting Layer"
        F --> I[Alert Rules]
        H --> I
        I --> J[Notifications]
    end

    subgraph "Visualization"
        F --> K[Log Explorer]
        H --> L[Dashboards]
        K --> M[Insights]
        L --> M
    end

    style E fill:#00d9ff
    style G fill:#ff006e
    style I fill:#ffbe0b
```

Three Pillars of Observability

  • Logs: Detailed event records with context

  • Metrics: Quantitative measurements over time

  • Traces: Request flow through system components
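To make the three pillars concrete, here is a minimal dependency-free sketch (field names are illustrative, not a required schema) showing how a shared correlation id lets a log line, a metric sample, and a trace span be joined later:

```javascript
// Minimal sketch: one shared correlation id ties a log, a metric, and a span together.
const correlationId = `corr-${Date.now()}-${Math.floor(Math.random() * 1e6)}`;

// Pillar 1: a structured log event
const logEvent = {
  timestamp: new Date().toISOString(),
  level: 'info',
  message: 'PR analysis started',
  correlation_id: correlationId
};

// Pillar 2: a metric sample
const metricSample = {
  name: 'analysis_duration_ms',
  value: 1234,
  correlation_id: correlationId
};

// Pillar 3: a trace span
const span = {
  trace_id: correlationId,
  span_id: 'span-1',
  name: 'pr-analysis'
};

// A log explorer or dashboard can join all three on the shared id
console.log(JSON.stringify({ logEvent, metricSample, span }));
```

Emitting the same id on all three signals is what makes cross-pillar debugging ("find the logs for this slow span") possible downstream.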

πŸ“ Structured Logging

  1. Logging Architecture

JSON Structured Logging Format

```javascript
// scripts/agents/lib/logger.js
import winston from 'winston';
import { v4 as uuidv4 } from 'uuid';

/**
 * Structured logger for agentic workflows.
 * Outputs JSON logs with a consistent schema.
 */
class AgentLogger {
  constructor(options = {}) {
    this.agentId = options.agentId || process.env.AGENT_ID || uuidv4();
    this.sessionId = options.sessionId || uuidv4();
    this.environment = process.env.NODE_ENV || 'development';

    this.logger = winston.createLogger({
      level: options.level || process.env.LOG_LEVEL || 'info',
      format: winston.format.combine(
        winston.format.timestamp({ format: 'ISO' }),
        winston.format.errors({ stack: true }),
        winston.format.json()
      ),
      defaultMeta: {
        agent_id: this.agentId,
        session_id: this.sessionId,
        environment: this.environment,
        service: 'agentic-workflow',
        version: process.env.APP_VERSION || '1.0.0'
      },
      transports: [
        // Console output (GitHub Actions)
        new winston.transports.Console({
          format: winston.format.combine(
            winston.format.colorize(),
            winston.format.printf(this.formatConsoleOutput.bind(this))
          )
        }),
        // File output (local development)
        new winston.transports.File({
          filename: 'logs/agent-errors.log',
          level: 'error',
          maxsize: 10485760, // 10MB
          maxFiles: 5,
          tailable: true
        }),
        new winston.transports.File({
          filename: 'logs/agent-combined.log',
          maxsize: 10485760,
          maxFiles: 10,
          tailable: true
        })
      ],
      exitOnError: false
    });
  }

  /**
   * Format console output for human readability.
   */
  formatConsoleOutput(info) {
    const { timestamp, level, message, ...meta } = info;
    const metaStr = Object.keys(meta).length ? JSON.stringify(meta, null, 2) : '';
    return `${timestamp} [${level}] ${message}${metaStr ? '\n' + metaStr : ''}`;
  }

  /**
   * Log agent lifecycle events.
   */
  logAgentStart(taskType, context = {}) {
    this.logger.info('Agent execution started', {
      event_type: 'agent_start',
      task_type: taskType,
      ...context
    });
  }

  logAgentComplete(taskType, result, metrics = {}) {
    this.logger.info('Agent execution completed', {
      event_type: 'agent_complete',
      task_type: taskType,
      status: result.status,
      duration_ms: metrics.duration,
      tokens_used: metrics.tokens,
      ...result
    });
  }

  logAgentError(taskType, error, context = {}) {
    this.logger.error('Agent execution failed', {
      event_type: 'agent_error',
      task_type: taskType,
      error_message: error.message,
      error_stack: error.stack,
      error_code: error.code,
      ...context
    });
  }

  /**
   * Log MCP operations.
   */
  logMCPCall(toolName, params, context = {}) {
    this.logger.debug('MCP tool call', {
      event_type: 'mcp_call',
      tool_name: toolName,
      params: this.sanitizeParams(params),
      ...context
    });
  }

  logMCPResponse(toolName, response, duration) {
    this.logger.debug('MCP tool response', {
      event_type: 'mcp_response',
      tool_name: toolName,
      duration_ms: duration,
      response_size: JSON.stringify(response).length,
      status: 'success'
    });
  }

  logMCPError(toolName, error, duration) {
    this.logger.error('MCP tool error', {
      event_type: 'mcp_error',
      tool_name: toolName,
      duration_ms: duration,
      error_message: error.message,
      error_code: error.code
    });
  }

  /**
   * Log performance metrics.
   */
  logPerformance(operation, metrics) {
    this.logger.info('Performance metrics', {
      event_type: 'performance',
      operation,
      duration_ms: metrics.duration,
      memory_mb: metrics.memory,
      cpu_percent: metrics.cpu
    });
  }

  /**
   * Log security events.
   */
  logSecurityEvent(eventType, details) {
    this.logger.warn('Security event', {
      event_type: 'security',
      security_event: eventType,
      ...details
    });
  }

  /**
   * Sanitize sensitive data from logs.
   */
  sanitizeParams(params) {
    const sensitive = ['token', 'password', 'secret', 'key', 'api_key'];
    const sanitized = { ...params };

    for (const key of Object.keys(sanitized)) {
      if (sensitive.some(s => key.toLowerCase().includes(s))) {
        sanitized[key] = '***REDACTED***';
      }
    }

    return sanitized;
  }

  /**
   * Create a child logger with additional context.
   */
  child(metadata) {
    const childLogger = Object.create(this);
    childLogger.logger = this.logger.child(metadata);
    return childLogger;
  }
}

export default AgentLogger;

// Usage example
const logger = new AgentLogger({
  agentId: 'pr-analyzer',
  sessionId: process.env.GITHUB_RUN_ID
});

logger.logAgentStart('pr-analysis', { pr_number: 123, repository: 'owner/repo' });
```
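The redaction rule in `sanitizeParams` can be exercised on its own; this standalone re-implementation mirrors the method above:

```javascript
// Standalone sketch of the substring-based redaction rule used by sanitizeParams.
const sensitive = ['token', 'password', 'secret', 'key', 'api_key'];

function sanitizeParams(params) {
  const sanitized = { ...params };
  for (const key of Object.keys(sanitized)) {
    // Any key containing a sensitive substring is redacted, e.g. "github_token"
    if (sensitive.some(s => key.toLowerCase().includes(s))) {
      sanitized[key] = '***REDACTED***';
    }
  }
  return sanitized;
}

console.log(sanitizeParams({ repository: 'owner/repo', github_token: 'ghp_abc123' }));
// { repository: 'owner/repo', github_token: '***REDACTED***' }
```

Substring matching is deliberately broad: it catches `github_token` and `API_KEY` alike, at the cost of occasionally redacting harmless keys.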

Python Structured Logging

```python
# scripts/agents/lib/logger.py
import json
import logging
import os
import sys
import uuid
from datetime import datetime
from typing import Any, Dict, Optional


class JSONFormatter(logging.Formatter):
    """JSON formatter for structured logging"""

    def __init__(self):
        super().__init__()
        self.agent_id = os.getenv('AGENT_ID', str(uuid.uuid4()))
        self.session_id = os.getenv('GITHUB_RUN_ID', str(uuid.uuid4()))
        self.environment = os.getenv('ENVIRONMENT', 'development')

    def format(self, record: logging.LogRecord) -> str:
        """Format log record as JSON"""
        log_data = {
            'timestamp': datetime.utcnow().isoformat() + 'Z',
            'level': record.levelname,
            'message': record.getMessage(),
            'agent_id': self.agent_id,
            'session_id': self.session_id,
            'environment': self.environment,
            'service': 'agentic-workflow',
            'logger': record.name,
            'module': record.module,
            'function': record.funcName,
            'line': record.lineno
        }

        # Add exception info if present
        if record.exc_info:
            log_data['exception'] = {
                'type': record.exc_info[0].__name__,
                'message': str(record.exc_info[1]),
                'traceback': self.formatException(record.exc_info)
            }

        # Add extra fields
        if hasattr(record, 'extra_fields'):
            log_data.update(record.extra_fields)

        return json.dumps(log_data)


class AgentLogger:
    """Structured logger for agentic workflows"""

    def __init__(self, name: str, level: str = 'INFO'):
        self.logger = logging.getLogger(name)
        self.logger.setLevel(getattr(logging, level.upper()))

        # Console handler with JSON format
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setFormatter(JSONFormatter())
        self.logger.addHandler(console_handler)

        # File handler for errors
        if not os.path.exists('logs'):
            os.makedirs('logs')

        error_handler = logging.FileHandler('logs/agent-errors.log')
        error_handler.setLevel(logging.ERROR)
        error_handler.setFormatter(JSONFormatter())
        self.logger.addHandler(error_handler)

    def log_agent_start(self, task_type: str, context: Dict[str, Any] = None):
        """Log agent start event"""
        extra = {
            'extra_fields': {
                'event_type': 'agent_start',
                'task_type': task_type,
                **(context or {})
            }
        }
        self.logger.info('Agent execution started', extra=extra)

    def log_agent_complete(self, task_type: str, result: Dict[str, Any], metrics: Dict[str, Any] = None):
        """Log agent completion event"""
        extra = {
            'extra_fields': {
                'event_type': 'agent_complete',
                'task_type': task_type,
                'status': result.get('status'),
                'duration_ms': metrics.get('duration') if metrics else None,
                **result
            }
        }
        self.logger.info('Agent execution completed', extra=extra)

    def log_agent_error(self, task_type: str, error: Exception, context: Dict[str, Any] = None):
        """Log agent error event"""
        extra = {
            'extra_fields': {
                'event_type': 'agent_error',
                'task_type': task_type,
                'error_type': type(error).__name__,
                'error_message': str(error),
                **(context or {})
            }
        }
        self.logger.error('Agent execution failed', extra=extra, exc_info=error)

    def log_mcp_call(self, tool_name: str, params: Dict[str, Any], context: Dict[str, Any] = None):
        """Log MCP tool call"""
        extra = {
            'extra_fields': {
                'event_type': 'mcp_call',
                'tool_name': tool_name,
                'params': self._sanitize_params(params),
                **(context or {})
            }
        }
        self.logger.debug('MCP tool call', extra=extra)

    def _sanitize_params(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Sanitize sensitive data"""
        sensitive = ['token', 'password', 'secret', 'key', 'api_key']
        sanitized = params.copy()

        for key in sanitized.keys():
            if any(s in key.lower() for s in sensitive):
                sanitized[key] = '***REDACTED***'

        return sanitized


# Usage
logger = AgentLogger('pr-analyzer', level='INFO')
logger.log_agent_start('pr-analysis', {'pr_number': 123})
```

  2. Log Levels and When to Use Them

Log Level Guidelines

```javascript
// ERROR: System errors requiring immediate attention
logger.error('Failed to connect to MCP server', {
  error: error.message,
  mcp_server: 'github',
  retry_count: 3
});

// WARN: Unexpected conditions that don't prevent operation
logger.warn('MCP response slow', {
  duration_ms: 5000,
  threshold_ms: 3000,
  tool_name: 'github-search'
});

// INFO: Important business events and milestones
logger.info('PR analysis completed', {
  pr_number: 123,
  issues_found: 5,
  duration_ms: 15000
});

// DEBUG: Detailed information for debugging
logger.debug('Processing file', {
  file_path: 'src/index.js',
  file_size: 1024,
  line_count: 50
});

// TRACE: Very detailed diagnostic information (if supported)
logger.trace('Token consumed', {
  token_count: 1000,
  model: 'claude-3-5-sonnet',
  prompt_type: 'analysis'
});
```

  3. Log Aggregation Patterns

GitHub Actions Log Groups

```yaml
# Group related logs in GitHub Actions
steps:
  - name: Run Agent Analysis
    run: |
      echo "::group::Agent Initialization"
      node scripts/agents/pr-analyzer.js --phase=init
      echo "::endgroup::"

      echo "::group::MCP Server Connection"
      node scripts/agents/pr-analyzer.js --phase=connect
      echo "::endgroup::"

      echo "::group::Analysis Execution"
      node scripts/agents/pr-analyzer.js --phase=analyze
      echo "::endgroup::"

      echo "::group::Report Generation"
      node scripts/agents/pr-analyzer.js --phase=report
      echo "::endgroup::"
```

Centralized Log Collection

```javascript
// scripts/agents/lib/log-collector.js
import { promises as fs } from 'fs';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { gzipSync } from 'zlib';

/**
 * Collect and upload logs to centralized storage.
 */
class LogCollector {
  constructor(options = {}) {
    this.s3Client = new S3Client({ region: options.region || 'us-east-1' });
    this.bucket = options.bucket || 'agentic-workflow-logs';
    this.compression = options.compression !== false;
  }

  /**
   * Upload logs to S3.
   */
  async uploadLogs(logFile, metadata = {}) {
    const content = await fs.readFile(logFile, 'utf8');
    const logs = content.split('\n').filter(Boolean).map(JSON.parse);

    const key = this.generateLogKey(metadata);
    const body = this.compression ? gzipSync(JSON.stringify(logs)) : JSON.stringify(logs);

    await this.s3Client.send(new PutObjectCommand({
      Bucket: this.bucket,
      Key: key,
      Body: body,
      ContentType: 'application/json',
      ContentEncoding: this.compression ? 'gzip' : undefined,
      Metadata: {
        agent_id: metadata.agentId,
        session_id: metadata.sessionId,
        timestamp: new Date().toISOString()
      }
    }));

    console.log(`βœ… Logs uploaded to s3://${this.bucket}/${key}`);
  }

  /**
   * Generate an S3 key with date-based partitioning.
   */
  generateLogKey(metadata) {
    const date = new Date();
    const year = date.getUTCFullYear();
    const month = String(date.getUTCMonth() + 1).padStart(2, '0');
    const day = String(date.getUTCDate()).padStart(2, '0');

    return [
      'logs',
      `year=${year}`,
      `month=${month}`,
      `day=${day}`,
      `agent=${metadata.agentId}`,
      `${metadata.sessionId}.json${this.compression ? '.gz' : ''}`
    ].join('/');
  }

  /**
   * Query logs from S3 using S3 Select.
   */
  async queryLogs(filters = {}) {
    const params = {
      Bucket: this.bucket,
      Key: filters.key,
      ExpressionType: 'SQL',
      Expression: this.buildSQLQuery(filters),
      InputSerialization: {
        JSON: { Type: 'LINES' },
        CompressionType: this.compression ? 'GZIP' : 'NONE'
      },
      OutputSerialization: {
        JSON: { RecordDelimiter: '\n' }
      }
    };

    // Implementation details...
  }
}

export default LogCollector;
```
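The date-partitioned layout produced by `generateLogKey` can be sketched standalone (agent and session names here are illustrative), which makes the Hive-style `year=/month=/day=` prefix structure easy to see:

```javascript
// Standalone sketch of the Hive-style partitioned key built by generateLogKey.
function logKey(agentId, sessionId, date = new Date(), gz = true) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, '0');
  const d = String(date.getUTCDate()).padStart(2, '0');
  return [
    'logs',
    `year=${y}`,
    `month=${m}`,
    `day=${d}`,
    `agent=${agentId}`,
    `${sessionId}.json${gz ? '.gz' : ''}`
  ].join('/');
}

console.log(logKey('pr-analyzer', 'run-42', new Date(Date.UTC(2026, 1, 17))));
// logs/year=2026/month=02/day=17/agent=pr-analyzer/run-42.json.gz
```

This layout lets query engines (Athena, S3 Select over a prefix) prune by date and agent without scanning every object.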

πŸ“ˆ Metrics Collection

  1. Performance Metrics

Agent Execution Metrics

```javascript
// scripts/agents/lib/metrics.js
import { performance } from 'perf_hooks';
import os from 'os';

/**
 * Metrics collector for agentic workflows.
 */
class MetricsCollector {
  constructor() {
    this.metrics = new Map();
    this.startTime = performance.now();
  }

  /**
   * Record execution time for an operation.
   */
  recordDuration(operation, duration) {
    // Encode the operation name as a label so recordMetric can parse it as JSON
    this.recordMetric('duration_ms', JSON.stringify({ operation }), duration, 'histogram');
  }

  /**
   * Record a counter metric.
   */
  recordCounter(name, value = 1, labels = {}) {
    this.recordMetric(name, JSON.stringify(labels), value, 'counter');
  }

  /**
   * Record a gauge metric.
   */
  recordGauge(name, value, labels = {}) {
    this.recordMetric(name, JSON.stringify(labels), value, 'gauge');
  }

  /**
   * Store a metric value.
   */
  recordMetric(name, key, value, type) {
    const metricKey = `${name}:${key}`;

    if (!this.metrics.has(metricKey)) {
      this.metrics.set(metricKey, {
        name,
        type,
        values: [],
        labels: key !== 'default' ? JSON.parse(key) : {}
      });
    }

    this.metrics.get(metricKey).values.push({
      value,
      timestamp: Date.now()
    });
  }

  /**
   * Collect system metrics.
   */
  getSystemMetrics() {
    const used = process.memoryUsage();
    const cpus = os.cpus();

    return {
      memory: {
        heap_used_mb: Math.round(used.heapUsed / 1024 / 1024),
        heap_total_mb: Math.round(used.heapTotal / 1024 / 1024),
        rss_mb: Math.round(used.rss / 1024 / 1024),
        external_mb: Math.round(used.external / 1024 / 1024)
      },
      cpu: {
        count: cpus.length,
        model: cpus[0].model,
        load_avg: os.loadavg()
      },
      uptime_seconds: Math.round(process.uptime())
    };
  }

  /**
   * Calculate agent-specific metrics.
   */
  getAgentMetrics() {
    const duration = performance.now() - this.startTime;

    return {
      total_duration_ms: Math.round(duration),
      operations_count: this.getOperationCount(),
      errors_count: this.getErrorCount(),
      success_rate: this.calculateSuccessRate()
    };
  }

  // Simple derived counts; adapt these to your own metric naming scheme.
  getOperationCount() {
    let count = 0;
    for (const metric of this.metrics.values()) {
      count += metric.values.length;
    }
    return count;
  }

  getErrorCount() {
    let count = 0;
    for (const metric of this.metrics.values()) {
      if (metric.name.includes('error')) {
        count += metric.values.reduce((acc, v) => acc + v.value, 0);
      }
    }
    return count;
  }

  calculateSuccessRate() {
    const ops = this.getOperationCount();
    const errors = this.getErrorCount();
    return ops > 0 ? Math.round(((ops - errors) / ops) * 100) : 100;
  }

  /**
   * Export metrics in Prometheus exposition format.
   */
  exportPrometheus() {
    const lines = [];

    for (const metric of this.metrics.values()) {
      const { name, type, values, labels } = metric;

      // Metric help and type
      lines.push(`# HELP ${name} Agent workflow metric`);
      lines.push(`# TYPE ${name} ${type}`);

      // Metric labels
      const labelStr = Object.entries(labels)
        .map(([k, v]) => `${k}="${v}"`)
        .join(',');
      // Avoid a dangling comma when there are no labels
      const bucketPrefix = labelStr ? `${labelStr},` : '';

      if (type === 'histogram') {
        const sorted = values.map(v => v.value).sort((a, b) => a - b);
        lines.push(`${name}_sum{${labelStr}} ${sorted.reduce((a, b) => a + b, 0)}`);
        lines.push(`${name}_count{${labelStr}} ${sorted.length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="100"} ${sorted.filter(v => v <= 100).length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="500"} ${sorted.filter(v => v <= 500).length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="1000"} ${sorted.filter(v => v <= 1000).length}`);
        lines.push(`${name}_bucket{${bucketPrefix}le="+Inf"} ${sorted.length}`);
      } else if (type === 'counter') {
        const sum = values.reduce((acc, v) => acc + v.value, 0);
        lines.push(`${name}{${labelStr}} ${sum}`);
      } else if (type === 'gauge') {
        const latest = values[values.length - 1];
        lines.push(`${name}{${labelStr}} ${latest.value}`);
      }
    }

    return lines.join('\n');
  }

  /**
   * Export metrics as JSON.
   */
  exportJSON() {
    const metrics = {
      timestamp: new Date().toISOString(),
      agent: this.getAgentMetrics(),
      system: this.getSystemMetrics(),
      custom: {}
    };

    for (const metric of this.metrics.values()) {
      metrics.custom[metric.name] = {
        type: metric.type,
        labels: metric.labels,
        values: metric.values
      };
    }

    return metrics;
  }
}

export default MetricsCollector;

// Usage
const metrics = new MetricsCollector();

// Record operation duration
const start = performance.now();
await performOperation();
metrics.recordDuration('pr_analysis', performance.now() - start);

// Record counter
metrics.recordCounter('files_analyzed', 1, { language: 'javascript' });

// Record gauge
metrics.recordGauge('active_mcp_connections', 3);

// Export metrics
console.log(metrics.exportPrometheus());
```

  1. Custom Metrics for Agents

// Agent-specific metrics class AgentMetrics extends MetricsCollector { /**

  • Track LLM API usage */ recordLLMCall(model, tokens, duration) { this.recordCounter('llm_calls_total', 1, { model }); this.recordCounter('llm_tokens_total', tokens, { model }); this.recordDuration('llm_duration', duration); }

/**

  • Track MCP tool usage */ recordMCPTool(toolName, duration, success) { this.recordCounter('mcp_calls_total', 1, { tool: toolName, success: success.toString() }); this.recordDuration('mcp_duration', duration); }

/**

  • Track agent decisions */ recordDecision(decisionType, confidence) { this.recordCounter('agent_decisions_total', 1, { type: decisionType }); this.recordGauge('agent_confidence', confidence, { type: decisionType }); }

/**

  • Track code changes */ recordCodeChanges(filesModified, linesAdded, linesRemoved) { this.recordCounter('files_modified_total', filesModified); this.recordCounter('lines_added_total', linesAdded); this.recordCounter('lines_removed_total', linesRemoved); }

/**

  • Calculate cost metrics */ calculateCost(model, tokens) { const costs = { 'claude-3-5-sonnet': 0.003, // per 1k tokens 'gpt-4-turbo': 0.01, 'gpt-3.5-turbo': 0.0015 };
const costPerToken = costs[model] || 0;
const cost = (tokens / 1000) * costPerToken;

this.recordGauge('agent_cost_usd', cost, { model });

return cost;

} }
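As a quick sanity check on the cost formula in `calculateCost` (the rate is the example value hard-coded above, not current provider pricing):

```javascript
// cost = (tokens / 1000) * cost_per_1k_tokens
const costPer1k = 0.003;      // example 'claude-3-5-sonnet' rate from the table above
const tokens = 2000;
const cost = (tokens / 1000) * costPer1k;
console.log(cost.toFixed(4)); // "0.0060"
```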

  3. Metrics Visualization

GitHub Actions Job Summary

```javascript
// Generate a metrics summary for GitHub Actions
import * as core from '@actions/core';

async function generateMetricsSummary(metrics) {
  const agentMetrics = metrics.getAgentMetrics();
  const systemMetrics = metrics.getSystemMetrics();

  await core.summary
    .addHeading('πŸ“Š Agent Execution Metrics')
    .addTable([
      [{ data: 'Metric', header: true }, { data: 'Value', header: true }],
      ['Total Duration', `${agentMetrics.total_duration_ms}ms`],
      ['Operations', agentMetrics.operations_count.toString()],
      ['Errors', agentMetrics.errors_count.toString()],
      ['Success Rate', `${agentMetrics.success_rate}%`]
    ])
    .addHeading('πŸ’» System Metrics', 3)
    .addTable([
      [{ data: 'Resource', header: true }, { data: 'Usage', header: true }],
      ['Memory (Heap)', `${systemMetrics.memory.heap_used_mb}MB / ${systemMetrics.memory.heap_total_mb}MB`],
      ['Memory (RSS)', `${systemMetrics.memory.rss_mb}MB`],
      ['CPU Count', systemMetrics.cpu.count.toString()],
      ['Uptime', `${systemMetrics.uptime_seconds}s`]
    ])
    .write();
}
```

🚨 Alerting Strategies

  1. Alert Rules Configuration

```yaml
# .github/workflows/alert-rules.yml
# Monitor agent execution and trigger alerts
name: Agent Monitoring & Alerts

on:
  schedule:
    - cron: '*/15 * * * *'  # Every 15 minutes
  workflow_dispatch:

permissions:
  contents: read
  issues: write

jobs:
  check-metrics:
    name: Check Agent Metrics
    runs-on: ubuntu-latest

    steps:
      - name: Fetch Recent Workflow Runs
        id: fetch-runs
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const { data: runs } = await github.rest.actions.listWorkflowRunsForRepo({
              owner: context.repo.owner,
              repo: context.repo.repo,
              status: 'completed',
              per_page: 50
            });

            // Filter agent workflows
            const agentRuns = runs.workflow_runs.filter(run =>
              run.name.includes('Agent') || run.name.includes('Agentic')
            );

            return agentRuns;

      - name: Calculate Metrics
        id: metrics
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const runs = ${{ steps.fetch-runs.outputs.result }};

            const total = runs.length;
            const failed = runs.filter(r => r.conclusion === 'failure').length;
            const successRate = total > 0 ? ((total - failed) / total) * 100 : 100;

            const durations = runs.map(r => {
              const start = new Date(r.created_at);
              const end = new Date(r.updated_at);
              return (end - start) / 1000;
            });

            const avgDuration = durations.reduce((a, b) => a + b, 0) / durations.length;
            const p95Duration = durations.sort((a, b) => a - b)[Math.floor(durations.length * 0.95)];

            return {
              total,
              failed,
              successRate,
              avgDuration,
              p95Duration
            };

      - name: Check Alert Conditions
        id: check-alerts
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const metrics = ${{ steps.metrics.outputs.result }};
            const alerts = [];

            // Alert if success rate below 90%
            if (metrics.successRate < 90) {
              alerts.push({
                severity: 'high',
                title: 'Low Agent Success Rate',
                message: `Success rate: ${metrics.successRate.toFixed(1)}% (threshold: 90%)`,
                metric: 'success_rate',
                value: metrics.successRate
              });
            }

            // Alert if p95 duration above 10 minutes
            if (metrics.p95Duration > 600) {
              alerts.push({
                severity: 'medium',
                title: 'High Agent Latency',
                message: `P95 duration: ${Math.round(metrics.p95Duration)}s (threshold: 600s)`,
                metric: 'p95_duration',
                value: metrics.p95Duration
              });
            }

            // Alert if too many failures
            if (metrics.failed > 5) {
              alerts.push({
                severity: 'high',
                title: 'Multiple Agent Failures',
                message: `${metrics.failed} failures in last 50 runs`,
                metric: 'failure_count',
                value: metrics.failed
              });
            }

            return alerts;

      - name: Create Alert Issues
        if: steps.check-alerts.outputs.result != '[]'
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const alerts = ${{ steps.check-alerts.outputs.result }};

            for (const alert of alerts) {
              // Check if alert already exists
              const { data: existingIssues } = await github.rest.issues.listForRepo({
                owner: context.repo.owner,
                repo: context.repo.repo,
                state: 'open',
                labels: 'alert,monitoring',
                per_page: 100
              });

              const exists = existingIssues.some(issue =>
                issue.title === `🚨 ${alert.title}`
              );

              if (!exists) {
                await github.rest.issues.create({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  title: `🚨 ${alert.title}`,
                  body: [
                    '## Alert Details',
                    '',
                    `**Severity:** ${alert.severity.toUpperCase()}`,
                    `**Metric:** ${alert.metric}`,
                    `**Value:** ${alert.value}`,
                    '',
                    alert.message,
                    '',
                    '---',
                    '_Auto-generated alert from agent monitoring_',
                    `_Timestamp: ${new Date().toISOString()}_`
                  ].join('\n'),
                  labels: ['alert', 'monitoring', `severity:${alert.severity}`]
                });
              }
            }

      - name: Send Slack Notification
        if: steps.check-alerts.outputs.result != '[]'
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          alerts='${{ steps.check-alerts.outputs.result }}'

          curl -X POST "$SLACK_WEBHOOK_URL" \
            -H 'Content-Type: application/json' \
            -d "{
              \"text\": \"🚨 Agent Monitoring Alert\",
              \"blocks\": [
                {
                  \"type\": \"header\",
                  \"text\": {
                    \"type\": \"plain_text\",
                    \"text\": \"🚨 Agent Monitoring Alert\"
                  }
                },
                {
                  \"type\": \"section\",
                  \"text\": {
                    \"type\": \"mrkdwn\",
                    \"text\": \"$alerts\"
                  }
                }
              ]
            }"
```

2. Real-Time Alerting

```javascript
// scripts/agents/lib/alerting.js
import { WebClient } from '@slack/web-api';
import nodemailer from 'nodemailer';

/**
 * Alert manager for agentic workflows.
 */
class AlertManager {
  constructor(options = {}) {
    this.slackClient = options.slackToken ? new WebClient(options.slackToken) : null;
    this.slackChannel = options.slackChannel;
    this.emailTransporter = options.emailConfig
      ? nodemailer.createTransport(options.emailConfig)
      : null;
    this.thresholds = options.thresholds || this.getDefaultThresholds();
  }

  getDefaultThresholds() {
    return {
      duration: {
        warning: 300000,  // 5 minutes
        critical: 600000  // 10 minutes
      },
      errorRate: {
        warning: 0.05,  // 5%
        critical: 0.10  // 10%
      },
      memory: {
        warning: 512,   // 512MB
        critical: 1024  // 1GB
      }
    };
  }

  /**
   * Check if a metric exceeds its threshold.
   */
  checkThreshold(metric, value) {
    const threshold = this.thresholds[metric];
    if (!threshold) return null;

    if (value >= threshold.critical) {
      return { level: 'critical', threshold: threshold.critical };
    } else if (value >= threshold.warning) {
      return { level: 'warning', threshold: threshold.warning };
    }

    return null;
  }

  /**
   * Send an alert to all configured channels.
   */
  async sendAlert(alert) {
    const promises = [];

    if (this.slackClient) {
      promises.push(this.sendSlackAlert(alert));
    }

    if (this.emailTransporter) {
      promises.push(this.sendEmailAlert(alert));
    }

    await Promise.allSettled(promises);
  }

  /**
   * Send a Slack alert.
   */
  async sendSlackAlert(alert) {
    const emoji = alert.level === 'critical' ? 'πŸ”΄' : '⚠️';
    const color = alert.level === 'critical' ? 'danger' : 'warning';

    await this.slackClient.chat.postMessage({
      channel: this.slackChannel,
      text: `${emoji} ${alert.title}`,
      blocks: [
        {
          type: 'header',
          text: {
            type: 'plain_text',
            text: `${emoji} ${alert.title}`
          }
        },
        {
          type: 'section',
          fields: [
            {
              type: 'mrkdwn',
              text: `*Severity:*\n${alert.level.toUpperCase()}`
            },
            {
              type: 'mrkdwn',
              text: `*Metric:*\n${alert.metric}`
            },
            {
              type: 'mrkdwn',
              text: `*Value:*\n${alert.value}`
            },
            {
              type: 'mrkdwn',
              text: `*Threshold:*\n${alert.threshold}`
            }
          ]
        },
        {
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: alert.message
          }
        },
        {
          type: 'context',
          elements: [
            {
              type: 'mrkdwn',
              text: `Agent: ${alert.agentId} | Session: ${alert.sessionId}`
            }
          ]
        }
      ],
      attachments: [
        {
          color: color,
          text: `View logs: ${alert.logsUrl}`
        }
      ]
    });
  }

  /**
   * Send an email alert.
   */
  async sendEmailAlert(alert) {
    await this.emailTransporter.sendMail({
      from: 'alerts@example.com',
      to: 'team@example.com',
      subject: `[${alert.level.toUpperCase()}] ${alert.title}`,
      html: `
        <h2>${alert.level === 'critical' ? 'πŸ”΄' : '⚠️'} ${alert.title}</h2>
        <table>
          <tr><th>Severity</th><td>${alert.level.toUpperCase()}</td></tr>
          <tr><th>Metric</th><td>${alert.metric}</td></tr>
          <tr><th>Value</th><td>${alert.value}</td></tr>
          <tr><th>Threshold</th><td>${alert.threshold}</td></tr>
        </table>
        <p>${alert.message}</p>
        <p><a href="${alert.logsUrl}">View Logs</a></p>
      `
    });
  }
}

export default AlertManager;
```
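The threshold logic in `checkThreshold` reduces to a small classification function; this standalone sketch uses the default duration thresholds from `getDefaultThresholds`:

```javascript
// Standalone sketch of the two-tier threshold check: a value maps to
// 'critical', 'warning', or null (no alert).
const duration = { warning: 300000, critical: 600000 };

function classify(value, threshold) {
  if (value >= threshold.critical) return 'critical';
  if (value >= threshold.warning) return 'warning';
  return null;
}

console.log(classify(120000, duration)); // null
console.log(classify(420000, duration)); // warning
console.log(classify(900000, duration)); // critical
```

Checking the critical bound first matters: a value above both thresholds must classify as critical, not warning.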

πŸ› Debugging Techniques

  1. Debug Mode

```javascript
// Enable debug mode for detailed logging
process.env.DEBUG = 'agent:*,mcp:*';
process.env.LOG_LEVEL = 'debug';

// Use the debug module
import createDebug from 'debug';

const debug = createDebug('agent:pr-analyzer');
const debugMCP = createDebug('mcp:github');

debug('Starting PR analysis for #%d', prNumber);
debugMCP('Calling tool %s with params %o', toolName, params);
```

  2. Interactive Debugging

```yaml
# Enable tmate for interactive debugging
steps:
  - name: Setup tmate session
    if: failure()  # Only on failure
    uses: mxschmitt/action-tmate@v3
    with:
      limit-access-to-actor: true
      timeout-minutes: 30
```

  3. Trace Analysis

```javascript
// scripts/agents/lib/tracer.js
import { trace, context, SpanStatusCode } from '@opentelemetry/api';

/**
 * OpenTelemetry tracer for agentic workflows.
 */
class AgentTracer {
  constructor(serviceName = 'agentic-workflow') {
    this.tracer = trace.getTracer(serviceName);
  }

  /**
   * Create a span for an operation.
   */
  async traceOperation(name, fn, attributes = {}) {
    const span = this.tracer.startSpan(name, {
      attributes: {
        'agent.service': 'agentic-workflow',
        ...attributes
      }
    });

    try {
      const result = await context.with(
        trace.setSpan(context.active(), span),
        fn
      );

      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message
      });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  }

  /**
   * Trace an MCP tool call.
   */
  async traceMCPCall(toolName, params, fn) {
    return this.traceOperation(
      `mcp.${toolName}`,
      fn,
      {
        'mcp.tool': toolName,
        'mcp.params': JSON.stringify(params)
      }
    );
  }
}

// Usage
const tracer = new AgentTracer();

await tracer.traceOperation('pr-analysis', async () => {
  // Operation code
  await tracer.traceMCPCall('github-get-pr', { prNumber }, async () => {
    // MCP call
  });
});
```

πŸ“Š Observability Best Practices

  1. Correlation IDs

```javascript
// Generate and propagate correlation IDs
import { v4 as uuidv4 } from 'uuid';

class CorrelationContext {
  constructor() {
    this.correlationId = uuidv4();
    this.parentId = null;
  }

  createChild() {
    const child = new CorrelationContext();
    child.parentId = this.correlationId;
    return child;
  }

  getHeaders() {
    return {
      'X-Correlation-ID': this.correlationId,
      'X-Parent-ID': this.parentId
    };
  }
}

// Use in all logs and API calls
const ctx = new CorrelationContext();
logger.info('Starting operation', {
  correlation_id: ctx.correlationId,
  parent_id: ctx.parentId
});
```

  2. Health Checks

```javascript
// scripts/agents/health-check.js
export async function checkHealth() {
  const health = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    checks: {}
  };

  // Check MCP server connectivity
  try {
    await fetch('http://localhost:3000/health');
    health.checks.mcp_server = { status: 'up' };
  } catch (error) {
    health.checks.mcp_server = { status: 'down', error: error.message };
    health.status = 'unhealthy';
  }

  // Check memory usage
  const used = process.memoryUsage();
  const memoryPercent = (used.heapUsed / used.heapTotal) * 100;
  health.checks.memory = {
    status: memoryPercent < 90 ? 'healthy' : 'warning',
    heap_used_mb: Math.round(used.heapUsed / 1024 / 1024),
    heap_total_mb: Math.round(used.heapTotal / 1024 / 1024),
    usage_percent: Math.round(memoryPercent)
  };

  return health;
}
```

  3. Dashboards

```javascript
// Generate an HTML dashboard
export function generateDashboard(metrics, logs) {
  return `
<!DOCTYPE html>
<html>
<head>
  <title>Agent Metrics Dashboard</title>
  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
  <h1>πŸ“Š Agentic Workflow Metrics</h1>

  <div class="metrics">
    <div class="metric-card">
      <h3>Success Rate</h3>
      <div class="value">${metrics.successRate}%</div>
    </div>

    <div class="metric-card">
      <h3>Avg Duration</h3>
      <div class="value">${metrics.avgDuration}ms</div>
    </div>

    <div class="metric-card">
      <h3>Error Count</h3>
      <div class="value">${metrics.errorCount}</div>
    </div>
  </div>

  <canvas id="durationChart"></canvas>

  <script>
    const ctx = document.getElementById('durationChart');
    new Chart(ctx, {
      type: 'line',
      data: {
        labels: ${JSON.stringify(metrics.timestamps)},
        datasets: [{
          label: 'Duration (ms)',
          data: ${JSON.stringify(metrics.durations)},
          borderColor: 'rgb(75, 192, 192)',
          tension: 0.1
        }]
      }
    });
  </script>
</body>
</html>
`;
}
```

πŸ“š Related Skills

  • gh-aw-github-actions-integration - CI/CD integration patterns

  • gh-aw-mcp-gateway - MCP Gateway configuration

  • gh-aw-safe-outputs - Safe output handling

  • gh-aw-authentication-credentials - Authentication management

  • gh-aw-containerization - Container deployment

πŸ”— References

Logging & Monitoring

  • Winston Logger

  • OpenTelemetry

  • Prometheus

  • Grafana

GitHub Actions

  • GitHub Actions Logs

  • Job Summaries

Observability Tools

  • Datadog

  • New Relic

  • Sentry

βœ… Remember Checklist

When implementing logging and monitoring for agentic workflows:

  • Use structured JSON logging format

  • Include correlation IDs in all logs

  • Sanitize sensitive data before logging

  • Set appropriate log levels

  • Implement log aggregation

  • Collect performance metrics

  • Track LLM API usage and costs

  • Define alert rules and thresholds

  • Create actionable alerts (not noise)

  • Implement health checks

  • Use distributed tracing for complex flows

  • Generate metrics dashboards

  • Monitor success rates

  • Track error patterns

  • Implement debug mode for troubleshooting

License: Apache-2.0

Version: 1.0.0

Last Updated: 2026-02-17

Maintained by: Hack23 Organization
