Error Tracking Skill

This skill helps you track and debug errors in production using CloudWatch Logs and structured logging.

When to Use This Skill

Investigating production errors
Monitoring application health
Debugging intermittent issues
Analyzing error patterns
Setting up alerting
Improving observability
Troubleshooting user-reported issues

Logging Infrastructure

CloudWatch Logs

AWS Lambda functions automatically log to CloudWatch:

CloudWatch Log Groups: ├── /aws/lambda/sgcarstrends-api-prod ├── /aws/lambda/sgcarstrends-web-prod └── /aws/lambda/sgcarstrends-workflows-prod

Structured Logging

Logger Setup

// packages/utils/src/logger.ts import pino from "pino";

export const logger = pino({ level: process.env.LOG_LEVEL || "info", formatters: { level: (label) => ({ level: label }), }, timestamp: pino.stdTimeFunctions.isoTime, base: { env: process.env.NODE_ENV, service: process.env.SERVICE_NAME, }, });

// Export typed logger methods export const log = { info: (message: string, data?: Record<string, unknown>) => { logger.info(data, message); }, error: (message: string, error: Error, data?: Record<string, unknown>) => { logger.error( { ...data, error: { message: error.message, stack: error.stack, name: error.name, }, }, message ); }, warn: (message: string, data?: Record<string, unknown>) => { logger.warn(data, message); }, debug: (message: string, data?: Record<string, unknown>) => { logger.debug(data, message); }, };

Usage in Code

// apps/api/src/routes/cars.ts import { log } from "@sgcarstrends/utils/logger";

export const getCars = async (c: Context) => { try { log.info("Fetching cars", { month: c.req.query("month"), userId: c.get("userId"), });

const cars = await db.query.cars.findMany();

log.info("Cars fetched successfully", {
  count: cars.length,
});

return c.json(cars);

} catch (error) { log.error("Failed to fetch cars", error as Error, { month: c.req.query("month"), });

return c.json({ error: "Failed to fetch cars" }, 500);

} };

Viewing Logs

AWS CLI

View recent logs

aws logs tail /aws/lambda/sgcarstrends-api-prod --follow

Filter by error level

aws logs tail /aws/lambda/sgcarstrends-api-prod
--filter-pattern "ERROR"

View logs from specific time range

aws logs filter-log-events
--log-group-name /aws/lambda/sgcarstrends-api-prod
--start-time $(($(date +%s) - 3600))000
--end-time $(date +%s)000
--filter-pattern "ERROR"

Search for specific message

aws logs filter-log-events
--log-group-name /aws/lambda/sgcarstrends-api-prod
--filter-pattern "Failed to fetch cars"

SST Console

Open SST console

cd apps/api sst dev

View logs in browser

Navigate to Functions → sgcarstrends-api-prod → Logs

Error Patterns

Common Error Logging

// Database errors try { const result = await db.query.cars.findMany(); } catch (error) { log.error("Database query failed", error as Error, { query: "cars.findMany", retryable: true, }); throw error; }

// External API errors try { const response = await fetch(url); if (!response.ok) { log.error("External API error", new Error("API request failed"), { url, status: response.status, statusText: response.statusText, }); } } catch (error) { log.error("External API request failed", error as Error, { url, }); }

// Validation errors const result = schema.safeParse(data); if (!result.success) { log.warn("Validation failed", { errors: result.error.issues, data, }); return c.json({ error: "Invalid request" }, 400); }

// Authentication errors if (!user) { log.warn("Unauthorized access attempt", { path: c.req.path, ip: c.req.header("x-forwarded-for"), }); return c.json({ error: "Unauthorized" }, 401); }

CloudWatch Insights

Query Logs

-- Find all errors in last hour fields @timestamp, @message, level, error.message | filter level = "error" | sort @timestamp desc | limit 100

-- Count errors by type fields error.name | filter level = "error" | stats count() by error.name | sort count() desc

-- Find slow requests fields @timestamp, @message, duration | filter level = "info" and @message like /Request completed/ | filter duration > 1000 | sort duration desc

-- Track error rate over time fields @timestamp | filter level = "error" | stats count() as ErrorCount by bin(5m)

-- Find errors for specific user fields @timestamp, @message, userId, error.message | filter level = "error" and userId = "user123" | sort @timestamp desc

Common Queries

-- Database connection errors fields @timestamp, @message, error.message | filter error.message like /connection/ | sort @timestamp desc

-- Memory errors fields @timestamp, @message, error.message | filter error.message like /memory/ or error.message like /heap/ | sort @timestamp desc

-- Timeout errors fields @timestamp, @message, error.message | filter error.message like /timeout/ or error.message like /timed out/ | sort @timestamp desc

-- Rate limit errors fields @timestamp, @message, error.message | filter error.message like /rate limit/ or error.message like /too many requests/ | sort @timestamp desc

Error Monitoring

CloudWatch Alarms

// infra/monitoring.ts import { Alarm } from "sst/constructs";

export function Monitoring({ stack }: StackContext) { // Error rate alarm new Alarm(stack, "HighErrorRate", { sns: { topicArn: process.env.SNS_TOPIC_ARN, }, alarm: (props) => ({ alarmName: "sgcarstrends-high-error-rate", evaluationPeriods: 2, threshold: 10, comparisonOperator: "GreaterThanThreshold", metric: new Metric({ namespace: "AWS/Lambda", metricName: "Errors", dimensions: { FunctionName: props.functionName, }, statistic: "Sum", period: Duration.minutes(5), }), }), });

// High latency alarm new Alarm(stack, "HighLatency", { sns: { topicArn: process.env.SNS_TOPIC_ARN, }, alarm: (props) => ({ alarmName: "sgcarstrends-high-latency", evaluationPeriods: 3, threshold: 1000, // 1 second comparisonOperator: "GreaterThanThreshold", metric: new Metric({ namespace: "AWS/Lambda", metricName: "Duration", dimensions: { FunctionName: props.functionName, }, statistic: "Average", period: Duration.minutes(5), }), }), }); }

Error Aggregation

Group Similar Errors

// packages/utils/src/error-tracker.ts interface ErrorGroup { fingerprint: string; message: string; count: number; lastSeen: Date; firstSeen: Date; }

export class ErrorTracker { private errors: Map<string, ErrorGroup> = new Map();

track(error: Error, context?: Record<string, unknown>) { const fingerprint = this.getFingerprint(error);

const existing = this.errors.get(fingerprint);
if (existing) {
  existing.count++;
  existing.lastSeen = new Date();
} else {
  this.errors.set(fingerprint, {
    fingerprint,
    message: error.message,
    count: 1,
    lastSeen: new Date(),
    firstSeen: new Date(),
  });
}

// Log error
log.error("Error tracked", error, {
  ...context,
  fingerprint,
  count: this.errors.get(fingerprint)?.count,
});

}

private getFingerprint(error: Error): string { // Create fingerprint from error type and message const parts = [ error.name, error.message.replace(/\d+/g, "N"), // Replace numbers error.stack?.split("\n")[1], // First stack frame ]; return parts.filter(Boolean).join("|"); }

getTopErrors(limit = 10): ErrorGroup[] { return Array.from(this.errors.values()) .sort((a, b) => b.count - a.count) .slice(0, limit); } }

Best Practices

Log Context

// ❌ No context log.error("Error occurred", error);

// ✅ With context log.error("Failed to process payment", error, { userId: user.id, amount: payment.amount, currency: payment.currency, paymentId: payment.id, });

Use Structured Logs

// ❌ String concatenation console.log(User ${userId} performed action ${action});

// ✅ Structured logging log.info("User action", { userId, action, timestamp: new Date().toISOString(), });

Don't Log Sensitive Data

// ❌ Logging sensitive data log.info("User logged in", { email: user.email, password: user.password, // NEVER log passwords! creditCard: user.creditCard, });

// ✅ Safe logging log.info("User logged in", { userId: user.id, email: user.email.replace(/(?<=.{2}).(?=.@)/g, ""), // Mask email });

Set Appropriate Log Levels

// Production log.debug("Database query", { query }); // Not logged in prod log.info("Request completed", { duration }); // Logged log.warn("Cache miss", { key }); // Logged log.error("Database error", error); // Logged

// Development // All levels logged

Debugging Production Issues

Step-by-Step Process

1. Identify the issue

Check CloudWatch Logs for errors

aws logs tail /aws/lambda/sgcarstrends-api-prod --filter-pattern "ERROR"

2. Find error pattern

Search for similar errors

aws logs filter-log-events
--log-group-name /aws/lambda/sgcarstrends-api-prod
--filter-pattern "Failed to fetch cars"

3. Check error context

View logs with context

aws logs get-log-events
--log-group-name /aws/lambda/sgcarstrends-api-prod
--log-stream-name 2024/01/15/[$LATEST]abc123
--start-from-head

4. Analyze error frequency

Use CloudWatch Insights

Query: Count errors by type

5. Reproduce locally

Use error context to reproduce

6. Fix and deploy

Create fix, test, deploy

7. Verify fix

Monitor logs after deployment

aws logs tail /aws/lambda/sgcarstrends-api-prod --follow

Troubleshooting

Logs Not Appearing

Issue: Logs not showing in CloudWatch

Solution: Check Lambda execution role permissions

Ensure Lambda has CloudWatch Logs permissions:

- logs:CreateLogGroup

- logs:CreateLogStream

- logs:PutLogEvents

Too Many Logs

Issue: Too much logging causing high costs

Solution: Adjust log level and retention

Set log level in production

LOG_LEVEL=info

Reduce retention period

aws logs put-retention-policy
--log-group-name /aws/lambda/sgcarstrends-api-prod
--retention-in-days 7

Cannot Find Specific Error

Issue: Can't find error in logs

Solution: Improve search with CloudWatch Insights

Use more specific filters

fields @timestamp, @message | filter @message like /specific pattern/ | sort @timestamp desc

References

AWS CloudWatch Logs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/
CloudWatch Insights: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html
Pino Logger: https://getpino.io
Related files:
packages/utils/src/logger.ts
Logger configuration
Root CLAUDE.md - Logging guidelines

Best Practices Summary

Structured Logging: Use structured logs with context
Appropriate Levels: Use correct log levels (debug, info, warn, error)
Don't Log Secrets: Never log sensitive data
Add Context: Include relevant context for debugging
Monitor Errors: Set up CloudWatch Alarms
Aggregate Errors: Group similar errors together
Log Retention: Set appropriate retention periods
Use Insights: Leverage CloudWatch Insights for analysis