Error Tracking Skill
This skill helps you track and debug errors in production using CloudWatch Logs and structured logging.
When to Use This Skill
- Investigating production errors
- Monitoring application health
- Debugging intermittent issues
- Analyzing error patterns
- Setting up alerting
- Improving observability
- Troubleshooting user-reported issues
Logging Infrastructure
CloudWatch Logs
AWS Lambda functions automatically log to CloudWatch:
CloudWatch Log Groups:
├── /aws/lambda/sgcarstrends-api-prod
├── /aws/lambda/sgcarstrends-web-prod
└── /aws/lambda/sgcarstrends-workflows-prod
Structured Logging
Logger Setup
// packages/utils/src/logger.ts
import pino from "pino";

export const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  formatters: {
    // Emit the level as its label ("info") instead of its numeric value
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  base: {
    env: process.env.NODE_ENV,
    service: process.env.SERVICE_NAME,
  },
});

// Export typed logger methods
export const log = {
  info: (message: string, data?: Record<string, unknown>) => {
    logger.info(data, message);
  },
  error: (message: string, error: Error, data?: Record<string, unknown>) => {
    logger.error(
      {
        ...data,
        error: {
          message: error.message,
          stack: error.stack,
          name: error.name,
        },
      },
      message
    );
  },
  warn: (message: string, data?: Record<string, unknown>) => {
    logger.warn(data, message);
  },
  debug: (message: string, data?: Record<string, unknown>) => {
    logger.debug(data, message);
  },
};
Usage in Code
// apps/api/src/routes/cars.ts
import type { Context } from "hono";
import { log } from "@sgcarstrends/utils/logger";
// `db` is assumed to be the project's database client, imported elsewhere.

export const getCars = async (c: Context) => {
  try {
    log.info("Fetching cars", {
      month: c.req.query("month"),
      userId: c.get("userId"),
    });

    const cars = await db.query.cars.findMany();

    log.info("Cars fetched successfully", {
      count: cars.length,
    });

    return c.json(cars);
  } catch (error) {
    log.error("Failed to fetch cars", error as Error, {
      month: c.req.query("month"),
    });

    return c.json({ error: "Failed to fetch cars" }, 500);
  }
};
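The handler above casts the caught value with `error as Error`, which only holds if everything thrown is an `Error` instance. A minimal sketch of a normalizing helper follows; `toError` is a hypothetical name and is not part of the logger module shown earlier.

```typescript
// Hypothetical helper: turn any thrown value into an Error so that
// log.error(message, error, data) always receives a name, message, and stack.
function toError(value: unknown): Error {
  if (value instanceof Error) return value;
  if (typeof value === "string") return new Error(value);
  try {
    // JSON.stringify returns undefined for undefined, functions, and symbols.
    return new Error(JSON.stringify(value) ?? String(value));
  } catch {
    // JSON.stringify throws on circular structures and BigInt values.
    return new Error(String(value));
  }
}
```

With this in place, the catch block could call `log.error("Failed to fetch cars", toError(error), { ... })` instead of casting.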
Viewing Logs
AWS CLI
# View recent logs
aws logs tail /aws/lambda/sgcarstrends-api-prod --follow

# Filter by error level (filter patterns are case-sensitive; for JSON logs, '{ $.level = "error" }' matches the level field)
aws logs tail /aws/lambda/sgcarstrends-api-prod \
  --filter-pattern "ERROR"

# View logs from specific time range (timestamps are in epoch milliseconds)
aws logs filter-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --start-time $(($(date +%s) - 3600))000 \
  --end-time $(date +%s)000 \
  --filter-pattern "ERROR"

# Search for specific message (the inner double quotes make it an exact-phrase match)
aws logs filter-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --filter-pattern '"Failed to fetch cars"'
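The time-range command above builds its `--start-time` and `--end-time` values in epoch milliseconds. The same arithmetic can be sketched in TypeScript, for example when generating the command from a script; `lastMinutes` and `TimeRange` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical helper: compute a start/end pair in milliseconds since the
// Unix epoch, the unit used by `aws logs filter-log-events`.
interface TimeRange {
  startTime: number;
  endTime: number;
}

function lastMinutes(minutes: number, now: Date = new Date()): TimeRange {
  if (!Number.isFinite(minutes) || minutes <= 0) {
    throw new RangeError("minutes must be a positive number");
  }
  const endTime = now.getTime();
  return { startTime: endTime - minutes * 60_000, endTime };
}

// Print the flags for the last hour.
const lastHour = lastMinutes(60);
console.log(`--start-time ${lastHour.startTime} --end-time ${lastHour.endTime}`);
```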
SST Console
# Open SST console
cd apps/api
sst dev

# View logs in browser
# Navigate to Functions → sgcarstrends-api-prod → Logs
Error Patterns
Common Error Logging
// Database errors
try {
  const result = await db.query.cars.findMany();
} catch (error) {
  log.error("Database query failed", error as Error, {
    query: "cars.findMany",
    retryable: true,
  });
  throw error;
}

// External API errors
try {
  const response = await fetch(url);
  if (!response.ok) {
    log.error("External API error", new Error("API request failed"), {
      url,
      status: response.status,
      statusText: response.statusText,
    });
  }
} catch (error) {
  log.error("External API request failed", error as Error, {
    url,
  });
}

// Validation errors
const result = schema.safeParse(data);
if (!result.success) {
  log.warn("Validation failed", {
    errors: result.error.issues,
    data,
  });
  return c.json({ error: "Invalid request" }, 400);
}

// Authentication errors
if (!user) {
  log.warn("Unauthorized access attempt", {
    path: c.req.path,
    ip: c.req.header("x-forwarded-for"),
  });
  return c.json({ error: "Unauthorized" }, 401);
}
CloudWatch Insights
Query Logs
# Find all errors in last hour
fields @timestamp, @message, level, error.message
| filter level = "error"
| sort @timestamp desc
| limit 100

# Count errors by type
fields error.name
| filter level = "error"
| stats count(*) as errorCount by error.name
| sort errorCount desc

# Find slow requests
fields @timestamp, @message, duration
| filter level = "info" and @message like /Request completed/
| filter duration > 1000
| sort duration desc

# Track error rate over time
fields @timestamp
| filter level = "error"
| stats count(*) as ErrorCount by bin(5m)

# Find errors for specific user
fields @timestamp, @message, userId, error.message
| filter level = "error" and userId = "user123"
| sort @timestamp desc
Common Queries
# Database connection errors
fields @timestamp, @message, error.message
| filter error.message like /connection/
| sort @timestamp desc

# Memory errors
fields @timestamp, @message, error.message
| filter error.message like /memory/ or error.message like /heap/
| sort @timestamp desc

# Timeout errors
fields @timestamp, @message, error.message
| filter error.message like /timeout/ or error.message like /timed out/
| sort @timestamp desc

# Rate limit errors
fields @timestamp, @message, error.message
| filter error.message like /rate limit/ or error.message like /too many requests/
| sort @timestamp desc
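When logs have been exported or tailed to a file, the "count errors by type" query can be approximated locally. The sketch below assumes each line is a JSON record shaped like the logger output described earlier (a `level` field and an optional `error.name`); `countErrorsByName` is a hypothetical helper, not an AWS or Pino API.

```typescript
// Assumed record shape, based on the structured logger described above.
interface LogRecord {
  level?: string;
  error?: { name?: string; message?: string };
}

// Count error-level records per error name, most frequent first.
function countErrorsByName(lines: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // Skip non-JSON lines such as Lambda START/END/REPORT entries.
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as LogRecord;
    if (record.level !== "error") continue;
    const name = record.error?.name ?? "UnknownError";
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```

This is only a convenience for offline analysis; for live data the Insights queries remain the simpler route.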
Error Monitoring
CloudWatch Alarms
// infra/monitoring.ts
// NOTE: confirm that your SST version exports an `Alarm` construct with this
// shape; if it does not, aws-cdk-lib/aws-cloudwatch provides an `Alarm` class.
import { Alarm, type StackContext } from "sst/constructs";
import { Duration } from "aws-cdk-lib";
import { Metric } from "aws-cdk-lib/aws-cloudwatch";

export function Monitoring({ stack }: StackContext) {
  // Error rate alarm: more than 10 errors per 5-minute period, 2 periods in a row
  new Alarm(stack, "HighErrorRate", {
    sns: {
      topicArn: process.env.SNS_TOPIC_ARN,
    },
    alarm: (props) => ({
      alarmName: "sgcarstrends-high-error-rate",
      evaluationPeriods: 2,
      threshold: 10,
      comparisonOperator: "GreaterThanThreshold",
      metric: new Metric({
        namespace: "AWS/Lambda",
        metricName: "Errors",
        // CDK v2 names this property `dimensionsMap`
        dimensionsMap: {
          FunctionName: props.functionName,
        },
        statistic: "Sum",
        period: Duration.minutes(5),
      }),
    }),
  });

  // High latency alarm: average duration above 1 second, 3 periods in a row
  new Alarm(stack, "HighLatency", {
    sns: {
      topicArn: process.env.SNS_TOPIC_ARN,
    },
    alarm: (props) => ({
      alarmName: "sgcarstrends-high-latency",
      evaluationPeriods: 3,
      threshold: 1000, // Lambda reports Duration in milliseconds: 1 second
      comparisonOperator: "GreaterThanThreshold",
      metric: new Metric({
        namespace: "AWS/Lambda",
        metricName: "Duration",
        dimensionsMap: {
          FunctionName: props.functionName,
        },
        statistic: "Average",
        period: Duration.minutes(5),
      }),
    }),
  });
}
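The `evaluationPeriods` and `threshold` settings above interact: with `GreaterThanThreshold`, the alarm fires only when every one of the most recent periods exceeds the threshold. The following is a simplified model of that rule for reasoning about thresholds. It is not CloudWatch's actual implementation, which also handles missing data and "M out of N" configurations, and `breachesThreshold` is a hypothetical name.

```typescript
// Simplified model: true when the last `evaluationPeriods` datapoints are all
// strictly greater than `threshold`. Datapoints are ordered oldest to newest.
function breachesThreshold(
  datapoints: number[],
  threshold: number,
  evaluationPeriods: number
): boolean {
  if (evaluationPeriods <= 0 || datapoints.length < evaluationPeriods) {
    return false; // Not enough data to evaluate.
  }
  return datapoints.slice(-evaluationPeriods).every((value) => value > threshold);
}

// Error counts per 5-minute period, checked against the HighErrorRate settings.
console.log(breachesThreshold([3, 12, 15], 10, 2)); // true: last two periods exceed 10
console.log(breachesThreshold([12, 15, 4], 10, 2)); // false: latest period recovered
```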
Error Aggregation
Group Similar Errors
// packages/utils/src/error-tracker.ts
import { log } from "./logger";

interface ErrorGroup {
  fingerprint: string;
  message: string;
  count: number;
  lastSeen: Date;
  firstSeen: Date;
}

// Counts live in memory, so they are per Lambda instance and reset on cold start.
export class ErrorTracker {
  private errors: Map<string, ErrorGroup> = new Map();

  track(error: Error, context?: Record<string, unknown>) {
    const fingerprint = this.getFingerprint(error);
    const existing = this.errors.get(fingerprint);

    if (existing) {
      existing.count++;
      existing.lastSeen = new Date();
    } else {
      this.errors.set(fingerprint, {
        fingerprint,
        message: error.message,
        count: 1,
        lastSeen: new Date(),
        firstSeen: new Date(),
      });
    }

    // Log error
    log.error("Error tracked", error, {
      ...context,
      fingerprint,
      count: this.errors.get(fingerprint)?.count,
    });
  }

  private getFingerprint(error: Error): string {
    // Create fingerprint from error type and message
    const parts = [
      error.name,
      error.message.replace(/\d+/g, "N"), // Replace numbers
      error.stack?.split("\n")[1], // First stack frame
    ];
    return parts.filter(Boolean).join("|");
  }

  getTopErrors(limit = 10): ErrorGroup[] {
    return Array.from(this.errors.values())
      .sort((a, b) => b.count - a.count)
      .slice(0, limit);
  }
}
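To see why the fingerprint replaces numbers, the normalization step can be tried in isolation. The sketch below reimplements it as a standalone function, `fingerprintOf`, which is a hypothetical name. It also collapses UUIDs into a placeholder, an extension the tracker above does not include, since identifiers of that kind would otherwise split one error into many groups.

```typescript
// Matches UUIDs such as 123e4567-e89b-12d3-a456-426614174000.
const UUID_PATTERN = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

// Standalone version of the fingerprint logic: error name, normalized
// message, and (optionally) the first stack frame, joined with "|".
function fingerprintOf(name: string, message: string, firstFrame?: string): string {
  const normalized = message
    .replace(UUID_PATTERN, "ID") // Extension: collapse UUIDs first
    .replace(/\d+/g, "N"); // Then collapse remaining numbers
  return [name, normalized, firstFrame?.trim()].filter(Boolean).join("|");
}

const first = fingerprintOf("Error", "Timeout after 3000 ms for car 42");
const second = fingerprintOf("Error", "Timeout after 5000 ms for car 7");
console.log(first); // Error|Timeout after N ms for car N
console.log(first === second); // true: both land in the same group
```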
Best Practices
- Log Context
// ❌ No context
log.error("Error occurred", error);

// ✅ With context
log.error("Failed to process payment", error, {
  userId: user.id,
  amount: payment.amount,
  currency: payment.currency,
  paymentId: payment.id,
});
- Use Structured Logs
// ❌ String concatenation
console.log(`User ${userId} performed action ${action}`);

// ✅ Structured logging
log.info("User action", {
  userId,
  action,
  timestamp: new Date().toISOString(),
});
- Don't Log Sensitive Data
// ❌ Logging sensitive data
log.info("User logged in", {
  email: user.email,
  password: user.password, // NEVER log passwords!
  creditCard: user.creditCard,
});

// ✅ Safe logging
log.info("User logged in", {
  userId: user.id,
  // Mask email: keep the first two characters and the domain
  email: user.email.replace(/(?<=.{2}).(?=.*@)/g, "*"),
});
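Masking individual fields by hand is easy to forget, so one option is to scrub the whole data object before it reaches the logger. The sketch below is one way to do that under stated assumptions: the key list is illustrative, `redactSensitive` is a hypothetical helper, and it does not handle circular references or class instances such as `Date`. Pino also has a built-in `redact` option that may be a better fit when the sensitive paths are known in advance.

```typescript
// Illustrative list of key names to scrub (compared case-insensitively).
const SENSITIVE_KEYS = new Set(["password", "creditcard", "token", "authorization", "secret"]);

// Return a copy of `value` with sensitive keys replaced at any depth.
function redactSensitive(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactSensitive(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const result: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      result[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? "[REDACTED]"
        : redactSensitive(source[key]);
    }
    return result;
  }
  return value; // Primitives pass through unchanged.
}
```

A call such as `log.info("User logged in", redactSensitive(data) as Record<string, unknown>)` would then keep identifiers while dropping secrets.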
- Set Appropriate Log Levels
// Production
log.debug("Database query", { query }); // Not logged in prod
log.info("Request completed", { duration }); // Logged
log.warn("Cache miss", { key }); // Logged
log.error("Database error", error); // Logged

// Development
// All levels logged
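The behavior in the comments above follows from a simple numeric comparison: a message is written only if its level is at or above the configured level. A minimal sketch of that rule, using Pino's default numeric values for these four levels; `isLevelEnabled` is a hypothetical name used only for illustration.

```typescript
// Pino's default numeric values for the levels used in this document.
const LEVEL_VALUES = { debug: 20, info: 30, warn: 40, error: 50 } as const;
type LevelName = keyof typeof LEVEL_VALUES;

// A message is emitted when its level is >= the configured level.
function isLevelEnabled(messageLevel: LevelName, configuredLevel: LevelName): boolean {
  return LEVEL_VALUES[messageLevel] >= LEVEL_VALUES[configuredLevel];
}

console.log(isLevelEnabled("debug", "info")); // false: dropped when LOG_LEVEL=info
console.log(isLevelEnabled("warn", "info")); // true
console.log(isLevelEnabled("debug", "debug")); // true: everything is logged at debug
```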
Debugging Production Issues
Step-by-Step Process
1. Identify the issue
# Check CloudWatch Logs for errors
aws logs tail /aws/lambda/sgcarstrends-api-prod --filter-pattern "ERROR"

2. Find error pattern
# Search for similar errors (the inner double quotes make it an exact-phrase match)
aws logs filter-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --filter-pattern '"Failed to fetch cars"'

3. Check error context
# View logs with context (single quotes stop the shell from expanding $LATEST)
aws logs get-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --log-stream-name '2024/01/15/[$LATEST]abc123' \
  --start-from-head

4. Analyze error frequency
# Use CloudWatch Insights
# Query: Count errors by type

5. Reproduce locally
# Use error context to reproduce

6. Fix and deploy
# Create fix, test, deploy

7. Verify fix
# Monitor logs after deployment
aws logs tail /aws/lambda/sgcarstrends-api-prod --follow
Troubleshooting
Logs Not Appearing
Issue: Logs not showing in CloudWatch
Solution: Check Lambda execution role permissions
Ensure Lambda has CloudWatch Logs permissions:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
Too Many Logs
Issue: Too much logging causing high costs
Solution: Adjust log level and retention
# Set log level in production
LOG_LEVEL=info

# Reduce retention period
aws logs put-retention-policy \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --retention-in-days 7
Cannot Find Specific Error
Issue: Can't find error in logs
Solution: Improve search with CloudWatch Insights
# Use more specific filters
fields @timestamp, @message
| filter @message like /specific pattern/
| sort @timestamp desc
References
- AWS CloudWatch Logs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/
- CloudWatch Insights: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html
- Pino Logger: https://getpino.io
- Related files:
  - packages/utils/src/logger.ts - Logger configuration
  - Root CLAUDE.md - Logging guidelines
Best Practices Summary
- Structured Logging: Use structured logs with context
- Appropriate Levels: Use correct log levels (debug, info, warn, error)
- Don't Log Secrets: Never log sensitive data
- Add Context: Include relevant context for debugging
- Monitor Errors: Set up CloudWatch Alarms
- Aggregate Errors: Group similar errors together
- Log Retention: Set appropriate retention periods
- Use Insights: Leverage CloudWatch Insights for analysis