Datadog Observability Skill (via pup)
Requires: pup CLI, authenticated via pup auth login or DD_API_KEY
- DD_APP_KEY env vars.
Choose Your Workflow
Goal Command
Find errors in a service Search Logs
Count errors / compute metrics Aggregate Logs
Query time-series metrics Query Metrics
List APM services + perf stats APM Services
View service dependencies APM Dependencies
Search Logs
Returns log entries matching a Datadog query.
Errors in a service in the last hour
pup logs search --query="service:payment AND status:error" --from="1h"
Filter by service + environment
pup logs search --query="service:user-service AND env:production" --from="15m"
Advanced attribute filters
pup logs search --query="service:payment AND @duration:>5s" --from="1h"
Control result count
pup logs search --query="service:payment AND status:error" --from="24h" --limit=200
Sort oldest first
pup logs search --query="status:error" --from="1h" --sort="asc"
Search Flags
Flag Description
--query
Datadog query string (required)
--from
Start time: relative (1h , 30m , 7d ) or Unix ms (required)
--to
End time (default: now )
--limit
Max results (default: 50, max: 1000)
--sort
asc or desc (default: desc)
--index
Comma-separated log indexes
--output / -o
json (default), table , yaml
Aggregate Logs
Compute metrics from logs -- counts, averages, percentiles. Useful for triage.
How many errors per service in the last 24h?
pup logs aggregate --query="status:error" --from="24h" --compute="count" --group-by="service"
Average request duration by service
pup logs aggregate --query="*" --from="1h" --compute="avg(@duration)" --group-by="service"
99th percentile latency
pup logs aggregate --query="service:api" --from="2h" --compute="percentile(@duration, 99)"
Error count by HTTP status code
pup logs aggregate --query="status:error" --from="1d" --compute="count" --group-by="@http.status_code"
Compute Options
Compute Example Description
count
--compute="count"
Count matching logs
avg(metric)
--compute="avg(@duration)"
Average of a numeric attribute
sum(metric)
--compute="sum(@bytes)"
Sum
min(metric)
--compute="min(@latency)"
Minimum
max(metric)
--compute="max(@latency)"
Maximum
cardinality(field)
--compute="cardinality(@user.id)"
Unique values
percentile(metric, N)
--compute="percentile(@duration, 99)"
Percentile
Query Metrics
Query time-series metrics data.
CPU usage across all hosts in the last hour
pup metrics query --query="avg:system.cpu.user{*}" --from="1h"
Memory for a specific service in production
pup metrics query --query="avg:system.mem.used{service:web,env:prod}" --from="4h"
Search for available metrics
pup metrics list --filter="system.cpu.*"
Get metadata for a specific metric
pup metrics get system.cpu.user
Metrics Flags
Flag Description
--query
Datadog metrics query (required)
--from
Start time: relative (1h , 30m , 7d ) or Unix ms (required)
--to
End time (default: now)
--output / -o
json (default), table , yaml
APM Services
List services and their performance statistics. Note: APM commands use Unix timestamps (not relative time).
List all APM services
pup apm services list
Service performance stats (last hour)
pup apm services stats --start=$(date -v-1H +%s) --end=$(date +%s)
Filter by environment
pup apm services stats --start=$(date -v-1H +%s) --end=$(date +%s) --env=prod
List operations for a service
pup apm services operations web-server --start=$(date -v-1H +%s) --end=$(date +%s)
List resources (endpoints) for a service operation
pup apm services resources web-server --operation="GET /api/users" --from=$(date -v-1H +%s) --to=$(date +%s)
APM Flags
Flag Description
--start
Start time as Unix timestamp (required for stats/operations)
--end
End time as Unix timestamp (required for stats/operations)
--env
Filter by environment
--primary-tag
Filter by primary tag (group:value )
--output / -o
json (default), table , yaml
APM Dependencies
View service call relationships based on trace data.
All service dependencies in production
pup apm dependencies list --env=prod --start=$(date -v-1H +%s) --end=$(date +%s)
Dependencies for a specific service
pup apm dependencies list web-server --env=prod --start=$(date -v-1H +%s) --end=$(date +%s)
Service flow map with performance metrics
pup apm flow-map --query="env:prod" --from=$(date -v-1H +%s) --to=$(date +%s)
Datadog Query Syntax
All query filters use Datadog's standard search syntax:
service:my-service Filter by service status:error Filter by log status host:my-host Filter by host env:production Filter by environment @duration:>5s Numeric attribute filter "exact phrase" Exact match service:web AND status:error Boolean operators (AND, OR, NOT) service:web-* Wildcards -status:info Negation
Output Formats
All commands support --output / -o :
Format Flag Use when
JSON --output json (default) Piping to jq , programmatic analysis
Table --output table
Human-readable overview
YAML --output yaml
Configuration-style output
Pipe JSON to jq for field selection
pup logs search --query="status:error" --from="1h" | jq '.data[].attributes.message'
Human-readable table
pup logs search --query="status:error" --from="1h" --output table
Common Investigation Patterns
1. Start broad: what services have errors?
pup logs aggregate --query="status:error" --from="1h" --compute="count" --group-by="service"
2. Drill into the top offender
pup logs search --query="service:payment AND status:error" --from="1h" --output table
3. Get full JSON details for a specific timeframe
pup logs search --query="service:payment AND status:error" --from="30m" --limit=10
4. Check if it's environment-specific
pup logs aggregate --query="service:payment AND status:error" --from="1h" --compute="count" --group-by="env"
5. Check APM service health
pup apm services stats --start=$(date -v-1H +%s) --end=$(date +%s) --env=prod
6. View service dependencies
pup apm dependencies list payment --env=prod --start=$(date -v-1H +%s) --end=$(date +%s)
7. Check a specific metric
pup metrics query --query="avg:trace.servlet.request.duration{service:payment}" --from="1h"
Time Ranges
Logs & Metrics accept relative durations:
Input Meaning
1h
1 hour ago
30m
30 minutes ago
7d
7 days ago
1w
1 week ago
now
Current time (default for --to)
APM commands require Unix timestamps. Use date to compute them:
Shell 1 hour ago Now
macOS $(date -v-1H +%s)
$(date +%s)
Linux $(date -d '1 hour ago' +%s)
$(date +%s)