datadog-observability

Datadog Observability Skill (via pup)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "datadog-observability" with this command: npx skills add ethanrcohen/datadog-agent-skill/ethanrcohen-datadog-agent-skill-datadog-observability

Requires: pup CLI, authenticated via pup auth login or the DD_API_KEY + DD_APP_KEY env vars.

Choose Your Workflow

| Goal | Command |
| --- | --- |
| Find errors in a service | Search Logs |
| Count errors / compute metrics | Aggregate Logs |
| Query time-series metrics | Query Metrics |
| List APM services + perf stats | APM Services |
| View service dependencies | APM Dependencies |

Search Logs

Returns log entries matching a Datadog query.

Errors in a service in the last hour

pup logs search --query="service:payment AND status:error" --from="1h"

Filter by service + environment

pup logs search --query="service:user-service AND env:production" --from="15m"

Advanced attribute filters

pup logs search --query="service:payment AND @duration:>5s" --from="1h"

Control result count

pup logs search --query="service:payment AND status:error" --from="24h" --limit=200

Sort oldest first

pup logs search --query="status:error" --from="1h" --sort="asc"

Search Flags

| Flag | Description |
| --- | --- |
| --query | Datadog query string (required) |
| --from | Start time: relative (1h, 30m, 7d) or Unix ms (required) |
| --to | End time (default: now) |
| --limit | Max results (default: 50, max: 1000) |
| --sort | asc or desc (default: desc) |
| --index | Comma-separated log indexes |
| --output / -o | json (default), table, yaml |

Aggregate Logs

Compute metrics from logs, such as counts, averages, and percentiles. Useful for triage.

How many errors per service in the last 24h?

pup logs aggregate --query="status:error" --from="24h" --compute="count" --group-by="service"

Average request duration by service

pup logs aggregate --query="*" --from="1h" --compute="avg(@duration)" --group-by="service"

99th percentile latency

pup logs aggregate --query="service:api" --from="2h" --compute="percentile(@duration, 99)"

Error count by HTTP status code

pup logs aggregate --query="status:error" --from="1d" --compute="count" --group-by="@http.status_code"

Compute Options

| Compute | Example | Description |
| --- | --- | --- |
| count | --compute="count" | Count matching logs |
| avg(metric) | --compute="avg(@duration)" | Average of a numeric attribute |
| sum(metric) | --compute="sum(@bytes)" | Sum |
| min(metric) | --compute="min(@latency)" | Minimum |
| max(metric) | --compute="max(@latency)" | Maximum |
| cardinality(field) | --compute="cardinality(@user.id)" | Unique values |
| percentile(metric, N) | --compute="percentile(@duration, 99)" | Percentile |
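Comparing several percentiles side by side is often more telling than a single p99. The sketch below loops the percentile compute over three values, using only the flags documented above. The function name latency_percentiles, its arguments, and the 1h default window are hypothetical, and it assumes pup is installed and authenticated.

```shell
# Hypothetical helper: run one aggregate per percentile for a service.
# Usage: latency_percentiles <service> [window]
latency_percentiles() {
  service="$1"
  window="${2:-1h}"   # assumed default; any relative duration should work
  for p in 50 95 99; do
    echo "p${p}:"
    pup logs aggregate --query="service:${service}" --from="${window}" \
      --compute="percentile(@duration, ${p})"
  done
}
```

For example, latency_percentiles api 2h would issue three aggregate calls against service:api over the last two hours.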

Query Metrics

Query time-series metrics data.

CPU usage across all hosts in the last hour

pup metrics query --query="avg:system.cpu.user{*}" --from="1h"

Memory for a specific service in production

pup metrics query --query="avg:system.mem.used{service:web,env:prod}" --from="4h"

Search for available metrics

pup metrics list --filter="system.cpu.*"

Get metadata for a specific metric

pup metrics get system.cpu.user

Metrics Flags

| Flag | Description |
| --- | --- |
| --query | Datadog metrics query (required) |
| --from | Start time: relative (1h, 30m, 7d) or Unix ms (required) |
| --to | End time (default: now) |
| --output / -o | json (default), table, yaml |
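The metrics queries above all follow the shape aggregator:metric{tag,tag}. As an illustration, a small helper can assemble that string from its parts so that tags are easy to vary in a script. The name metric_query is hypothetical and not part of pup; this is a sketch of the string format only.

```shell
# Hypothetical helper: build "<agg>:<metric>{tag1,tag2}" from its parts.
# With no tags given, the scope falls back to {*} (all sources).
metric_query() {
  agg="$1"
  metric="$2"
  shift 2
  scope=""
  for tag in "$@"; do
    # Append a comma separator only when scope already holds a tag.
    scope="${scope:+${scope},}${tag}"
  done
  printf '%s:%s{%s}\n' "$agg" "$metric" "${scope:-*}"
}
```

It could then be used as pup metrics query --query="$(metric_query avg system.mem.used service:web env:prod)" --from="4h".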

APM Services

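List services and their performance statistics. Note: APM commands take Unix timestamps, not relative times. The examples below use the macOS (BSD) date syntax; on Linux, use date -d '1 hour ago' +%s instead (see Time Ranges).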

List all APM services

pup apm services list

Service performance stats (last hour)

pup apm services stats --start=$(date -v-1H +%s) --end=$(date +%s)

Filter by environment

pup apm services stats --start=$(date -v-1H +%s) --end=$(date +%s) --env=prod

List operations for a service

pup apm services operations web-server --start=$(date -v-1H +%s) --end=$(date +%s)

List resources (endpoints) for a service operation

pup apm services resources web-server --operation="GET /api/users" --from=$(date -v-1H +%s) --to=$(date +%s)

APM Flags

| Flag | Description |
| --- | --- |
| --start | Start time as Unix timestamp (required for stats/operations) |
| --end | End time as Unix timestamp (required for stats/operations) |
| --env | Filter by environment |
| --primary-tag | Filter by primary tag (group:value) |
| --output / -o | json (default), table, yaml |

APM Dependencies

View service call relationships based on trace data.

All service dependencies in production

pup apm dependencies list --env=prod --start=$(date -v-1H +%s) --end=$(date +%s)

Dependencies for a specific service

pup apm dependencies list web-server --env=prod --start=$(date -v-1H +%s) --end=$(date +%s)

Service flow map with performance metrics

pup apm flow-map --query="env:prod" --from=$(date -v-1H +%s) --to=$(date +%s)

Datadog Query Syntax

All query filters use Datadog's standard search syntax:

| Syntax | Meaning |
| --- | --- |
| service:my-service | Filter by service |
| status:error | Filter by log status |
| host:my-host | Filter by host |
| env:production | Filter by environment |
| @duration:>5s | Numeric attribute filter |
| "exact phrase" | Exact match |
| service:web AND status:error | Boolean operators (AND, OR, NOT) |
| service:web-* | Wildcards |
| -status:info | Negation |
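When queries are assembled in a script, joining the individual terms with AND by hand is easy to get wrong. The following is a minimal sketch of a joiner. build_query is a hypothetical name, and it does no escaping or validation of the terms it receives.

```shell
# Hypothetical helper: join search terms with AND into one query string.
build_query() {
  query=""
  for term in "$@"; do
    # Prefix " AND " only when query already holds a term.
    query="${query:+${query} AND }${term}"
  done
  printf '%s\n' "$query"
}
```

For instance, pup logs search --query="$(build_query service:payment status:error)" --from="1h" would run the same search as the first Search Logs example.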

Output Formats

All commands support --output / -o :

| Format | Flag | Use when |
| --- | --- | --- |
| JSON | --output json (default) | Piping to jq, programmatic analysis |
| Table | --output table | Human-readable overview |
| YAML | --output yaml | Configuration-style output |

Pipe JSON to jq for field selection

pup logs search --query="status:error" --from="1h" | jq '.data[].attributes.message'

Human-readable table

pup logs search --query="status:error" --from="1h" --output table

Common Investigation Patterns

1. Start broad: what services have errors?

pup logs aggregate --query="status:error" --from="1h" --compute="count" --group-by="service"

2. Drill into the top offender

pup logs search --query="service:payment AND status:error" --from="1h" --output table

3. Get full JSON details for a specific timeframe

pup logs search --query="service:payment AND status:error" --from="30m" --limit=10

4. Check if it's environment-specific

pup logs aggregate --query="service:payment AND status:error" --from="1h" --compute="count" --group-by="env"

5. Check APM service health

pup apm services stats --start=$(date -v-1H +%s) --end=$(date +%s) --env=prod

6. View service dependencies

pup apm dependencies list payment --env=prod --start=$(date -v-1H +%s) --end=$(date +%s)

7. Check a specific metric

pup metrics query --query="avg:trace.servlet.request.duration{service:payment}" --from="1h"
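The log-based steps of this pattern (1, 2 and 4) can be bundled into one function for repeated use. This is a sketch rather than a recommended tool: the name triage, its arguments, and the 1h default window are hypothetical. It uses only commands and flags shown above, and it assumes pup is installed and authenticated.

```shell
# Hypothetical wrapper around steps 1, 2 and 4 for a single service.
# Usage: triage <service> [window]
triage() {
  service="$1"
  window="${2:-1h}"   # assumed default window

  echo "== Error counts by service =="
  pup logs aggregate --query="status:error" --from="${window}" \
    --compute="count" --group-by="service"

  echo "== Recent errors in ${service} =="
  pup logs search --query="service:${service} AND status:error" \
    --from="${window}" --output table

  echo "== ${service} errors by environment =="
  pup logs aggregate --query="service:${service} AND status:error" \
    --from="${window}" --compute="count" --group-by="env"
}
```

Calling triage payment 30m would walk through the three views for the payment service over the last 30 minutes.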

Time Ranges

Logs & Metrics accept relative durations:

| Input | Meaning |
| --- | --- |
| 1h | 1 hour ago |
| 30m | 30 minutes ago |
| 7d | 7 days ago |
| 1w | 1 week ago |
| now | Current time (default for --to) |

APM commands require Unix timestamps. Use date to compute them:

| Shell | 1 hour ago | Now |
| --- | --- | --- |
| macOS | $(date -v-1H +%s) | $(date +%s) |
| Linux | $(date -d '1 hour ago' +%s) | $(date +%s) |
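Because the macOS and Linux date flags differ, a script meant to run on both can sidestep them by subtracting seconds from the current timestamp. The helper below is a sketch of that approach; hours_ago is a hypothetical name and not part of pup.

```shell
# Hypothetical helper: Unix timestamp (seconds) for N hours before now.
# Uses shell arithmetic only, so it does not depend on GNU or BSD date flags.
hours_ago() {
  echo $(( $(date +%s) - $1 * 3600 ))
}
```

It could be used as pup apm services stats --start="$(hours_ago 1)" --end="$(date +%s)" --env=prod on either platform.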

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
