pyats-health-check

Perform comprehensive health assessments on network devices using pyATS. This skill defines the systematic approach for evaluating device health across all critical dimensions.

Install skill "pyats-health-check" with this command: npx skills add automateyournetwork/netclaw/automateyournetwork-netclaw-pyats-health-check

Device Health Check

When to Use

  • Proactive daily/weekly health monitoring

  • Pre-change and post-change validation

  • Incident response — first thing you run when alerted

  • Capacity planning and trending

  • Compliance checks for operational readiness

Health Check Procedure

Always run health checks in this exact order. Each section builds on the previous one.

Step 1: Device Identity & Uptime

Run show version to establish baseline identity.

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show version"}'

Extract and report:

  • Hostname, model, serial number

  • IOS-XE version and image filename

  • Uptime (flag if < 24 hours — indicates recent reload)

  • Last reload reason (flag if unexpected: crash, power failure)

  • Total/available memory

  • License status

Thresholds:

  • Uptime < 24h → WARNING: Recent reload

  • Uptime < 1h → CRITICAL: Very recent reload, check for crash

  • Last reload reason contains "crash" or "error" → CRITICAL
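
The Step 1 thresholds can be expressed as a small function. This is a minimal sketch, assuming the uptime has already been converted to seconds and the reload reason is available as plain text; the name classify_uptime is hypothetical and not part of the skill.

```python
def classify_uptime(uptime_seconds: int, reload_reason: str) -> str:
    """Apply the Step 1 thresholds to an uptime and a last-reload reason."""
    reason = reload_reason.lower()
    # A reload reason mentioning "crash" or "error" is CRITICAL at any uptime.
    if "crash" in reason or "error" in reason:
        return "CRITICAL"
    if uptime_seconds < 3600:        # under 1 hour
        return "CRITICAL"
    if uptime_seconds < 24 * 3600:   # under 24 hours
        return "WARNING"
    return "HEALTHY"
```

The reload reason is checked first so that a long-running device that last restarted because of a crash is still reported as CRITICAL.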

Step 2: CPU Utilization

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show processes cpu sorted"}'

Thresholds (5-second / 1-minute / 5-minute averages):

  • < 50% → HEALTHY

  • 50-75% → WARNING: Elevated CPU

  • 75-90% → HIGH: Investigate top processes

  • > 90% → CRITICAL: Immediate investigation required
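
The CPU bands above can be sketched as follows. The source does not say which band the boundary values 50, 75 and 90 belong to, so this sketch assumes 50 and 75 open the higher band and that only values above 90 are CRITICAL. The function names are hypothetical.

```python
def classify_cpu(percent: float) -> str:
    """Map one CPU utilization figure to a Step 2 status."""
    if percent > 90:
        return "CRITICAL"
    if percent >= 75:   # assumption: 75 itself counts as HIGH
        return "HIGH"
    if percent >= 50:   # assumption: 50 itself counts as WARNING
        return "WARNING"
    return "HEALTHY"


def classify_cpu_averages(five_sec: float, one_min: float, five_min: float) -> dict:
    """Classify each of the three averages separately."""
    return {
        "5sec": classify_cpu(five_sec),
        "1min": classify_cpu(one_min),
        "5min": classify_cpu(five_min),
    }
```

Keeping the three averages separate makes it possible to tell a short spike (only the 5-second figure is elevated) from sustained load (the 5-minute figure is elevated).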

Top processes to watch:

  • IP Input — high traffic volume or routing loops

  • BGP Router / BGP I/O — large BGP table or instability

  • OSPF-1 Hello — OSPF adjacency issues

  • Crypto IKMP / Crypto Engine — IPsec overhead

  • SNMP ENGINE — polling storm

  • ARP Input — ARP storm or L2 loop

Step 3: Memory Utilization

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show processes memory sorted"}'

Also run:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show platform resources"}'

Thresholds:

  • Used < 70% → HEALTHY

  • 70-85% → WARNING: Memory pressure

  • 85-95% → HIGH: May impact routing table updates

  • > 95% → CRITICAL: Risk of process crashes or OOM
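
As an illustration of the memory thresholds, the sketch below derives the used percentage from raw figures and then applies the bands. It assumes used and total memory have already been extracted in the same unit; boundary handling (70 and 85 open the higher band) is an assumption, and memory_status is a hypothetical name.

```python
from typing import Tuple


def memory_status(used: float, total: float) -> Tuple[float, str]:
    """Return (percent used, status) for the Step 3 thresholds."""
    if total <= 0:
        raise ValueError("total memory must be positive")
    percent = round(100.0 * used / total, 1)
    if percent > 95:
        status = "CRITICAL"
    elif percent >= 85:   # assumption: 85 itself counts as HIGH
        status = "HIGH"
    elif percent >= 70:   # assumption: 70 itself counts as WARNING
        status = "WARNING"
    else:
        status = "HEALTHY"
    return percent, status
```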

Memory consumers to watch:

  • BGP Router — large BGP table (full internet table = ~1M routes)

  • CEF process — large FIB

  • OSPF Router — large OSPF LSDB

  • HTTP CORE — web server / RESTCONF overhead

  • IOSD iomem — I/O memory for packet buffers

Step 4: Interface Status

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show ip interface brief"}'

Then for each active interface, get detailed counters:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show interfaces"}'

Report for each interface:

  • Admin status (up/down) and protocol status (up/down)

  • IP address and subnet

  • Speed, duplex, MTU

  • Input/output rate (bps and pps)

  • Error counters: CRC, input errors, output errors, drops, overruns

  • Resets counter (flag if incrementing — indicates flapping)

  • Last input/output timestamps

Flags:

  • Interface up/down (admin up, line protocol down) → WARNING: Check physical or protocol

  • CRC errors > 0 → WARNING: Physical layer issue (cabling, optics, duplex mismatch)

  • Input errors incrementing → WARNING: Packet corruption

  • Output drops > 0 → WARNING: Congestion or QoS issue

  • Resets incrementing → CRITICAL: Interface flapping

  • Line protocol down on configured interface → CRITICAL
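
The counter-based flags depend on whether a value is merely non-zero or is still incrementing, which requires two snapshots. The sketch below shows one way to compare them. The dictionary keys (crc_errors, input_errors, output_drops, resets) are hypothetical names chosen for this example, not the keys of any pyATS parser, and the status flags (up/down, line protocol down) are left out because they depend on what counts as a configured interface.

```python
def counter_flags(previous: dict, current: dict) -> list:
    """Compare two counter snapshots for one interface and return (severity, message) flags."""
    flags = []
    if current.get("crc_errors", 0) > 0:
        flags.append(("WARNING", "CRC errors present"))
    if current.get("input_errors", 0) > previous.get("input_errors", 0):
        flags.append(("WARNING", "Input errors incrementing"))
    if current.get("output_drops", 0) > 0:
        flags.append(("WARNING", "Output drops present"))
    if current.get("resets", 0) > previous.get("resets", 0):
        flags.append(("CRITICAL", "Resets incrementing"))
    return flags
```

A counter that is non-zero but unchanged between snapshots usually reflects a past event, which is why input errors and resets are compared against the previous snapshot instead of against zero.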

Step 5: Hardware & Environment

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show inventory"}'

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show platform"}'

Report: Module status (ok/fail), serial numbers, PID, transceiver types and DOM readings.

Step 6: NTP Synchronization

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show ntp associations"}'

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_run_show_command '{"device_name":"R1","command":"show clock"}'

Flags:

  • No NTP peer synchronized (no * in associations) → CRITICAL for logging/forensics

  • Clock offset > 100ms → WARNING

  • Clock offset > 1s → CRITICAL

  • No NTP configured at all → CRITICAL
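
The NTP flags reduce to three inputs: whether NTP is configured, whether a peer is selected for synchronization, and the clock offset. A minimal sketch, assuming those three values have already been extracted from the command output; classify_ntp is a hypothetical name.

```python
def classify_ntp(configured: bool, synchronized: bool, offset_ms: float) -> str:
    """Apply the Step 6 flags to already-extracted NTP state."""
    if not configured or not synchronized:
        return "CRITICAL"
    # The offset can be negative, so compare its magnitude.
    if abs(offset_ms) > 1000:
        return "CRITICAL"
    if abs(offset_ms) > 100:
        return "WARNING"
    return "HEALTHY"
```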

Step 7: System Logs

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_show_logging '{"device_name":"R1"}'

Scan for these patterns:

  • %SYS-*-RELOAD — reload events

  • %LINEPROTO-5-UPDOWN — interface flaps

  • %OSPF-*-ADJCHG — OSPF adjacency changes

  • %BGP-*-ADJCHANGE — BGP peer state changes

  • %DUAL-*-NBRCHANGE — EIGRP neighbor changes

  • %SYS-2-MALLOCFAIL — memory allocation failure (CRITICAL)

  • %SYS-3-CPUHOG — process monopolizing CPU (HIGH)

  • %TRACKING-* — IP SLA or object tracking changes

  • %SEC-* / %AUTHMGR-* — security events

  • %PLATFORM-*-CRASH — crash events (CRITICAL)

  • Traceback — software bug (CRITICAL — open TAC case)
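
The patterns above can be turned into regular expressions and counted per category. In this sketch the * severity wildcard is assumed to stand for a single digit, and a severity is attached only where the list above states one; the labels and the scan_log name are hypothetical.

```python
import re
from collections import Counter

# (regex, label, severity); severity is None where the list states none.
PATTERNS = [
    (re.compile(r"%SYS-\d-RELOAD"), "reload", None),
    (re.compile(r"%LINEPROTO-5-UPDOWN"), "interface flap", None),
    (re.compile(r"%OSPF-\d-ADJCHG"), "OSPF adjacency change", None),
    (re.compile(r"%BGP-\d-ADJCHANGE"), "BGP peer change", None),
    (re.compile(r"%DUAL-\d-NBRCHANGE"), "EIGRP neighbor change", None),
    (re.compile(r"%SYS-2-MALLOCFAIL"), "memory allocation failure", "CRITICAL"),
    (re.compile(r"%SYS-3-CPUHOG"), "CPU hog", "HIGH"),
    (re.compile(r"%TRACKING-"), "tracking change", None),
    (re.compile(r"%(SEC|AUTHMGR)-"), "security event", None),
    (re.compile(r"%PLATFORM-\d-CRASH"), "crash", "CRITICAL"),
    (re.compile(r"Traceback"), "traceback", "CRITICAL"),
]


def scan_log(text: str) -> Counter:
    """Count log lines matching each pattern, keyed by (label, severity)."""
    counts = Counter()
    for line in text.splitlines():
        for regex, label, severity in PATTERNS:
            if regex.search(line):
                counts[(label, severity)] += 1
    return counts
```

The counts map directly onto the Details column of the report, for example three matches for the OSPF adjacency pattern.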

Step 8: Connectivity Validation

Test reachability to critical infrastructure:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_ping_from_network_device '{"device_name":"R1","command":"ping 8.8.8.8 repeat 5"}'

Thresholds:

  • 100% success, RTT < 50ms → HEALTHY

  • 100% success, RTT > 100ms → WARNING: High latency

  • 80-99% success → WARNING: Packet loss

  • < 80% success → CRITICAL: Significant packet loss

  • 0% success → CRITICAL: No reachability
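
The connectivity thresholds can be sketched as below. The list above assigns no status to a 100% success rate with an RTT between 50 ms and 100 ms; this sketch treats that range as HEALTHY, which is an assumption and not something the source states. classify_ping is a hypothetical name.

```python
def classify_ping(success_percent: float, avg_rtt_ms: float) -> str:
    """Apply the Step 8 thresholds to a ping result."""
    if success_percent < 80:     # includes 0% (no reachability)
        return "CRITICAL"
    if success_percent < 100:    # 80-99%: packet loss
        return "WARNING"
    if avg_rtt_ms > 100:         # full success but high latency
        return "WARNING"
    # Assumption: 50-100 ms is not covered by the source and is treated as HEALTHY.
    return "HEALTHY"
```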

Health Report Format

Always produce a summary table:

Device: R1 (devnetsandboxiosxec8k.cisco.com)
Model: C8000V | IOS-XE: 17.x.x | Uptime: XXd XXh

┌──────────────────┬──────────┬─────────────────────────┐
│ Check            │ Status   │ Details                 │
├──────────────────┼──────────┼─────────────────────────┤
│ CPU (5min avg)   │ HEALTHY  │ 12%                     │
│ Memory           │ HEALTHY  │ 45% used (1.2G/2.6G)    │
│ Interfaces       │ WARNING  │ Gi2 down/down           │
│ Hardware         │ HEALTHY  │ All modules OK          │
│ NTP              │ HEALTHY  │ Synced, offset 2ms      │
│ Logs             │ WARNING  │ 3 OSPF adjacency flaps  │
│ Connectivity     │ HEALTHY  │ 100% to 8.8.8.8, 23ms   │
└──────────────────┴──────────┴─────────────────────────┘

Overall: WARNING — 2 items need attention

Severity order: CRITICAL > HIGH > WARNING > HEALTHY. Overall status = worst individual status.
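
The worst-status rule can be sketched in a few lines. The sketch assumes per-check results are held in a dictionary keyed by check name; the function names are hypothetical.

```python
# Lowest to highest severity, as stated in the severity order.
SEVERITY_ORDER = ["HEALTHY", "WARNING", "HIGH", "CRITICAL"]


def overall_status(results: dict) -> str:
    """Return the worst individual status among the per-check results."""
    return max(results.values(), key=SEVERITY_ORDER.index)


def needs_attention(results: dict) -> list:
    """Return the names of checks that are not HEALTHY."""
    return [check for check, status in results.items() if status != "HEALTHY"]
```

Applied to the sample table, the two WARNING rows (Interfaces and Logs) yield an overall WARNING with two items needing attention.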

NetBox Cross-Reference (MISSION02 Enhancement)

When NetBox is available ($NETBOX_MCP_SCRIPT is set), cross-reference device state against the source of truth after Steps 1 and 4:

Interface State Validation

Query NetBox for expected interface states:

python3 $MCP_CALL "python3 -u $NETBOX_MCP_SCRIPT" netbox_get_objects '{"object_type":"dcim.interfaces","filters":{"device":"R1"},"brief":true}'

Compare NetBox intent vs device reality:

  • NetBox shows interface enabled but device shows down → CRITICAL: Unexpected outage

  • NetBox shows interface disabled but device shows up → WARNING: Undocumented activation

  • Interface exists on device but not in NetBox → WARNING: Undocumented interface

  • Interface in NetBox but not on device → WARNING: NetBox stale data
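
The four comparison rules can be sketched as a single pass over the union of interface names. This sketch assumes both sides have been reduced to dictionaries mapping an interface name to a boolean (enabled in NetBox, up on the device) and that names are already normalized to the same form on both sides; compare_interfaces is a hypothetical name.

```python
def compare_interfaces(netbox_enabled: dict, device_up: dict) -> list:
    """Return (interface, severity, finding) tuples for intent vs reality mismatches."""
    findings = []
    for name in sorted(set(netbox_enabled) | set(device_up)):
        if name not in netbox_enabled:
            findings.append((name, "WARNING", "Undocumented interface"))
        elif name not in device_up:
            findings.append((name, "WARNING", "NetBox stale data"))
        elif netbox_enabled[name] and not device_up[name]:
            findings.append((name, "CRITICAL", "Unexpected outage"))
        elif not netbox_enabled[name] and device_up[name]:
            findings.append((name, "WARNING", "Undocumented activation"))
    return findings
```

Name normalization matters in practice: an abbreviated name on one side and a full name on the other would be reported as two separate mismatches.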

IP Address Validation

Query NetBox for expected IP assignments:

python3 $MCP_CALL "python3 -u $NETBOX_MCP_SCRIPT" netbox_get_objects '{"object_type":"ipam.ip-addresses","filters":{"device":"R1"}}'

Compare: Flag any IP_DRIFT where the device IP differs from NetBox.
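
A drift check can be sketched with the standard ipaddress module, which lets prefix-length and dotted-netmask notations compare as equal. The sketch assumes both sides are dictionaries mapping an interface name to an address string with a mask, and treats an address missing on the device as drift; find_ip_drift is a hypothetical name.

```python
import ipaddress


def find_ip_drift(netbox_ips: dict, device_ips: dict) -> list:
    """Return (interface, expected, actual) for each IP_DRIFT finding."""
    drift = []
    for name, expected in sorted(netbox_ips.items()):
        actual = device_ips.get(name)
        # Assumption: an address that is absent on the device counts as drift.
        if actual is None:
            drift.append((name, expected, None))
        elif ipaddress.ip_interface(actual) != ipaddress.ip_interface(expected):
            drift.append((name, expected, actual))
    return drift
```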

Fleet-Wide Health (pCall)

To run health checks across ALL devices simultaneously, first list all devices:

PYATS_TESTBED_PATH=$PYATS_TESTBED_PATH python3 $MCP_CALL "python3 -u $PYATS_MCP_SCRIPT" pyats_list_devices

Then run Steps 1-8 on each device concurrently using multiple exec commands. Collect all results and produce a fleet summary:

┌──────────┬──────────┬──────┬────────┬──────────┬─────────────┐
│ Device   │ CPU      │ Mem  │ Intf   │ NTP      │ Overall     │
├──────────┼──────────┼──────┼────────┼──────────┼─────────────┤
│ R1       │ HEALTHY  │ WARN │ HEALTHY│ HEALTHY  │ WARNING     │
│ R2       │ HEALTHY  │ OK   │ CRIT   │ HEALTHY  │ CRITICAL    │
│ SW1      │ HIGH     │ OK   │ HEALTHY│ CRIT     │ CRITICAL    │
└──────────┴──────────┴──────┴────────┴──────────┴─────────────┘

Sort devices by severity (CRITICAL first) for triage prioritization.
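
Concurrent execution and severity sorting can be sketched with a thread pool. Here check_device is a placeholder for whatever runs Steps 1-8 on one device and returns its overall status; it is an assumption of this example, as are the fleet_summary name and the worker count.

```python
from concurrent.futures import ThreadPoolExecutor

# Lower rank sorts first, so CRITICAL devices lead the summary.
RANK = {"CRITICAL": 0, "HIGH": 1, "WARNING": 2, "HEALTHY": 3}


def fleet_summary(devices, check_device, max_workers: int = 8) -> list:
    """Run check_device on every device concurrently and sort by severity."""
    devices = list(devices)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so statuses line up with devices.
        statuses = list(pool.map(check_device, devices))
    rows = list(zip(devices, statuses))
    return sorted(rows, key=lambda row: (RANK[row[1]], row[0]))
```

Threads suit this workload because each check spends most of its time waiting on device I/O. Sorting on the device name as a secondary key keeps the output stable between runs.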

GAIT Audit Trail

After completing a health check, record the session in GAIT:

python3 $MCP_CALL "python3 -u $GAIT_MCP_SCRIPT" gait_record_turn '{"input":{"role":"assistant","content":"Health check completed on R1: CPU HEALTHY (12%), Memory WARNING (78%), Interfaces HEALTHY, NTP HEALTHY. Overall: WARNING.","artifacts":[]}}'
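
Hand-writing the JSON argument is error-prone once the summary contains quotes or parentheses. The sketch below builds the same payload shape as the command above with json.dumps and quotes it for the shell with shlex.quote. The function name and its parameters are hypothetical; only the payload structure is taken from the command shown.

```python
import json
import shlex


def gait_turn_argument(device: str, statuses: dict, overall: str) -> str:
    """Build the shell-quoted JSON argument for gait_record_turn."""
    summary = ", ".join(f"{check} {status}" for check, status in statuses.items())
    payload = {
        "input": {
            "role": "assistant",
            "content": f"Health check completed on {device}: {summary}. Overall: {overall}.",
            "artifacts": [],
        }
    }
    # shlex.quote wraps the JSON so the shell passes it as a single argument.
    return shlex.quote(json.dumps(payload))
```

The returned string can be appended to the gait_record_turn command line in place of the hand-written JSON.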
