Vertex Engine Inspector
Overview
Inspect and validate Vertex AI Agent Engine deployments across seven categories: runtime configuration, Code Execution Sandbox, Memory Bank, A2A protocol compliance, security posture, performance metrics, and monitoring observability. This skill generates weighted production-readiness scores (0-100%) with actionable recommendations for each deployment.
Prerequisites
-
gcloud CLI authenticated with roles/aiplatform.viewer and roles/monitoring.viewer IAM roles
-
Access to the target Google Cloud project hosting the Agent Engine deployment
-
curl or gcloud for A2A protocol endpoint testing (AgentCard, Task API, Status API)
-
Cloud Monitoring API enabled for performance metrics retrieval
-
Familiarity with Vertex AI Agent Engine concepts: Code Execution Sandbox, Memory Bank, Model Armor
Instructions
-
Connect to the Agent Engine deployment by retrieving agent metadata via gcloud ai agents describe
-
Parse the runtime configuration: model selection (Gemini 2.5 Pro/Flash), tools enabled, VPC settings, and scaling policies
-
Validate Code Execution Sandbox settings: confirm state TTL is 7-14 days, sandbox type is SECURE_ISOLATED , and IAM permissions are scoped to required GCP services only
-
Check Memory Bank configuration: verify enabled status, retention policy (min 100 memories), Firestore encryption, indexing enabled, and auto-cleanup active
-
Test A2A protocol compliance by probing /.well-known/agent-card , POST /v1/tasks:send , and GET /v1/tasks/<task-id> endpoints for correct responses
-
Audit security posture: validate IAM least-privilege roles, VPC Service Controls perimeter, Model Armor activation, encryption at rest and in transit, and absence of hardcoded credentials
-
Query Cloud Monitoring for performance metrics: request count, error rate (target < 5%), latency percentiles (p50/p95/p99), token usage, and cost estimates over the last 24 hours
-
Assess monitoring and observability: confirm Cloud Monitoring dashboards, alerting policies, structured logging, OpenTelemetry tracing, and Cloud Error Reporting are configured
-
Calculate weighted scores across all categories and determine overall production readiness status
-
Generate a prioritized list of recommendations with estimated score improvement per remediation
See ${CLAUDE_SKILL_DIR}/references/inspection-workflow.md for the phased inspection process and ${CLAUDE_SKILL_DIR}/references/inspection-categories.md for detailed check criteria.
Output
-
Inspection report in YAML format with per-category scores and overall readiness percentage
-
Runtime configuration summary: model, tools, VPC, scaling settings
-
A2A protocol compliance matrix: pass/fail for AgentCard, Task API, Status API
-
Security posture score with breakdown: IAM, VPC-SC, Model Armor, encryption, secrets
-
Performance metrics dashboard: error rate, latency percentiles, token usage, daily cost estimate
-
Prioritized recommendations with estimated score improvement per item
See ${CLAUDE_SKILL_DIR}/references/example-inspection-report.md for a complete sample report.
Error Handling
Error Cause Solution
Agent metadata not accessible Insufficient IAM permissions or incorrect agent ID Verify roles/aiplatform.viewer granted; confirm agent ID with gcloud ai agents list
A2A AgentCard endpoint 404 Agent not configured for A2A protocol or endpoint path incorrect Check agent configuration for A2A enablement; verify /.well-known/agent-card path
Cloud Monitoring metrics empty Monitoring API not enabled or no recent traffic Run gcloud services enable monitoring.googleapis.com ; generate test traffic first
VPC-SC perimeter blocking access Inspector running outside VPC Service Controls perimeter Add inspector service account to access level; use VPC-SC bridge or access policy
Code Execution TTL out of range State TTL set below 1 day or above 14 days Adjust TTL to 7-14 days for production; values above 14 days are rejected by Agent Engine
See ${CLAUDE_SKILL_DIR}/references/errors.md for additional error scenarios.
Examples
Scenario 1: Pre-Production Readiness Check -- Inspect a newly deployed ADK agent before production launch. Run all 28 checklist items across security, performance, monitoring, compliance, and reliability. Target: overall score above 85% before approving production traffic.
Scenario 2: Security Audit After IAM Change -- Re-inspect security posture after modifying service account roles. Validate that least-privilege is maintained (target: IAM score 95%+), VPC-SC perimeter is intact, and Model Armor remains active.
Scenario 3: Performance Degradation Investigation -- Inspect an agent showing elevated error rates. Query 24-hour performance metrics, identify latency spikes at p95/p99, check auto-scaling behavior, and correlate with token usage patterns to isolate the root cause.
Resources
-
Vertex AI Agent Engine Documentation -- deployment and configuration
-
A2A Protocol Specification -- AgentCard, Task API, protocol compliance
-
Cloud Monitoring API -- metrics queries and dashboard configuration
-
VPC Service Controls -- perimeter setup and access policies
-
Model Armor -- prompt injection protection configuration