Kubernetes Management (Kubernetes 管理)
为运维人员提供 Kubernetes 集群管理和监控功能,包括查看 Pod 状态、日志、资源使用等。
Provides Kubernetes cluster management and monitoring for DevOps engineers, including pod status, logs, resource usage, and more.
Quick Start
View Pods (查看 Pod)
显示 production namespace 中的所有 pods List all pods in the production namespace
Check Pod Logs (查看 Pod 日志)
查看 pod api-server-abc123 的日志 Get logs from pod api-server-abc123
Check Resource Usage (查看资源使用)
显示集群资源使用情况 Show cluster resource usage
Key Features
- Pod Management (Pod 管理)
View and monitor pods across namespaces:
Common Operations:
List pods
kubectl get pods -n <namespace>
Get pod details
kubectl describe pod <pod-name> -n <namespace>
View pod logs
kubectl logs <pod-name> -n <namespace>
Follow logs
kubectl logs -f <pod-name> -n <namespace>
- Service and Deployment (服务与部署)
Monitor services and deployments:
Common Operations:
List services
kubectl get services -n <namespace>
List deployments
kubectl get deployments -n <namespace>
Check deployment status
kubectl rollout status deployment/<name> -n <namespace>
- Resource Monitoring (资源监控)
Check cluster and pod resource usage:
Common Operations:
Node resource usage
kubectl top nodes
Pod resource usage
kubectl top pods -n <namespace>
Check cluster info
kubectl cluster-info
Common Workflows
Workflow 1: Troubleshoot Pod Issues
- User: "production namespace 中有哪些 pod 出问题了?"
- List pods: kubectl get pods -n production
- Identify pods with issues (Not Running, CrashLoopBackOff)
- Get pod details: kubectl describe pod <problem-pod>
- Check logs: kubectl logs <problem-pod>
- Analyze error messages
- Suggest solutions
Workflow 2: Check Deployment Health
- User: "检查 api-server deployment 的健康状态"
- Get deployment: kubectl get deployment api-server -n production
- Check replicas: desired vs available
- Check rollout status
- View recent events
- Report status and any issues
Workflow 3: Monitor Resource Usage
- User: "集群资源使用情况如何?"
- Get node metrics: kubectl top nodes
- Get pod metrics: kubectl top pods --all-namespaces
- Calculate utilization percentages
- Identify resource-intensive pods
- Provide recommendations
Examples (示例)
Example 1: List Pods (列出 Pods)
User (Chinese):
显示 production namespace 中的所有 pods
Assistant Response:
我会列出 production namespace 中的所有 pods。
<executes: kubectl get pods -n production>
Production Namespace - Pods:
| Name | Status | Restarts | Age |
|---|---|---|---|
| api-server-7d4f8b9c-abc123 | Running | 0 | 5d |
| api-server-7d4f8b9c-def456 | Running | 1 | 5d |
| web-app-5c8a7b-ghi789 | Running | 0 | 3d |
| worker-queue-9f3e2d-jkl012 | CrashLoopBackOff | 10 | 2h |
| database-6a4b5c-mno345 | Running | 0 | 15d |
Summary:
- Total: 5 pods
- Running: 4 pods
- Issues: 1 pod (worker-queue)
⚠️ Alert: worker-queue-9f3e2d-jkl012 is in CrashLoopBackOff状态。需要检查日志吗?
Example 2: View Pod Logs (查看 Pod 日志)
User (English):
Get logs from pod worker-queue-9f3e2d-jkl012 in production namespace
Assistant Response:
I'll retrieve the logs from the pod.
<executes: kubectl logs worker-queue-9f3e2d-jkl012 -n production --tail=50>
Pod Logs: worker-queue-9f3e2d-jkl012
[2026-01-16 10:15:23] INFO: Worker starting... [2026-01-16 10:15:24] INFO: Connecting to Redis... [2026-01-16 10:15:25] ERROR: Connection failed: redis://redis-service:6379 [2026-01-16 10:15:25] ERROR: Error: getaddrinfo ENOTFOUND redis-service [2026-01-16 10:15:26] INFO: Retrying connection (1/10)... [2026-01-16 10:15:27] ERROR: Connection failed: redis://redis-service:6379 [2026-01-16 10:15:28] FATAL: Max retries exceeded, exiting
Analysis:
- Issue: Cannot connect to Redis service
- Root Cause: DNS resolution failure for
redis-service - Possible Reasons:
- Redis service not running in the namespace
- Service name mismatch
- Network policy blocking connection
Recommendations:
- Check if Redis service exists:
kubectl get svc redis-service -n production - Verify service endpoints:
kubectl get endpoints redis-service -n production - Check network policies
- Review deployment configuration
Example 3: Resource Usage (资源使用)
User (Chinese):
显示集群各节点的资源使用情况
Assistant Response:
我会显示集群节点的资源使用情况。
<executes: kubectl top nodes>
Cluster Resource Usage:
| Node | CPU (cores) | CPU % | Memory (GB) | Memory % |
|---|---|---|---|---|
| node-1 | 3.2 / 8 | 40% | 12.5 / 32 | 39% |
| node-2 | 5.6 / 8 | 70% | 18.2 / 32 | 57% |
| node-3 | 2.1 / 8 | 26% | 8.7 / 32 | 27% |
| node-4 | 6.8 / 8 | 85% | 24.3 / 32 | 76% |
Analysis:
- node-4 is running at high utilization (85% CPU, 76% Memory)
- node-3 has the most available capacity
- Average cluster utilization: 55% CPU, 50% Memory
Recommendations:
- Consider redistributing pods from node-4
- Monitor node-4 for potential issues
- Check if any pods need resource limit adjustments
- Review horizontal pod autoscaling settings
Important Notes
Access Control (访问控制)
-
Requires kubectl configured with cluster access
-
User permissions defined by RBAC roles
-
Some operations may be restricted based on role
-
Always verify namespace access before operations
Safety Guidelines (安全指南)
-
Read operations only through this skill
-
No pod deletion or modification
-
No deployment changes
-
No configuration updates
-
For changes, use standard deployment processes
Best Practices (最佳实践)
-
Always specify namespace
-
Use labels for filtering
-
Check resource limits before scaling
-
Monitor metrics before making decisions
-
Document all investigations
Prerequisites
kubectl Configuration
Ensure kubectl is configured to access your clusters:
Verify kubectl is installed
kubectl version --client
List available contexts
kubectl config get-contexts
Switch context if needed
kubectl config use-context <context-name>
Verify access
kubectl get nodes
GKE Authentication (GCP)
For GKE clusters:
Authenticate with GCP
gcloud auth login
Get cluster credentials
gcloud container clusters get-credentials <cluster-name> --region=<region>
Verify access
kubectl cluster-info
Required Permissions
Minimum RBAC permissions needed:
-
pods/list
-
List pods
-
pods/get
-
Get pod details
-
pods/log
-
Read pod logs
-
services/list
-
List services
-
deployments/list
-
List deployments
-
nodes/list
-
List nodes
Limitations
Current Limitations
-
Read-only operations via kubectl commands
-
No direct MCP server integration (uses kubectl CLI)
-
No real-time monitoring dashboards
-
No automated remediation
-
Limited to configured clusters
Future Enhancements
-
Direct Kubernetes API integration
-
Real-time pod monitoring
-
Automated issue detection
-
Integration with monitoring systems (Prometheus, Grafana)
-
Pod restart and scaling capabilities
-
Log aggregation and search
Troubleshooting
Issue 1: "kubectl: command not found"
Solutions:
-
Install kubectl: https://kubernetes.io/docs/tasks/tools/
-
Add to PATH
-
Restart Claude Code
Issue 2: "Unable to connect to cluster"
Solutions:
-
Verify kubectl context: kubectl config current-context
-
Check cluster credentials
-
Verify network access
-
Re-authenticate if needed
Issue 3: "Forbidden: User cannot list pods"
Solutions:
-
Check RBAC permissions
-
Contact cluster admin
-
Verify service account
-
Switch to correct namespace
Related Skills
-
cloud-resources : Manage cloud infrastructure
-
Future: monitoring-alerts , incident-response
Safety & Compliance
This skill provides read-only access to Kubernetes clusters:
-
✅ View pod status and logs
-
✅ Monitor resource usage
-
✅ Check service health
-
❌ Cannot delete pods
-
❌ Cannot modify deployments
-
❌ Cannot change configurations
For any modifications, use established CI/CD pipelines and change management processes.