ACCESSING CLUSTERS
CRITICAL: Always prefix kubectl/flux commands with inline KUBECONFIG assignment. Do NOT use export or &&
- the variable must be set in the same command:
CORRECT - inline assignment
KUBECONFIG=~/.kube/<cluster>.yaml kubectl get pods
WRONG - export with && breaks in some shell contexts
export KUBECONFIG=~/.kube/<cluster>.yaml && kubectl get pods
Cluster Context
CRITICAL: Always confirm cluster before running commands.
Cluster Purpose Kubeconfig
dev
Manual testing ~/.kube/dev.yaml
integration
Automated testing ~/.kube/integration.yaml
live
Production ~/.kube/live.yaml
KUBECONFIG=~/.kube/<cluster>.yaml kubectl <command>
Accessing Internal Services
Platform services are exposed through the internal ingress gateway over HTTPS. DNS URLs are useful for browser-based access (Grafana, Hubble UI, Longhorn UI).
OAuth2 Proxy caveat: Prometheus, Alertmanager, and some other services are behind OAuth2 Proxy. DNS URLs redirect to an OAuth login page and cannot be used for API queries via curl. Use kubectl exec or port-forward instead for programmatic access.
Service Live Auth API Access
Prometheus https://prometheus.internal.tomnowak.work
OAuth2 Proxy kubectl exec or port-forward
Alertmanager https://alertmanager.internal.tomnowak.work
OAuth2 Proxy kubectl exec or port-forward
Grafana https://grafana.internal.tomnowak.work
Built-in auth Browser only
Hubble UI https://hubble.internal.tomnowak.work
None Browser
Longhorn UI https://longhorn.internal.tomnowak.work
None Browser
Garage Admin https://garage.internal.tomnowak.work
None Browser
Domain pattern: <service>.internal.<cluster-suffix>.tomnowak.work
-
live: internal.tomnowak.work
-
integration: internal.integration.tomnowak.work
-
dev: internal.dev.tomnowak.work
Querying Prometheus/Alertmanager API
Option 1: kubectl exec (quick, no setup)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl exec -n monitoring prometheus-kube-prometheus-stack-0 -c prometheus --
wget -qO- 'http://localhost:9090/api/v1/query?query=up' | jq '.data.result'
KUBECONFIG=~/.kube/<cluster>.yaml kubectl exec -n monitoring prometheus-kube-prometheus-stack-0 -c prometheus --
wget -qO- 'http://localhost:9090/api/v1/alerts' | jq '.data.alerts[] | select(.state == "firing")'
KUBECONFIG=~/.kube/<cluster>.yaml kubectl exec -n monitoring alertmanager-kube-prometheus-stack-0 -c alertmanager --
wget -qO- 'http://localhost:9093/api/v2/alerts' | jq .
Option 2: Port-forward (for scripts and repeated queries)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 & curl -s "http://localhost:9090/api/v1/query?query=up" | jq '.data.result'
Using the helper scripts:
Prometheus (start port-forward first; script defaults to http://localhost:9090)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 & .claude/skills/prometheus/scripts/promql.sh alerts --firing
Loki (no HTTPRoute — always requires port-forward)
KUBECONFIG=~/.kube/<cluster>.yaml kubectl port-forward -n monitoring svc/loki-headless 3100:3100 & export LOKI_URL=http://localhost:3100 .claude/skills/loki/scripts/logql.sh tail '{namespace="monitoring"}' --since 15m
Common kubectl Patterns
Read-only commands used during daily operations and investigations:
Command Purpose
kubectl get pods -n <ns>
List pods in a namespace
kubectl get pods -A
List pods across all namespaces
kubectl describe pod <pod> -n <ns>
Detailed pod info with events
kubectl logs <pod> -n <ns> --tail=100
Recent logs from a pod
kubectl logs <pod> -n <ns> --previous
Logs from previous container instance
kubectl get events -n <ns> --sort-by='.lastTimestamp'
Recent events timeline
kubectl top pods -n <ns>
CPU/memory usage per pod
kubectl top nodes
CPU/memory usage per node
kubectl get ns <ns> --show-labels
Namespace labels (network policy profiles)
kubectl explain <resource>
API schema reference for a resource type
Flux GitOps Commands
Status and Reconciliation
Check status
KUBECONFIG=/.kube/<cluster>.yaml flux get all
KUBECONFIG=/.kube/<cluster>.yaml flux get kustomizations
KUBECONFIG=~/.kube/<cluster>.yaml flux get helmreleases -A
Trigger reconciliation
KUBECONFIG=/.kube/<cluster>.yaml flux reconcile source git flux-system
KUBECONFIG=/.kube/<cluster>.yaml flux reconcile kustomization <name>
KUBECONFIG=~/.kube/<cluster>.yaml flux reconcile helmrelease <name> -n <namespace>
Flux Status Interpretation
Status Meaning Action
Ready: True
Resource is reconciled and healthy None - operating normally
Ready: False
Resource failed to reconcile Check the message/reason for details
Stalled: True
Resource has stopped retrying after repeated failures Suspend/resume to reset (see sre skill)
Suspended: True
Resource is intentionally paused Resume when ready: flux resume <type> <name>
Reconciling
Resource is actively being applied Wait for completion
Researching Unfamiliar Services
When investigating unknown services, spawn a haiku agent to research documentation:
Task tool:
- subagent_type: "general-purpose"
- model: "haiku"
- prompt: "Research [service] troubleshooting docs. Focus on:
- Common failure modes
- Health indicators
- Configuration gotchas Start with: [docs-url]"
Chart URL to Docs mapping:
Chart Source Documentation
charts.jetstack.io
cert-manager.io/docs
charts.longhorn.io
longhorn.io/docs
grafana.github.io
grafana.com/docs
prometheus-community.github.io
prometheus.io/docs
Common Confusions
BAD: Use helm list to check Helm release status GOOD: Use kubectl get helmrelease -A
- Flux manages releases via CRDs, not Helm CLI
Keywords
kubernetes, kubectl, kubeconfig, flux, flux status, cluster access, internal URL, service URL, port-forward, helm release, gitops, reconciliation