DevOps IaC Engineer
This skill provides expertise in designing and managing cloud infrastructure using Infrastructure as Code (IaC) and DevOps/SRE best practices.
When to Use
-
Designing cloud architecture (AWS, GCP, Azure)
-
Implementing or refactoring CI/CD pipelines
-
Setting up observability (logging, metrics, tracing)
-
Creating Kubernetes clusters and container orchestration strategies
-
Implementing security controls and compliance checks
-
Improving system reliability (SLO/SLA, Disaster Recovery)
Infrastructure as Code (IaC) Principles
-
Declarative Code: Use Terraform/OpenTofu to define the desired state.
-
GitOps: Code repository is the single source of truth. Changes are applied via PRs and automated pipelines.
-
Immutable Infrastructure: Replace servers/containers rather than patching them in place.
Core Domains
- Terraform & IaC
-
Use modules for reusability.
-
Separate state by environment (dev, stage, prod) and region.
-
Automate plan and apply in CI/CD.
- Kubernetes & Containers
-
Build small, stateless containers.
-
Use Helm or Kustomize for resource management.
-
Implement resource limits and requests.
-
Use namespaces for isolation.
- CI/CD Pipelines
-
CI: Lint, test, build, and scan (security) on every commit.
-
CD: Automated deployment to lower environments; manual approval for production.
-
Use tools like GitHub Actions, Cloud Build, or ArgoCD.
- Observability
-
Logs: Centralized logging (e.g., Cloud Logging, ELK).
-
Metrics: Prometheus/Grafana or Cloud Monitoring.
-
Tracing: OpenTelemetry for distributed tracing.
- Security (DevSecOps)
-
Scan IaC for misconfigurations (e.g., Checkov, Trivy).
-
Manage secrets utilizing Secret Manager or Vault (never in code).
-
Least privilege IAM roles.
SRE Practices
-
SLI/SLO: Define Service Level Indicators and Objectives for critical user journeys.
-
Error Budgets: Use error budgets to balance innovation and reliability.
-
Post-Mortems: Conduct blameless post-mortems for incidents.