Cloud Devops Expert
Core Services:
-
Compute: EC2, Lambda (serverless), ECS/EKS (containers), Fargate
-
Storage: S3 (object), EBS (block), EFS (file system)
-
Database: RDS (relational), DynamoDB (NoSQL), Aurora (MySQL/PostgreSQL)
-
Networking: VPC, ALB/NLB, CloudFront (CDN), Route 53 (DNS)
-
Monitoring: CloudWatch (metrics, logs, alarms)
Best Practices:
-
Use AWS Organizations for multi-account management
-
Implement least privilege with IAM roles and policies
-
Enable CloudTrail for audit logging
-
Use AWS Config for compliance and resource tracking
-
Tag all resources for cost allocation and management
GCP (Google Cloud Platform) Patterns
Core Services:
-
Compute: Compute Engine (VMs), Cloud Functions (serverless), GKE (Kubernetes)
-
Storage: Cloud Storage (object), Persistent Disk (block)
-
Database: Cloud SQL, Cloud Spanner, Firestore
-
Networking: VPC, Cloud Load Balancing, Cloud CDN
-
Monitoring: Cloud Monitoring, Cloud Logging
Best Practices:
-
Use Google Cloud Identity for centralized identity management
-
Implement VPC Service Controls for security perimeters
-
Enable Cloud Audit Logs for compliance
-
Use labels for resource organization and billing
Azure Patterns
Core Services:
-
Compute: Virtual Machines, Azure Functions, AKS (Kubernetes), Container Instances
-
Storage: Blob Storage, Azure Files, Managed Disks
-
Database: Azure SQL, Cosmos DB (NoSQL), PostgreSQL/MySQL
-
Networking: Virtual Network, Application Gateway, Front Door (CDN)
-
Monitoring: Azure Monitor, Log Analytics
Best Practices:
-
Use Azure AD for identity and access management
-
Implement Azure Policy for governance
-
Enable Azure Security Center for threat protection
-
Use resource groups for logical organization
Terraform Best Practices
Project Structure:
terraform/ ├── environments/ │ ├── dev/ │ │ ├── main.tf │ │ ├── variables.tf │ │ └── terraform.tfvars │ ├── staging/ │ └── prod/ ├── modules/ │ ├── vpc/ │ ├── eks/ │ └── rds/ └── global/ └── backend.tf
Code Organization:
-
Use modules for reusable infrastructure components
-
Separate environments with workspaces or directories
-
Store state remotely (S3 + DynamoDB for AWS, GCS for GCP, Azure Blob for Azure)
-
Use variables for environment-specific values
-
Never commit secrets (use AWS Secrets Manager, HashiCorp Vault, etc.)
Terraform Workflow:
Initialize
terraform init
Plan (review changes)
terraform plan -out=tfplan
Apply (execute changes)
terraform apply tfplan
Destroy (when needed)
terraform destroy
Best Practices:
-
Use terraform fmt for consistent formatting
-
Use terraform validate to check syntax
-
Implement state locking to prevent concurrent modifications
-
Use terraform import for existing resources
-
Version pin providers: required_version = "~> 1.5"
-
Use data sources for referencing existing resources
-
Implement depends_on for explicit resource dependencies
Kubernetes Deployment Patterns
Deployment Strategies:
-
Rolling Update: Gradual replacement of pods (default)
-
Blue/Green: Run two identical environments, switch traffic
-
Canary: Gradual traffic shift to new version
-
Recreate: Terminate old pods before creating new ones (downtime)
Resource Management:
apiVersion: apps/v1 kind: Deployment metadata: name: myapp spec: replicas: 3 selector: matchLabels: app: myapp template: metadata: labels: app: myapp spec: containers: - name: myapp image: myapp:v1.0.0 resources: requests: memory: '256Mi' cpu: '250m' limits: memory: '512Mi' cpu: '500m' livenessProbe: httpGet: path: /health port: 8080 readinessProbe: httpGet: path: /ready port: 8080
Best Practices:
-
Use namespaces for environment/team isolation
-
Implement RBAC for access control
-
Define resource requests and limits
-
Use liveness and readiness probes
-
Use ConfigMaps and Secrets for configuration
-
Implement Pod Security Policies (PSP) or Pod Security Standards (PSS)
-
Use Horizontal Pod Autoscaler (HPA) for auto-scaling
CI/CD Pipeline Patterns
GitHub Actions Example:
name: CI/CD Pipeline
on: push: branches: [main, develop] pull_request: branches: [main]
jobs: test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Run tests run: npm test
build: needs: test runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Build Docker image run: docker build -t myapp:${{ github.sha }} . - name: Push to registry run: docker push myapp:${{ github.sha }}
deploy: needs: build runs-on: ubuntu-latest if: github.ref == 'refs/heads/main' steps: - name: Deploy to Kubernetes run: kubectl set image deployment/myapp myapp=myapp:${{ github.sha }}
Best Practices:
-
Implement automated testing (unit, integration, e2e)
-
Use matrix builds for multi-platform testing
-
Cache dependencies to speed up builds
-
Use secrets management for sensitive data
-
Implement deployment gates and approvals for production
-
Use semantic versioning for releases
-
Implement rollback strategies
Infrastructure as Code (IaC) Principles
Version Control:
-
Store all infrastructure code in Git
-
Use pull requests for code review
-
Implement branch protection rules
-
Tag releases for production deployments
Testing:
-
Use terraform plan to preview changes
-
Implement policy-as-code with Sentinel, OPA, or Checkov
-
Use tflint for Terraform linting
-
Test modules in isolation
Documentation:
-
Document module inputs and outputs
-
Maintain README files for each module
-
Use terraform-docs to auto-generate documentation
Monitoring and Observability
The Three Pillars:
Metrics (Prometheus + Grafana)
-
Use Prometheus for metrics collection
-
Define SLIs (Service Level Indicators)
-
Set up alerting rules
-
Create Grafana dashboards for visualization
Logs (ELK Stack, CloudWatch, Cloud Logging)
-
Centralize logs from all services
-
Implement structured logging (JSON format)
-
Use log aggregation and parsing
-
Set up log-based alerts
Traces (Jaeger, Zipkin, X-Ray)
-
Implement distributed tracing
-
Track request flow across microservices
-
Identify performance bottlenecks
-
Correlate traces with logs and metrics
Observability Best Practices:
-
Define SLOs (Service Level Objectives) and SLAs
-
Implement health check endpoints
-
Use APM (Application Performance Monitoring) tools
-
Set up on-call rotations and runbooks
-
Practice incident response procedures
Container Orchestration (Kubernetes)
Helm Charts:
-
Use Helm for package management
-
Create reusable chart templates
-
Use values files for environment-specific configuration
-
Version and publish charts to chart repository
Kubernetes Operators:
-
Automate operational tasks
-
Manage complex stateful applications
-
Examples: Prometheus Operator, Postgres Operator
Service Mesh (Istio, Linkerd):
-
Implement traffic management (canary, blue/green)
-
Enable mutual TLS for service-to-service communication
-
Implement circuit breakers and retries
-
Observe traffic with distributed tracing
Cost Optimization
AWS Cost Optimization:
-
Use Reserved Instances or Savings Plans for predictable workloads
-
Implement auto-scaling to match demand
-
Use S3 lifecycle policies to transition to cheaper storage classes
-
Enable Cost Explorer and set up budgets
-
Right-size instances based on usage metrics
Multi-Cloud Cost Management:
-
Use tags/labels for cost allocation
-
Implement chargeback models for team accountability
-
Use spot/preemptible instances for non-critical workloads
-
Monitor unused resources (idle VMs, unattached volumes)
Cloudflare Developer Platform
Cloudflare Workers & Pages:
-
Edge computing platform for serverless functions
-
Deploy at the edge (close to users globally)
-
Use Workers KV for edge key-value storage
-
Use Durable Objects for stateful applications
Cloudflare Primitives:
-
R2: S3-compatible object storage (no egress fees)
-
D1: SQLite-based serverless database
-
KV: Key-value storage (globally distributed)
-
AI: Run AI inference at the edge
-
Queues: Message queuing service
-
Vectorize: Vector database for embeddings
Configuration (wrangler.toml):
name = "my-worker" main = "src/index.ts" compatibility_date = "2024-01-01"
[[kv_namespaces]] binding = "MY_KV" id = "xxx"
[[r2_buckets]] binding = "MY_BUCKET" bucket_name = "my-bucket"
[[d1_databases]] binding = "DB" database_name = "my-db" database_id = "xxx"
Consolidated Skills
This expert skill consolidates 1 individual skills:
- cloudflare-developer-tools-rule
Related Skills
- docker-compose
- Container orchestration and multi-container application management
Memory Protocol (MANDATORY)
Before starting:
cat .claude/context/memory/learnings.md
After completing: Record any new patterns or exceptions discovered.
ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.