Senior Cloud Architect

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "senior-cloud-architect" with this command: npx skills add borghei/claude-skills/borghei-claude-skills-senior-cloud-architect

Expert cloud architecture and infrastructure design across AWS, GCP, and Azure.

Keywords

cloud, aws, gcp, azure, terraform, infrastructure, vpc, eks, ecs, lambda, cost-optimization, disaster-recovery, multi-region, iam, security, migration

Quick Start

    # Analyze infrastructure costs
    python scripts/cost_analyzer.py --account production --period monthly

    # Run DR validation
    python scripts/dr_test.py --region us-west-2 --type failover

    # Audit security posture
    python scripts/security_audit.py --framework cis --output report.html

    # Generate resource inventory
    python scripts/inventory.py --accounts all --format csv

Tools

| Script | Purpose |
| --- | --- |
| scripts/cost_analyzer.py | Analyze cloud spend by service, environment, and tag |
| scripts/dr_test.py | Validate disaster recovery failover procedures |
| scripts/security_audit.py | Audit against CIS benchmarks and compliance frameworks |
| scripts/inventory.py | Inventory all resources across accounts and regions |

Cloud Platform Comparison

| Service | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Compute | EC2, ECS, EKS | GCE, GKE | VMs, AKS |
| Serverless | Lambda | Cloud Functions | Azure Functions |
| Storage | S3 | Cloud Storage | Blob Storage |
| Database | RDS, DynamoDB | Cloud SQL, Spanner | SQL DB, CosmosDB |
| ML | SageMaker | Vertex AI | Azure ML |
| CDN | CloudFront | Cloud CDN | Azure CDN |

Workflow 1: Design a Production AWS Architecture

  • Define requirements -- Identify compute, storage, database, and networking needs. Determine RTO/RPO targets.

  • Provision VPC with Terraform:

        module "vpc" {
          source  = "terraform-aws-modules/vpc/aws"
          version = "~> 5.0"

          name = "${var.project}-${var.environment}"
          cidr = var.vpc_cidr

          azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
          private_subnets = var.private_subnets
          public_subnets  = var.public_subnets

          enable_nat_gateway   = true
          single_nat_gateway   = var.environment != "production"  # one shared NAT gateway outside production
          enable_dns_hostnames = true

          tags = local.common_tags
        }

  • Deploy compute -- ECS/EKS in private subnets behind an ALB in public subnets. Use at least 2 AZs for redundancy.

  • Configure database -- RDS Multi-AZ for production, single-AZ for staging. Set backup retention to 30 days (production) or 7 days (non-production).

  • Add caching layer -- ElastiCache (Redis) between application and database.

  • Layer security -- WAF on CloudFront, NACLs on subnets, security groups on instances. Apply least-privilege IAM.

  • Validate -- Run python scripts/security_audit.py --framework cis and resolve all high-severity findings.
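The VPC module in this workflow takes var.vpc_cidr, var.private_subnets, and var.public_subnets as inputs but does not say how to derive them. As a minimal sketch, assuming one private and one public subnet per AZ and a /20 subnet size (both illustrative choices, not values from the skill), the standard-library ipaddress module can carve the ranges out of the VPC CIDR. The helper name plan_subnets is hypothetical.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_count: int = 3, new_prefix: int = 20) -> dict:
    """Split a VPC CIDR into one private and one public subnet per AZ.

    Hypothetical helper: the /20 size and the private-first ordering are
    illustrative assumptions, not values taken from the skill.
    """
    network = ipaddress.ip_network(vpc_cidr)
    blocks = list(network.subnets(new_prefix=new_prefix))
    needed = az_count * 2
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")
    # First az_count blocks become private subnets, the next az_count public.
    private = [str(block) for block in blocks[:az_count]]
    public = [str(block) for block in blocks[az_count:needed]]
    return {"private_subnets": private, "public_subnets": public}


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16")
    print(plan["private_subnets"])
    print(plan["public_subnets"])
```

The two lists can be pasted into a tfvars file for the module. Leaving the remaining /20 blocks unallocated keeps room for a dedicated data tier later.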

Reference Architecture

Route 53 (DNS) -> CloudFront + WAF -> ALB -> ECS/EKS Cluster (AZ-a) + ECS/EKS Cluster (AZ-b) -> ElastiCache (Redis) -> RDS Multi-AZ (Primary + Standby)

Workflow 2: Optimize Cloud Costs

  • Audit current spend -- python scripts/cost_analyzer.py --account production --period monthly

  • Right-size instances -- Identify instances with avg CPU <10% and max CPU <30% as downsize candidates:

    # Right-sizing logic (CPU values are utilization percentages)
    def recommend(avg_cpu, max_cpu):
        if avg_cpu < 10 and max_cpu < 30:
            return "downsize"
        if avg_cpu > 80:
            return "upsize"
        return "optimal"

  • Convert steady-state workloads to Reserved Instances or Savings Plans:

| Type | Discount | Commitment | Use Case |
| --- | --- | --- | --- |
| On-Demand | 0% | None | Variable workloads |
| Reserved | 30-72% | 1-3 years | Steady-state |
| Savings Plans | 30-72% | 1-3 years | Flexible compute |
| Spot | 60-90% | None | Fault-tolerant batch |

  • Enforce cost allocation tags -- Require Environment, Project, Owner, and CostCenter on all resources. Alert on untagged resources after 24 hours.

  • Validate -- Re-run the cost analyzer and confirm that the savings target is achieved.
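The tagging rule in this workflow (four required tags, alert after 24 hours) can be sketched as a small check. This is an illustration only: the resource record shape (id, created_at, tags) is a hypothetical stand-in for whatever the inventory script actually produces, and the function names are not part of the skill.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_TAGS = ("Environment", "Project", "Owner", "CostCenter")
GRACE_PERIOD = timedelta(hours=24)


def missing_tags(tags: dict) -> list:
    """Return the required tag keys that are absent or empty."""
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


def resources_to_alert(resources: list, now: datetime) -> list:
    """Return (resource id, missing tags) for resources past the grace period.

    Each resource is assumed to be a dict with 'id', 'created_at' (an aware
    datetime), and an optional 'tags' dict. This shape is hypothetical.
    """
    alerts = []
    for resource in resources:
        missing = missing_tags(resource.get("tags", {}))
        if missing and now - resource["created_at"] > GRACE_PERIOD:
            alerts.append((resource["id"], missing))
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 3, tzinfo=timezone.utc)
    sample = [
        {"id": "i-old", "created_at": now - timedelta(hours=30),
         "tags": {"Environment": "production", "Project": "web"}},
        {"id": "i-new", "created_at": now - timedelta(hours=2), "tags": {}},
    ]
    for resource_id, missing in resources_to_alert(sample, now):
        print(resource_id, "is missing", ", ".join(missing))
```

Resources younger than 24 hours are skipped rather than flagged, which gives provisioning pipelines time to apply tags before an alert fires.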

Workflow 3: Plan Disaster Recovery

  • Select DR strategy based on RTO/RPO requirements:

| Strategy | RTO | RPO | Cost |
| --- | --- | --- | --- |
| Backup & Restore | Hours | Hours | $ |
| Pilot Light | Minutes | Minutes | $$ |
| Warm Standby | Minutes | Seconds | $$$ |
| Multi-Site Active | Seconds | Near-zero | $$$$ |

  • Configure cross-region replication -- Database replication to secondary region. S3 cross-region replication for object storage.

  • Set up Route 53 failover routing -- Health checks on primary. Automatic DNS failover to secondary.

  • Define backup policy:

      • Database: continuous replication, 35-day retention, cross-region, encrypted

      • Application data: daily, 90-day retention, lifecycle to IA at 30d, Glacier at 90d

      • Configuration: on-change via git + S3, unlimited retention

  • Test -- Run python scripts/dr_test.py --region us-west-2 --type failover and confirm that the RTO/RPO targets are met.
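The strategy table in this workflow is qualitative, so turning it into a selection rule requires numeric bounds. The sketch below picks the cheapest strategy whose assumed achievable RTO and RPO both fit the targets. The bounds (24 hours for "Hours", 30 minutes for "Minutes", 60 seconds for "Seconds", 1 second for "Near-zero") are assumptions made for illustration and should be replaced with measured values from DR tests; select_dr_strategy is a hypothetical name.

```python
from datetime import timedelta

# (strategy, assumed achievable RTO, assumed achievable RPO), cheapest first.
# The numeric bounds are illustrative assumptions, not figures from the skill.
STRATEGIES = [
    ("Backup & Restore", timedelta(hours=24), timedelta(hours=24)),
    ("Pilot Light", timedelta(minutes=30), timedelta(minutes=30)),
    ("Warm Standby", timedelta(minutes=30), timedelta(seconds=60)),
    ("Multi-Site Active", timedelta(seconds=60), timedelta(seconds=1)),
]


def select_dr_strategy(rto_target: timedelta, rpo_target: timedelta):
    """Return the cheapest strategy meeting both targets, or None if none fits."""
    for name, achievable_rto, achievable_rpo in STRATEGIES:
        if achievable_rto <= rto_target and achievable_rpo <= rpo_target:
            return name
    return None


if __name__ == "__main__":
    print(select_dr_strategy(timedelta(hours=48), timedelta(hours=48)))
    print(select_dr_strategy(timedelta(hours=1), timedelta(minutes=5)))
```

Because the list is ordered by cost, the first match is also the least expensive option that satisfies the requirement.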

Workflow 4: Audit Security Posture

  • Run audit -- python scripts/security_audit.py --framework cis --output report.html

  • Review network segmentation -- Public subnets contain only NAT GW, ALB, bastion. Private subnets contain application tier. Data subnets contain RDS, Redis, Elasticsearch.

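  • Enforce least-privilege IAM -- Scope every policy to specific resources and conditions. Because 10.0.0.0/8 is a private range, the example uses aws:VpcSourceIp, which applies to requests arriving through a VPC endpoint; aws:SourceIp is not available for such requests and otherwise matches the requester's public address.

        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": "arn:aws:s3:::my-bucket/uploads/*",
          "Condition": {
            "StringEquals": { "aws:PrincipalTag/Team": "engineering" },
            "IpAddress": { "aws:VpcSourceIp": ["10.0.0.0/8"] }
          }
        }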

  • Verify encryption -- Data encrypted at rest (KMS) and in transit (TLS 1.2+).

  • Validate -- Re-run the audit and confirm that all critical and high findings are resolved.
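The least-privilege rule in this workflow (specific actions, specific resources, conditions present) lends itself to a quick automated check. The sketch below inspects a single IAM statement given as a Python dict. It is a hypothetical helper, not the logic of scripts/security_audit.py, and it covers only the three checks named here; it does not evaluate NotAction, NotResource, or partial wildcards inside ARNs.

```python
def lint_statement(statement: dict) -> list:
    """Return least-privilege findings for one IAM policy statement.

    Hypothetical helper: flags wildcard actions, a wildcard resource,
    and a missing Condition block on Allow statements.
    """
    findings = []
    if statement.get("Effect") != "Allow":
        return findings  # Deny statements are not checked in this sketch.

    actions = statement.get("Action", [])
    if isinstance(actions, str):
        actions = [actions]
    resources = statement.get("Resource", [])
    if isinstance(resources, str):
        resources = [resources]

    for action in actions:
        if action == "*" or action.endswith(":*"):
            findings.append(f"wildcard action: {action}")
    for resource in resources:
        if resource == "*":
            findings.append("wildcard resource: *")
    if not statement.get("Condition"):
        findings.append("no Condition block")
    return findings


if __name__ == "__main__":
    scoped = {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::my-bucket/uploads/*",
        "Condition": {"StringEquals": {"aws:PrincipalTag/Team": "engineering"}},
    }
    broad = {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}
    print(lint_statement(scoped))
    print(lint_statement(broad))
```

A check like this is best run in CI against policy JSON before deployment, so that broad statements are caught before the audit step.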

AWS Well-Architected Pillars (Decision Checklist)

  • Operational Excellence: IaC everywhere? Monitoring and alerting? Runbooks for incidents?

  • Security: Least-privilege IAM? Encryption at rest and in transit? VPC segmentation?

  • Reliability: Multi-AZ? Auto-scaling? DR tested?

  • Performance Efficiency: Right-sized instances? Caching layer? CDN for static assets?

  • Cost Optimization: Reserved capacity for steady-state? Spot for batch? Unused resources cleaned?

  • Sustainability: Efficient regions? Right-sized compute? Data lifecycle policies?

Reference Materials

| Document | Path |
| --- | --- |
| AWS Patterns | references/aws_patterns.md |
| GCP Patterns | references/gcp_patterns.md |
| Multi-Cloud Strategies | references/multi_cloud.md |
| Cost Optimization Guide | references/cost_optimization.md |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
