kafka-iac-deployment

Kafka Infrastructure as Code (IaC) Deployment

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "kafka-iac-deployment" with this command: npx skills add anton-abyzov/specweave/anton-abyzov-specweave-kafka-iac-deployment

Kafka Infrastructure as Code (IaC) Deployment

Expert guidance for deploying Apache Kafka using Terraform across multiple platforms.

When to Use This Skill

I activate when you need help with:

  • Terraform deployments: "Deploy Kafka with Terraform", "provision Kafka cluster"

  • Platform selection: "Should I use AWS MSK or self-hosted Kafka?", "compare Kafka platforms"

  • Infrastructure planning: "How to size Kafka infrastructure", "Kafka on AWS vs Azure"

  • IaC automation: "Automate Kafka deployment", "CI/CD for Kafka infrastructure"

What I Know

Available Terraform Modules

This plugin provides 3 production-ready Terraform modules:

  1. Apache Kafka (Self-Hosted, KRaft Mode)
  • Location: plugins/specweave-kafka/terraform/apache-kafka/

  • Platform: AWS EC2 (can adapt to other clouds)

  • Architecture: KRaft mode (no ZooKeeper dependency)

  • Features:

  • Multi-broker cluster (3-5 brokers recommended)

  • Security groups with SASL_SSL

  • IAM roles for S3 backups

  • CloudWatch metrics and alarms

  • Auto-scaling group support

  • Custom VPC and subnet configuration

  • Use When:

  • ✅ You need full control over Kafka configuration

  • ✅ Running Kafka 3.6+ (KRaft mode)

  • ✅ Want to avoid ZooKeeper operational overhead

  • ✅ Multi-cloud or hybrid deployments

  • Variables: module "kafka" { source = "../../plugins/specweave-kafka/terraform/apache-kafka"

    environment = "production" broker_count = 3 kafka_version = "3.7.0" instance_type = "m5.xlarge" vpc_id = var.vpc_id subnet_ids = var.subnet_ids domain = "example.com" enable_s3_backups = true enable_monitoring = true }

  1. AWS MSK (Managed Streaming for Kafka)
  • Location: plugins/specweave-kafka/terraform/aws-msk/

  • Platform: AWS Managed Service

  • Features:

  • Fully managed Kafka service

  • IAM authentication + SASL/SCRAM

  • Auto-scaling (provisioned throughput)

  • Built-in monitoring (CloudWatch)

  • Multi-AZ deployment

  • Encryption in transit and at rest

  • Use When:

  • ✅ You want AWS to manage Kafka operations

  • ✅ Need tight AWS integration (IAM, KMS, CloudWatch)

  • ✅ Prefer operational simplicity over cost

  • ✅ Running in AWS VPC

  • Variables: module "msk" { source = "../../plugins/specweave-kafka/terraform/aws-msk"

    cluster_name = "my-kafka-cluster" kafka_version = "3.6.0" number_of_broker_nodes = 3 broker_node_instance_type = "kafka.m5.large"

    vpc_id = var.vpc_id subnet_ids = var.private_subnet_ids

    enable_iam_auth = true enable_scram_auth = false enable_auto_scaling = true }

  1. Azure Event Hubs (Kafka API)
  • Location: plugins/specweave-kafka/terraform/azure-event-hubs/

  • Platform: Azure Managed Service

  • Features:

  • Kafka 1.0+ protocol support

  • Auto-inflate (elastic scaling)

  • Premium SKU for high throughput

  • Zone redundancy

  • Private endpoints (VNet integration)

  • Event capture to Azure Storage

  • Use When:

  • ✅ Running on Azure cloud

  • ✅ Need Kafka-compatible API without Kafka operations

  • ✅ Want serverless scaling (auto-inflate)

  • ✅ Integrating with Azure ecosystem

  • Variables: module "event_hubs" { source = "../../plugins/specweave-kafka/terraform/azure-event-hubs"

    namespace_name = "my-event-hub-ns" resource_group_name = var.resource_group_name location = "eastus"

    sku = "Premium" capacity = 1 kafka_enabled = true auto_inflate_enabled = true maximum_throughput_units = 20 }

Platform Selection Decision Tree

Need Kafka deployment? START HERE:

├─ Running on AWS? │ ├─ YES → Want managed service? │ │ ├─ YES → Use AWS MSK module (terraform/aws-msk) │ │ └─ NO → Use Apache Kafka module (terraform/apache-kafka) │ └─ NO → Continue... │ ├─ Running on Azure? │ ├─ YES → Use Azure Event Hubs module (terraform/azure-event-hubs) │ └─ NO → Continue... │ ├─ Multi-cloud or hybrid? │ └─ YES → Use Apache Kafka module (most portable) │ ├─ Need maximum control? │ └─ YES → Use Apache Kafka module │ └─ Default → Use Apache Kafka module (self-hosted, KRaft mode)

Deployment Workflows

Workflow 1: Deploy Self-Hosted Kafka (Apache Kafka Module)

Scenario: You want full control over Kafka on AWS EC2

1. Create Terraform configuration

cat > main.tf <<EOF module "kafka_cluster" { source = "../../plugins/specweave-kafka/terraform/apache-kafka"

environment = "production" broker_count = 3 kafka_version = "3.7.0" instance_type = "m5.xlarge"

vpc_id = "vpc-12345678" subnet_ids = ["subnet-abc", "subnet-def", "subnet-ghi"] domain = "kafka.example.com"

enable_s3_backups = true enable_monitoring = true

tags = { Project = "MyApp" Environment = "Production" } }

output "broker_endpoints" { value = module.kafka_cluster.broker_endpoints } EOF

2. Initialize Terraform

terraform init

3. Plan deployment (review what will be created)

terraform plan

4. Apply (create infrastructure)

terraform apply

5. Get broker endpoints

terraform output broker_endpoints

Output: ["kafka-0.kafka.example.com:9093", "kafka-1.kafka.example.com:9093", ...]

Workflow 2: Deploy AWS MSK (Managed Service)

Scenario: You want AWS to manage Kafka operations

1. Create Terraform configuration

cat > main.tf <<EOF module "msk_cluster" { source = "../../plugins/specweave-kafka/terraform/aws-msk"

cluster_name = "my-msk-cluster" kafka_version = "3.6.0" number_of_broker_nodes = 3 broker_node_instance_type = "kafka.m5.large"

vpc_id = var.vpc_id subnet_ids = var.private_subnet_ids

enable_iam_auth = true enable_auto_scaling = true

tags = { Project = "MyApp" } }

output "bootstrap_brokers" { value = module.msk_cluster.bootstrap_brokers_sasl_iam } EOF

2. Deploy

terraform init && terraform apply

3. Configure IAM authentication

(module outputs IAM policy, attach to your application role)

Workflow 3: Deploy Azure Event Hubs (Kafka API)

Scenario: You're on Azure and want Kafka-compatible API

1. Create Terraform configuration

cat > main.tf <<EOF module "event_hubs" { source = "../../plugins/specweave-kafka/terraform/azure-event-hubs"

namespace_name = "my-kafka-namespace" resource_group_name = "my-resource-group" location = "eastus"

sku = "Premium" capacity = 1 kafka_enabled = true auto_inflate_enabled = true maximum_throughput_units = 20

Create hubs (topics) for your use case

hubs = [ { name = "user-events", partitions = 12 }, { name = "order-events", partitions = 6 }, { name = "payment-events", partitions = 3 } ] }

output "connection_string" { value = module.event_hubs.connection_string sensitive = true } EOF

2. Deploy

terraform init && terraform apply

3. Get connection details

terraform output connection_string

Infrastructure Sizing Recommendations

Small Environment (Dev/Test)

Self-hosted: 1 broker, m5.large

broker_count = 1 instance_type = "m5.large"

AWS MSK: 1 broker per AZ, kafka.m5.large

number_of_broker_nodes = 3 broker_node_instance_type = "kafka.m5.large"

Azure Event Hubs: Basic SKU

sku = "Basic" capacity = 1

Medium Environment (Staging/Production)

Self-hosted: 3 brokers, m5.xlarge

broker_count = 3 instance_type = "m5.xlarge"

AWS MSK: 3 brokers, kafka.m5.xlarge

number_of_broker_nodes = 3 broker_node_instance_type = "kafka.m5.xlarge"

Azure Event Hubs: Standard SKU with auto-inflate

sku = "Standard" capacity = 2 auto_inflate_enabled = true maximum_throughput_units = 10

Large Environment (High-Throughput Production)

Self-hosted: 5+ brokers, m5.2xlarge or m5.4xlarge

broker_count = 5 instance_type = "m5.2xlarge"

AWS MSK: 6+ brokers, kafka.m5.2xlarge, auto-scaling

number_of_broker_nodes = 6 broker_node_instance_type = "kafka.m5.2xlarge" enable_auto_scaling = true

Azure Event Hubs: Premium SKU with zone redundancy

sku = "Premium" capacity = 4 zone_redundant = true maximum_throughput_units = 20

Best Practices

Security Best Practices

Always use encryption in transit

  • Self-hosted: Enable SASL_SSL listener

  • AWS MSK: Set encryption_in_transit_client_broker = "TLS"

  • Azure Event Hubs: HTTPS/TLS enabled by default

Use IAM authentication (when possible)

  • AWS MSK: enable_iam_auth = true

  • Azure Event Hubs: Managed identities

Network isolation

  • Deploy in private subnets

  • Use security groups/NSGs restrictively

  • Azure: Enable private endpoints for Premium SKU

High Availability Best Practices

Multi-AZ deployment

  • Self-hosted: Distribute brokers across 3+ AZs

  • AWS MSK: Automatically multi-AZ

  • Azure Event Hubs: Enable zone_redundant = true (Premium)

Replication factor = 3

  • Self-hosted: default.replication.factor=3

  • AWS MSK: Configured automatically

  • Azure Event Hubs: N/A (fully managed)

min.insync.replicas = 2

  • Ensures durability even if 1 broker fails

Cost Optimization

Right-size instances

  • Use ClusterSizingCalculator utility (in kafka-architecture skill)

  • Start small, scale up based on metrics

Auto-scaling (where available)

  • AWS MSK: enable_auto_scaling = true

  • Azure Event Hubs: auto_inflate_enabled = true

Retention policies

  • Set log.retention.hours based on actual needs (default: 168 hours = 7 days)

  • Shorter retention = lower storage costs

Monitoring Integration

All modules integrate with monitoring:

Self-Hosted Kafka

  • CloudWatch metrics (via JMX Exporter)

  • Prometheus + Grafana dashboards (see kafka-observability skill)

  • Custom CloudWatch alarms

AWS MSK

  • Built-in CloudWatch metrics

  • Enhanced monitoring available

  • Integration with CloudWatch Alarms

Azure Event Hubs

  • Built-in Azure Monitor metrics

  • Diagnostic logs to Log Analytics

  • Integration with Azure Alerts

Troubleshooting

"Terraform destroy fails on security groups"

Cause: Resources using security groups still exist Fix:

1. Find dependent resources

aws ec2 describe-network-interfaces --filters "Name=group-id,Values=sg-12345678"

2. Delete dependent resources first

3. Retry terraform destroy

"AWS MSK cluster takes 20+ minutes to create"

Cause: MSK provisioning is inherently slow (AWS behavior) Fix: This is normal. Use --auto-approve for automation:

terraform apply -auto-approve

"Azure Event Hubs: Connection refused"

Cause: Kafka protocol not enabled OR incorrect connection string Fix:

  • Verify kafka_enabled = true in Terraform

  • Use Kafka connection string (not Event Hubs connection string)

  • Check firewall rules (Premium SKU supports private endpoints)

Integration with Other Skills

  • kafka-architecture: For cluster sizing and partitioning strategy

  • kafka-observability: For Prometheus + Grafana setup after deployment

  • kafka-kubernetes: For deploying Kafka on Kubernetes (alternative to Terraform)

  • kafka-cli-tools: For testing deployed clusters with kcat

Quick Reference Commands

Terraform workflow

terraform init # Initialize modules terraform plan # Preview changes terraform apply # Create infrastructure terraform output # Get outputs (endpoints, etc.) terraform destroy # Delete infrastructure

AWS MSK specific

aws kafka list-clusters # List MSK clusters aws kafka describe-cluster --cluster-arn <arn> # Get cluster details

Azure Event Hubs specific

az eventhubs namespace list # List namespaces az eventhubs eventhub list --namespace-name <name> --resource-group <rg> # List hubs

Next Steps After Deployment:

  • Use kafka-observability skill to set up Prometheus + Grafana monitoring

  • Use kafka-cli-tools skill to test cluster with kcat

  • Deploy your producer/consumer applications

  • Monitor cluster health and performance

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

github-issue-tracker

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

github-issue-standard

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

github-multi-project

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

kafka-cli-tools

No summary provided by upstream source.

Repository SourceNeeds Review