distributed-systems-basics

Distributed-systems workflow for failure-mode analysis, consistency choices, and reliability-primitive selection across networked components. Use when correctness depends on partitions, retries, timeouts, ordering, or partial failures; do not use for questions that concern only single-process implementation details.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "distributed-systems-basics" with this command: npx skills add kentoshimizu/sw-agent-skills/kentoshimizu-sw-agent-skills-distributed-systems-basics

Distributed Systems Basics

Overview

Use this skill to reason about correctness and reliability in systems where network faults and partial failures are normal.

Scope Boundaries

  • Use when multi-service workflows need explicit consistency and ordering guarantees.
  • Use when retry, timeout, or duplicate-message behavior can change business correctness.
  • Use when teams need to define reliability primitives before implementation or rollout.

Shared References

  • Failure mode and consistency rules:
    • references/failure-mode-consistency-rules.md

Templates And Assets

  • Distributed flow risk template:
    • assets/distributed-flow-risk-template.md

Inputs To Gather

  • Component boundaries and communication patterns.
  • Consistency and ordering requirements per workflow.
  • Failure scenarios (partition, timeout, duplicate, out-of-order, stale read).
  • Recovery and observability capabilities.
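
To make the duplicate and out-of-order scenarios above concrete, here is a minimal sketch that classifies incoming messages by a per-sender sequence number. The `SequenceTracker` class, the sender name, and the numbering scheme (starting at 0, incrementing by 1) are assumptions for illustration, not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceTracker:
    """Tracks the next expected sequence number per sender (hypothetical scheme)."""
    next_expected: dict = field(default_factory=dict)

    def classify(self, sender: str, seq: int) -> str:
        expected = self.next_expected.get(sender, 0)
        if seq < expected:
            return "duplicate"      # already processed; safe to drop
        if seq > expected:
            return "out-of-order"   # gap detected; buffer or request a resend
        self.next_expected[sender] = seq + 1
        return "in-order"


tracker = SequenceTracker()
# Simulated arrival order: 1 is redelivered, and 3 arrives before 2.
results = [tracker.classify("svc-a", s) for s in (0, 1, 1, 3, 2)]
print(results)
```

A classification like this is only an input to the analysis: what to do with an out-of-order message (buffer, reject, or apply anyway) still depends on the ordering requirement recorded for that workflow.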

Deliverables

  • Failure-mode map and risk ranking.
  • Consistency decision record per critical flow.
  • Reliability mechanism selection (retry, idempotency, backoff, timeout).
  • Validation plan (fault injection and invariant checks).
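
As an illustration of the retry, backoff, and timeout mechanisms listed above, the following sketch bounds a retry loop both by attempt count and by an overall deadline, using exponential backoff with full jitter. The function name `call_with_retry` and all default values are assumptions chosen for the example; real limits should come from the flow's own requirements.

```python
import random
import time


def call_with_retry(op, *, max_attempts=4, base_delay=0.1, max_delay=2.0,
                    deadline=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call op() until it succeeds, attempts run out, or the deadline would pass."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # attempt bound reached
            # Full jitter: random delay up to the capped exponential step.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, cap)
            if clock() - start + delay > deadline:
                raise  # waiting again would exceed the overall deadline
            sleep(delay)


# Usage: an operation that times out twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retry(flaky, sleep=lambda _: None)  # skip real sleeping in the demo
```

Retrying like this is only safe when the operation is idempotent or otherwise protected against duplicates, because a timeout does not tell the caller whether the first attempt took effect.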

Workflow

  1. Capture critical flows with assets/distributed-flow-risk-template.md.
  2. Map failure assumptions and consistency requirements per flow.
  3. Select reliability primitives using references/failure-mode-consistency-rules.md.
  4. Define observability and recovery behavior.
  5. Validate assumptions with targeted failure tests and invariant checks.
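
Step 5 can be sketched as a small deterministic test: inject duplicates and reordering into a delivery schedule, run the handler, then assert the invariants. The transfer scenario, the message tuple layout, and the function names below are hypothetical; a real test would inject faults at the transport or service boundary.

```python
import random


def deliver_with_faults(messages, rng, dup_prob=0.3):
    """Return a delivery schedule with injected duplicates and reordering."""
    schedule = []
    for msg in messages:
        schedule.append(msg)
        if rng.random() < dup_prob:
            schedule.append(msg)  # injected duplicate
    rng.shuffle(schedule)         # injected reordering
    return schedule


def apply_transfers(schedule, balances):
    """Handler under test: apply each transfer at most once, keyed by its id."""
    seen = set()
    for msg_id, src, dst, amount in schedule:
        if msg_id in seen:
            continue
        seen.add(msg_id)
        balances[src] -= amount
        balances[dst] += amount
    return balances


rng = random.Random(42)  # fixed seed so a failing schedule can be replayed
messages = [(i, "a", "b", 10) for i in range(5)]
initial = {"a": 100, "b": 0}
final = apply_transfers(deliver_with_faults(messages, rng), dict(initial))

# Invariant 1: money is conserved across accounts.
assert sum(final.values()) == sum(initial.values())
# Invariant 2: every transfer took effect exactly once despite duplicates.
assert final == {"a": 50, "b": 50}
```

Seeding the random generator is a deliberate choice here: when an invariant fails, the same fault schedule can be reproduced while debugging.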

Quality Standard

  • Critical flows have explicit consistency and ordering rules.
  • Retry/timeout semantics are bounded and intentional.
  • Idempotency strategy exists where at-least-once delivery is possible.
  • Failure handling is observable and testable.
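
One common shape for the idempotency strategy mentioned above is an idempotency key supplied by the caller, with the first response stored and replayed on redelivery. The sketch below uses a hypothetical `PaymentService` and an in-memory dictionary purely for illustration.

```python
class PaymentService:
    """Hypothetical handler that is safe to call more than once per request."""

    def __init__(self):
        self.charged_total = 0
        self._results = {}  # idempotency key -> stored response

    def charge(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self._results:
            # Redelivery: return the stored response without charging again.
            return self._results[idempotency_key]
        self.charged_total += amount
        response = {"status": "charged", "amount": amount}
        self._results[idempotency_key] = response
        return response


svc = PaymentService()
first = svc.charge("order-17", 25)
retry = svc.charge("order-17", 25)  # at-least-once delivery sends it again
print(first == retry, svc.charged_total)
```

In a real service the side effect and the stored key would need to be committed atomically (for example in the same database transaction); otherwise a crash between the two steps can still produce a duplicate or a lost response.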

Failure Conditions

  • Stop when consistency assumptions are implicit or contradictory.
  • Stop when retries/timeouts can amplify failure unboundedly.
  • Escalate when critical failure modes have no mitigation path.
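
To show what bounding retry amplification can look like, here is a minimal sketch of a retry budget: retries are granted only up to a fixed percentage of observed requests, so a failing dependency sees at most a bounded amount of extra load. The `RetryBudget` class and its parameters are assumptions for illustration; production versions usually track a sliding time window rather than lifetime counters.

```python
class RetryBudget:
    """Grant retries only up to a fixed percentage of observed requests."""

    def __init__(self, retry_percent=10, min_retries=1):
        self.retry_percent = retry_percent
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_acquire_retry(self) -> bool:
        # Integer arithmetic avoids floating-point rounding surprises.
        allowed = max(self.min_retries, self.requests * self.retry_percent // 100)
        if self.retries >= allowed:
            return False  # budget exhausted: fail fast instead of retrying
        self.retries += 1
        return True


budget = RetryBudget(retry_percent=10)
for _ in range(50):
    budget.record_request()

# Every one of the 50 requests fails and asks for a retry; only 5 are granted.
granted = sum(budget.try_acquire_retry() for _ in range(50))
print(granted)
```

With this budget, total load on the dependency stays near 1.1 times the original request volume even when every call fails, instead of multiplying with each retry layer.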
