
Dynamic Architectures Meta-Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "using-dynamic-architectures" with this command: npx skills add tachyon-beep/skillpacks/tachyon-beep-skillpacks-using-dynamic-architectures


When to Use This Skill

Invoke this meta-skill when you encounter:

  • Growing Networks: Adding capacity during training (new layers, neurons, modules)

  • Pruning Networks: Removing capacity that isn't contributing

  • Continual Learning: Training on new tasks without forgetting old ones

  • Gradient Isolation: Training new modules without destabilizing existing weights

  • Modular Composition: Building networks from graftable, composable components

  • Lifecycle Management: State machines controlling when to grow, train, integrate, prune

  • Progressive Training: Staged capability expansion with warmup and cooldown

This is the entry point for dynamic/morphogenetic neural network patterns. It routes to 7 specialized reference sheets.

How to Access Reference Sheets

IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.

When this skill is loaded from: skills/using-dynamic-architectures/SKILL.md

Reference sheets like continual-learning-foundations.md are at: skills/using-dynamic-architectures/continual-learning-foundations.md

NOT at: skills/continual-learning-foundations.md (WRONG PATH)

Core Principle

Dynamic architectures grow capability, not just tune weights.

Static networks are a guess about capacity. Dynamic networks let training signal drive structure. The challenge is growing without forgetting, integrating without destabilizing, and knowing when to act.

Key tensions:

  • Stability vs. Plasticity: Preserve existing knowledge while adding new capacity

  • Isolation vs. Integration: Train new modules separately, then merge carefully

  • Exploration vs. Exploitation: When to add capacity vs. when to stabilize

The 7 Dynamic Architecture Skills

  • continual-learning-foundations - EWC, PackNet, rehearsal strategies, catastrophic forgetting theory

  • gradient-isolation-techniques - Freezing, gradient masking, stop_grad patterns, alpha blending

  • peft-adapter-techniques - LoRA, QLoRA, DoRA, adapter placement, merging strategies

  • dynamic-architecture-patterns - Grow/prune patterns, slot-based expansion, capacity scheduling

  • modular-neural-composition - MoE, gating, grafting semantics, interface contracts

  • ml-lifecycle-orchestration - State machines, quality gates, transition triggers, controllers

  • progressive-training-strategies - Staged expansion, warmup/cooldown, knowledge transfer

Routing Decision Framework

Step 1: Identify the Core Problem

Diagnostic Questions:

  • "Are you trying to prevent forgetting when training on new data/tasks?"

  • "Are you trying to add new capacity to an existing trained network?"

  • "Are you designing how multiple modules combine?"

  • "Are you deciding WHEN to grow, prune, or integrate?"

Quick Routing:

  • "Model forgets old tasks when I train new ones" → continual-learning-foundations

  • "New module destabilizes existing weights" → gradient-isolation-techniques

  • "Fine-tune LLM efficiently without full training" → peft-adapter-techniques

  • "When should I add more capacity?" → dynamic-architecture-patterns

  • "How do module outputs combine?" → modular-neural-composition

  • "How do I manage the grow/train/integrate cycle?" → ml-lifecycle-orchestration

  • "How do I warm up new modules safely?" → progressive-training-strategies

Step 2: Catastrophic Forgetting (Continual Learning)

Symptoms:

  • Performance on old tasks drops when training on new tasks

  • Model "forgets" previous capabilities

  • Fine-tuning overwrites learned features

Route to: continual-learning-foundations.md

Covers:

  • Why SGD causes forgetting (loss landscape geometry)

  • EWC, SI, MAS (regularization approaches)

  • Progressive Neural Networks, PackNet (architectural approaches)

  • Experience replay, generative replay (rehearsal approaches)

  • Measuring forgetting (backward/forward transfer)

When to Use:

  • Training sequentially on multiple tasks

  • Fine-tuning without forgetting base capabilities

  • Designing systems that accumulate knowledge over time
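For orientation before loading the reference sheet, here is a minimal sketch of the EWC-style quadratic penalty, the regularization approach listed above. The names (`fisher`, `old_params`, `lam`) are illustrative, not taken from the reference sheet:

```python
# Minimal EWC penalty sketch (illustrative names; full treatment in
# continual-learning-foundations.md).
import torch

def ewc_penalty(model, fisher, old_params, lam=1000.0):
    """Quadratic penalty anchoring parameters that were important to old tasks.

    fisher:     dict of param name -> diagonal Fisher estimate (importance)
    old_params: dict of param name -> snapshot taken after training old tasks
    """
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return (lam / 2.0) * penalty

# New-task training step:
#   total_loss = task_loss + ewc_penalty(model, fisher, old_params)
```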

Step 3: Gradient Isolation

Symptoms:

  • New module training affects host network stability

  • Want to train on host errors without backprop flowing to host

  • Need gradual integration of new capacity

Route to: gradient-isolation-techniques.md

Covers:

  • Freezing strategies (full, partial, scheduled)

  • detach() vs no_grad() semantics

  • Dual-path training (residual learning on errors)

  • Alpha blending for gradual integration

  • Hook-based gradient surgery

When to Use:

  • Training "seed" modules that learn from host errors

  • Preventing catastrophic interference during growth

  • Implementing safe module grafting
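As a quick illustration of the detach-plus-alpha-blending pattern described above, here is a minimal sketch; `GraftedModule` and its fields are hypothetical names, and the reference sheet covers the full semantics:

```python
# Sketch: train a "seed" module on frozen host features without gradients
# reaching the host, and blend its output in via an alpha ramp.
import torch.nn as nn

class GraftedModule(nn.Module):
    def __init__(self, host: nn.Module, seed: nn.Module):
        super().__init__()
        self.host, self.seed = host, seed
        self.alpha = 0.0  # ramp toward 1.0 during integration (see Step 8)

    def forward(self, x):
        h = self.host(x).detach()   # cut the graph: no backprop into host weights
        s = self.seed(h)            # seed learns a residual correction on host output
        return h + self.alpha * s   # alpha-blended integration
```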

Step 4: PEFT Adapters (LoRA, QLoRA)

Symptoms:

  • Want to fine-tune large pretrained models efficiently

  • Memory constraints prevent full fine-tuning

  • Need task-specific adaptation without modifying base weights

Route to: peft-adapter-techniques.md

Covers:

  • LoRA (low-rank adaptation) fundamentals

  • QLoRA (quantized base + LoRA adapters)

  • DoRA (weight-decomposed adaptation)

  • Adapter placement strategies

  • Merging adapters into base model

  • Multiple adapter management

When to Use:

  • Fine-tuning LLMs on limited compute

  • Creating task-specific model variants

  • Memory-efficient adaptation of large models
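The core LoRA idea fits in a few lines. Below is a minimal sketch, assuming the common alpha/r scaling convention; the class name and initialization scheme are illustrative, and production code should use an established library such as PEFT:

```python
# Minimal LoRA sketch: frozen pretrained linear layer plus a trainable
# low-rank update, scaled by alpha / r.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                       # base stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Merging into the base for deployment:
#   base.weight.data += scale * (B @ A)
```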

Step 5: Dynamic Architecture Patterns

Symptoms:

  • Need to add capacity during training (not just before)

  • Want to prune underperforming components

  • Deciding when/where to grow the network

Route to: dynamic-architecture-patterns.md

Covers:

  • Growth patterns (slot-based, layer widening, depth extension)

  • Pruning patterns (magnitude, gradient-based, lottery ticket)

  • Trigger conditions (loss plateau, contribution metrics, budgets)

  • Capacity scheduling (grow-as-needed vs overparameterize-then-prune)

When to Use:

  • Building networks that expand during training

  • Implementing lightweight neural architecture search ("NAS lite")

  • Managing parameter budgets with dynamic allocation
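As a sketch of a trigger condition, the class below combines a loss-plateau check with a parameter budget; all thresholds are placeholder values, and (as the Rationalization table below notes) a plateau alone is a weak signal that should be paired with a contribution check:

```python
# Plateau-plus-budget growth trigger sketch. Thresholds are placeholders.
class GrowthTrigger:
    def __init__(self, patience: int = 5, min_delta: float = 1e-3,
                 max_params: int = 10**8):
        self.patience, self.min_delta, self.max_params = patience, min_delta, max_params
        self.best, self.stale = float("inf"), 0

    def should_grow(self, val_loss: float, param_count: int) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best, self.stale = val_loss, 0   # still improving: reset
        else:
            self.stale += 1
        # Grow only on a sustained plateau and while under the parameter budget.
        return self.stale >= self.patience and param_count < self.max_params
```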

Step 6: Modular Composition

Symptoms:

  • Combining outputs from multiple modules

  • Designing gating/routing mechanisms

  • Need graftable, replaceable components

Route to: modular-neural-composition.md

Covers:

  • Combination mechanisms (additive, multiplicative, selective)

  • Mixture of Experts (sparse gating, load balancing)

  • Grafting semantics (input/output attachment points)

  • Interface contracts (shape matching, normalization boundaries)

  • Multi-module coordination (independent, competitive, cooperative)

When to Use:

  • Building modular architectures with interchangeable parts

  • Implementing MoE or gated architectures

  • Designing residual streams as module communication
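For intuition, here is a minimal input-conditioned gating sketch, a soft (dense) version of the combination mechanisms the sheet covers; sparse MoE gating adds top-k selection and load balancing on top of this. The class name is illustrative:

```python
# Learned gating instead of naive summation: the network learns how much
# each module contributes, conditioned on the input.
import torch
import torch.nn as nn

class GatedCombine(nn.Module):
    def __init__(self, dim: int, n_modules: int):
        super().__init__()
        self.gate = nn.Linear(dim, n_modules)  # input-conditioned mixing weights

    def forward(self, x, module_outputs):
        # module_outputs: list of tensors, each shaped like x: (..., dim)
        w = torch.softmax(self.gate(x), dim=-1)        # (..., n_modules)
        stacked = torch.stack(module_outputs, dim=-1)  # (..., dim, n_modules)
        return (stacked * w.unsqueeze(-2)).sum(-1)     # weighted combination
```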

Step 7: Lifecycle Orchestration

Symptoms:

  • Need to decide WHEN to grow, train, integrate, prune

  • Building state machines for module lifecycle

  • Want quality gates before integration decisions

Route to: ml-lifecycle-orchestration.md

Covers:

  • State machine fundamentals (states, transitions, terminals)

  • Gate design patterns (structural, performance, stability, contribution)

  • Transition triggers (metric-based, time-based, budget-based)

  • Rollback and recovery (cooldown, hysteresis)

  • Controller patterns (heuristic, learned/RL, hybrid)

When to Use:

  • Designing grow/train/integrate/prune workflows

  • Implementing quality gates for safe integration

  • Building RL-controlled architecture decisions
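A minimal sketch of the state-machine-plus-gate idea follows; the states, transition table, and gate thresholds are hypothetical placeholders, not the reference sheet's design:

```python
# Lifecycle state machine sketch: legal transitions plus a quality gate
# guarding integration, with a rollback path on failure.
from enum import Enum, auto

class SeedState(Enum):
    DORMANT = auto()
    TRAINING = auto()
    INTEGRATING = auto()
    PERMANENT = auto()
    CULLED = auto()

TRANSITIONS = {
    SeedState.DORMANT:     {SeedState.TRAINING},
    SeedState.TRAINING:    {SeedState.INTEGRATING, SeedState.CULLED},
    SeedState.INTEGRATING: {SeedState.PERMANENT, SeedState.CULLED},  # rollback path
}

def passes_gate(metrics: dict) -> bool:
    # Example gate: measurable contribution AND no stability regression.
    return metrics["contribution"] > 0.01 and metrics["loss_spike"] < 0.05

def advance(state: SeedState, target: SeedState, metrics: dict) -> SeedState:
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    if target is SeedState.PERMANENT and not passes_gate(metrics):
        return SeedState.CULLED  # failed the gate: roll back instead of integrating
    return target
```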

Step 8: Progressive Training

Symptoms:

  • New modules cause instability when integrated

  • Need warmup/cooldown for safe capacity addition

  • Planning multi-stage training schedules

Route to: progressive-training-strategies.md

Covers:

  • Staged capacity expansion strategies

  • Warmup patterns (zero-init, LR warmup, alpha ramp)

  • Cooldown and stabilization (settling periods, consolidation)

  • Multi-stage schedules (sequential, overlapping, budget-aware)

  • Knowledge transfer between stages (inheritance, distillation)

When to Use:

  • Ramping new modules safely into production

  • Designing curriculum over architecture (not just data)

  • Preventing stage transition shock
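A minimal warmup sketch, assuming the alpha-blending setup from Step 3 (the new path is scaled by alpha, so new capacity starts silent and fades in):

```python
# Linear alpha ramp: 0 at graft time, 1.0 after warmup_steps.
def alpha_schedule(step: int, warmup_steps: int = 1000) -> float:
    return min(1.0, step / warmup_steps)

# Each training step:
#   grafted.alpha = alpha_schedule(global_step)
```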

Common Multi-Skill Scenarios

Scenario: Building a Morphogenetic System

Need: Network that grows seeds, trains them in isolation, and grafts successful ones

Routing sequence:

  • dynamic-architecture-patterns - Slot-based expansion, where seeds attach

  • gradient-isolation-techniques - Train seeds on host errors without destabilizing host

  • modular-neural-composition - How seed outputs blend into host stream

  • ml-lifecycle-orchestration - State machine for seed lifecycle

  • progressive-training-strategies - Warmup/cooldown for grafting

Scenario: Continual Learning Without Forgetting

Need: Train on sequence of tasks without catastrophic forgetting

Routing sequence:

  • continual-learning-foundations - Understand forgetting, choose approach

  • gradient-isolation-techniques - If using architectural approach (columns, modules)

  • progressive-training-strategies - Staged training across tasks

Scenario: Neural Architecture Search (Lite)

Need: Grow/prune network based on training signal

Routing sequence:

  • dynamic-architecture-patterns - Growth/pruning triggers and patterns

  • ml-lifecycle-orchestration - Automation via heuristics or RL

  • progressive-training-strategies - Stabilization between changes

Scenario: RL-Controlled Architecture

Need: RL agent deciding when to grow, prune, integrate

Routing sequence:

  • ml-lifecycle-orchestration - Learned controller patterns

  • dynamic-architecture-patterns - What actions the RL agent can take

  • gradient-isolation-techniques - Safe exploration during training

Rationalization Resistance Table

  • "Just train a bigger model from scratch" → Reality: transfer plus growth often beats training from scratch. Check continual-learning-foundations for why.

  • "I'll freeze everything except the new layer" → Reality: a full freeze may be too restrictive. Check gradient-isolation-techniques for partial strategies.

  • "I'll add capacity whenever loss plateaus" → Reality: a loss plateau alone is not enough; pair it with a contribution check. Check ml-lifecycle-orchestration for proper gates.

  • "Modules can just sum their outputs" → Reality: naive summation can cause interference. Check modular-neural-composition for combination mechanisms.

  • "I'll integrate immediately when training finishes" → Reality: new capacity needs a warmup/holding period. Check progressive-training-strategies for safe integration.

  • "EWC solves all forgetting problems" → Reality: EWC has limitations and may require an architectural approach instead. Check continual-learning-foundations for trade-offs.

Red Flags Checklist

Watch for these signs of incorrect approach:

  • No Isolation: Training new modules without gradient isolation from host

  • No Warmup: Integrating new capacity at full amplitude immediately

  • No Gates: Integrating based only on time, not performance metrics

  • Naive Combination: Summing module outputs without gating or blending

  • Ignoring Forgetting: Adding new tasks without measuring old task performance

  • No Rollback: No plan for what happens if integration fails

Relationship to Other Packs

  • "Implement PPO for architecture decisions" → yzmir-deep-rl (RL algorithm implementation)

  • "Evaluate architecture changes without mutation" → yzmir-deep-rl/counterfactual-reasoning (counterfactual simulation)

  • "Debug PyTorch gradient flow" → yzmir-pytorch-engineering (low-level PyTorch debugging)

  • "Optimize training loop performance" → yzmir-training-optimization (general training optimization)

  • "Design transformer architecture" → yzmir-neural-architectures (static architecture design)

  • "Deploy morphogenetic model" → yzmir-ml-production (production deployment)

Intersection with deep-rl: If using RL to control architecture decisions (when to grow/prune), combine this pack's lifecycle orchestration with deep-rl's policy gradient or actor-critic methods.

Counterfactual evaluation: Before committing to a live mutation (grow/prune), use deep-rl's counterfactual-reasoning.md to simulate the change and evaluate outcomes without risk. This is critical for production morphogenetic systems.

Diagnostic Question Templates

Use these to route users:

Problem Classification

  • "Are you training on multiple tasks sequentially, or growing a single-task network?"

  • "Do you have an existing trained model you want to extend, or starting fresh?"

  • "Is the issue forgetting (old performance drops) or instability (training explodes)?"

Architectural Questions

  • "Where do new modules attach to the existing network?"

  • "How should new module outputs combine with existing outputs?"

  • "What triggers growth? Loss plateau, manual, or learned?"

Lifecycle Questions

  • "What states can a module be in? (training, integrating, permanent, removed)"

  • "What conditions must be met before integration?"

  • "What happens if a module fails to improve performance?"

Summary: Routing Decision Tree

START: Dynamic architecture problem

├─ Forgetting old tasks?
│  └─ → continual-learning-foundations
├─ New module destabilizes existing?
│  └─ → gradient-isolation-techniques
├─ Fine-tuning LLM efficiently?
│  └─ → peft-adapter-techniques
├─ When/where to add capacity?
│  └─ → dynamic-architecture-patterns
├─ How modules combine?
│  └─ → modular-neural-composition
├─ Managing grow/train/integrate cycle?
│  └─ → ml-lifecycle-orchestration
├─ Warmup/cooldown for new capacity?
│  └─ → progressive-training-strategies
└─ Building a complete morphogenetic system?
   └─ → Start with dynamic-architecture-patterns, then gradient-isolation-techniques, then ml-lifecycle-orchestration

Reference Sheets

After routing, load the appropriate reference sheet:

  • continual-learning-foundations.md - EWC, PackNet, rehearsal, forgetting theory

  • gradient-isolation-techniques.md - Freezing, detach, alpha blending, hook surgery

  • peft-adapter-techniques.md - LoRA, QLoRA, DoRA, adapter merging

  • dynamic-architecture-patterns.md - Grow/prune patterns, triggers, scheduling

  • modular-neural-composition.md - MoE, gating, grafting, interface contracts

  • ml-lifecycle-orchestration.md - State machines, gates, controllers

  • progressive-training-strategies.md - Staged expansion, warmup/cooldown

