ln-811-performance-profiler

Runtime profiling with multi-metric measurement, instrumentation, and performance map generation

Install skill "ln-811-performance-profiler" with this command: npx skills add levnikolaevich/claude-code-skills/levnikolaevich-claude-code-skills-ln-811-performance-profiler

Paths: File paths (shared/, references/, ../ln-*) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root.

Type: L3 Worker
Category: 8XX Optimization

Runtime profiler that executes the optimization target, measures multiple metrics (CPU, memory, I/O, time), instruments code for per-function breakdown, and produces a standardized performance map from real data.


Overview

| Aspect | Details |
| --- | --- |
| Input | Problem statement: target (file/endpoint/pipeline) + observed metric |
| Output | Performance map (multi-metric, per-function), suspicion stack, bottleneck classification |
| Pattern | Discover test → Baseline run → Static analysis → Deep profile → Performance map → Report |

Workflow

Phases: Test Discovery → Baseline Run → Static Analysis → Deep Profile → Performance Map → Report


Phase 0: Test Discovery/Creation

MANDATORY READ: Load shared/references/ci_tool_detection.md for test framework detection.

MANDATORY READ: Load shared/references/benchmark_generation.md for auto-generating benchmarks when none exist.

Find or create commands that exercise the optimization target. Two outputs: test_command (profiling/measurement) and e2e_test_command (functional safety gate).

Step 1: Discover test_command

| Priority | Method | Action |
| --- | --- | --- |
| 1 | User-provided | User specifies test command or API endpoint |
| 2 | Discover existing E2E test | Grep test files for target entry point (stop at first match) |
| 3 | Create test script | Generate per shared/references/benchmark_generation.md to .optimization/{slug}/profile_test.sh |

E2E discovery protocol (stop at first match):

| Priority | Method | How |
| --- | --- | --- |
| 1 | Route-based search | Grep e2e/integration test files for entry point route |
| 2 | Function-based search | Grep for entry point function name |
| 3 | Module-based search | Grep for import of entry point module |
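The three-priority search above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function name `discover_e2e_test`, the `test_*.py` naming convention, and plain substring matching in place of grep are all assumptions.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def discover_e2e_test(
    test_root: Path,
    route: Optional[str],
    function_name: str,
    module_name: str,
) -> Tuple[Optional[Path], str]:
    """Return (first matching test file, discovery source); stop at the first match."""
    # Priority order from the table: route, then function name, then module import.
    searches: List[Tuple[str, List[str]]] = [
        ("route", [route] if route else []),
        ("function", [function_name]),
        ("module", [f"import {module_name}", f"from {module_name}"]),
    ]
    # Assumed naming convention for test files; adjust per stack.
    test_files = sorted(test_root.rglob("test_*.py"))
    for source, needles in searches:
        for path in test_files:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if any(needle in text for needle in needles):
                return path, source
    return None, "none"
```

The returned source string matches the values listed for e2e_test_source (route / function / module / none), so the result can be recorded directly in the Phase 0 output.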

Test creation (if no existing test found):

| Target Type | Generated Script |
| --- | --- |
| API endpoint | curl -w "%{time_total}" -o /dev/null -s {endpoint} |
| Function | Stack-specific benchmark per shared/references/benchmark_generation.md |
| Pipeline | Full pipeline invocation with test input |

Step 2: Discover e2e_test_command

If test_command came from E2E discovery (Step 1 priority 2): e2e_test_command = test_command.

Otherwise, run E2E discovery protocol again (same 3-priority table) to find a separate functional safety test.

If not found: e2e_test_command = null, log: WARNING: No e2e test covers {entry_point}. Full test suite serves as functional gate.
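The three branches of Step 2 can be written as a small decision function. The function and parameter names below are hypothetical; only the branch logic and the warning text come from the rules above.

```python
from typing import Optional, Tuple


def resolve_e2e_test_command(
    test_command: str,
    test_command_from_e2e_discovery: bool,
    separately_discovered: Optional[str],
    entry_point: str,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (e2e_test_command, warning message or None)."""
    # Branch 1: test_command itself came from E2E discovery, so reuse it.
    if test_command_from_e2e_discovery:
        return test_command, None
    # Branch 2: a second discovery pass found a separate functional test.
    if separately_discovered is not None:
        return separately_discovered, None
    # Branch 3: nothing found; the full test suite acts as the functional gate.
    warning = (
        f"WARNING: No e2e test covers {entry_point}. "
        "Full test suite serves as functional gate."
    )
    return None, warning
```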

Output

| Field | Description |
| --- | --- |
| test_command | Command for profiling/measurement |
| e2e_test_command | Command for functional safety gate (may equal test_command, or null) |
| e2e_test_source | Discovery method: user / route / function / module / none |

Phase 1: Baseline Run (Multi-Metric)

Run test_command with system-level profiling. Capture simultaneously:

| Metric | How to Capture | When |
| --- | --- | --- |
| Wall time | time wrapper or test harness | Always |
| CPU time (user+sys) | /usr/bin/time -v or language profiler | Always |
| Memory peak (RSS) | /usr/bin/time -v (Max RSS) or tracemalloc / process.memoryUsage() | Always |
| I/O bytes | /usr/bin/time -v or structured logs | If I/O suspected |
| HTTP round-trips | Count from structured logs or application metrics | If network I/O in call graph |
| GPU utilization | nvidia-smi --query-gpu | Only if CUDA/GPU detected in stack |

Baseline Protocol

| Parameter | Value |
| --- | --- |
| Runs | 3 |
| Metric | Median |
| Warm-up | 1 discarded run |
| Output | baseline: multi-metric snapshot |
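A minimal sketch of the baseline protocol (one discarded warm-up, three measured runs, median) is shown below. It captures wall time only; CPU, memory, and I/O would come from /usr/bin/time -v or a language profiler as listed in the metric table. The function name `run_baseline` and the shape of the returned dictionary are assumptions.

```python
import statistics
import subprocess
import sys
import time
from typing import Dict, List


def run_baseline(command: List[str], runs: int = 3, warmups: int = 1) -> Dict[str, object]:
    """Run `command` warmups + runs times and report the median wall time in ms."""
    samples_ms: List[float] = []
    for i in range(warmups + runs):
        start = time.perf_counter()
        # check=True aborts the baseline if the test command itself fails.
        subprocess.run(command, check=True, capture_output=True)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmups:  # warm-up runs are executed but not recorded
            samples_ms.append(elapsed_ms)
    return {
        "runs": len(samples_ms),
        "wall_time_ms": statistics.median(samples_ms),
        "samples_ms": samples_ms,
    }


if __name__ == "__main__":
    # Placeholder command; substitute the discovered test_command.
    print(run_baseline([sys.executable, "-c", "pass"]))
```

Using the median rather than the mean keeps a single slow run (cold cache, noisy neighbor) from skewing the baseline.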

Phase 2: Static Analysis → Instrumentation Points

MANDATORY READ: Load bottleneck_classification.md

Trace call chain from code + build suspicion stack. Purpose: guide WHERE to instrument in Phase 3.

Step 1: Trace Call Chain

Starting from entry point, trace depth-first (max depth 5). At each step, READ the full function body.

Cross-service tracing: If service_topology is available from coordinator and a step makes an HTTP/gRPC call to another service whose code is accessible:

| Situation | Action |
| --- | --- |
| HTTP call to service with code in submodule/monorepo | Follow into that service's handler: resolve route → trace handler code (depth resets to 0 for the new service) |
| HTTP call to service without accessible code | Classify as External, record latency estimate |
| gRPC/message queue to known service | Same as HTTP: follow into handler if code accessible |

Record service: "{service_name}" on each step to track which service owns it. The performance_map steps tree can span multiple services.

Depth-First Rule: If code of the called service is accessible — ALWAYS profile INSIDE. NEVER classify an accessible service as "External/slow" without profiling its internals. "Slow" is a symptom, not a diagnosis.
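The tracing rules (depth-first, max depth 5, depth reset when crossing into another service) can be illustrated over a toy call graph. Everything here is hypothetical: the `CallGraph` shape, the function name, and the cycle guard, which the text does not mention but which prevents endless recursion once depth resets are allowed.

```python
from typing import Dict, List, Optional, Set, Tuple

MAX_DEPTH = 5  # depth limit stated in the text

# Hypothetical call graph: caller -> list of (callee, owning service).
CallGraph = Dict[str, List[Tuple[str, str]]]


def trace_call_chain(
    graph: CallGraph,
    entry: str,
    service: str,
    depth: int = 0,
    step_id: str = "1",
    steps: Optional[List[dict]] = None,
    path: Optional[Set[Tuple[str, str]]] = None,
) -> List[dict]:
    """Depth-first trace that records id, function, service, and depth per step."""
    steps = [] if steps is None else steps
    path = set() if path is None else path
    step = {"id": step_id, "function": entry, "service": service, "depth": depth}
    steps.append(step)
    callees = graph.get(entry, [])
    if callees and depth >= MAX_DEPTH:
        step["truncated"] = True  # note truncation instead of descending further
        return steps
    key = (service, entry)
    if key in path:  # cycle guard (an addition, not part of the source rules)
        return steps
    path.add(key)
    for index, (callee, callee_service) in enumerate(callees, start=1):
        # Depth resets to 0 when the call crosses into a different service.
        next_depth = 0 if callee_service != service else depth + 1
        trace_call_chain(
            graph, callee, callee_service, next_depth, f"{step_id}.{index}", steps, path
        )
    path.discard(key)
    return steps
```

The dotted step ids produced here follow the same numbering as the steps tree in the performance map.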

5 Whys for each bottleneck: Before reporting a bottleneck, chain "why?" until you reach config/architecture level:

  1. "What is slow?" → alignment service (5.9s)
  2. "Why?" → 6 pairs × ~1s each
  3. "Why ~1s per pair?" → O(n²) mwmf computation
  4. "Why O(n²)?" → library default, not production config
  5. "Why default?" → matching_methods not configured → root cause = config

Step 2: Classify & Suspicion Scan

For each step, classify by type (CPU, I/O-DB, I/O-Network, I/O-File, Architecture, External, Cache) and scan for performance concerns.

Suspicion checklist (minimum, not limitation):

| Category | What to Look For |
| --- | --- |
| Connection management | Client created per-request? Missing pooling? Missing reuse? |
| Data flow | Data read multiple times? Over-fetching? Unnecessary transforms? |
| Async patterns | Sync I/O in async context? Sequential awaits without data dependency? |
| Resource lifecycle | Unclosed connections? Temp files? Memory accumulation in loop? |
| Configuration | Hardcoded timeouts? Default pool sizes? Missing batch size config? |
| Redundant work | Same validation at multiple layers? Same data loaded twice? |
| Architecture | N+1 in loop? Batch API unused? Cache infra unused? Sequential-when-parallel? |
| (open) | Anything else spotted; the checklist does not limit findings |

Step 2b: Suspicion Deduplication

MANDATORY READ: Load shared/references/output_normalization.md

After generating suspicions across all call chain steps, normalize and deduplicate per §1-§2:

  • Normalize suspicion descriptions (replace specific values with placeholders)
  • Group identical suspicions across different steps → merge into single entry with affected_steps: [list]
  • Example: "Missing connection pooling" found in steps 1.1, 1.2, 1.3 → one suspicion with affected_steps: ["1.1", "1.2", "1.3"]
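The normalize-then-group step can be sketched as below. The actual normalization rules live in shared/references/output_normalization.md, which is not reproduced here, so the numeric-placeholder regex and the `{N}` token are assumptions made only for illustration.

```python
import re
from typing import Dict, List


def normalize(description: str) -> str:
    """Replace specific numeric values with a placeholder (assumed rule)."""
    return re.sub(r"\d+(?:\.\d+)?", "{N}", description.strip())


def deduplicate(suspicions: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Merge suspicions with identical normalized descriptions across steps."""
    merged: Dict[str, Dict[str, object]] = {}
    for suspicion in suspicions:
        key = normalize(suspicion["description"])
        entry = merged.setdefault(key, {"description": key, "affected_steps": []})
        if suspicion["step"] not in entry["affected_steps"]:
            entry["affected_steps"].append(suspicion["step"])
    return list(merged.values())


if __name__ == "__main__":
    found = [
        {"step": "1.1", "description": "Missing connection pooling"},
        {"step": "1.2", "description": "Missing connection pooling"},
        {"step": "1.3", "description": "Missing connection pooling"},
    ]
    print(deduplicate(found))
```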

Step 3: Verify & Map to Instrumentation Points

FOR each suspicion:
  1. VERIFY: follow code to confirm or dismiss
  2. VERDICT: CONFIRMED → map to instrumentation point | DISMISSED → log reason
  3. For each CONFIRMED suspicion, identify:
     - function to wrap with timing
     - I/O call to count
     - memory allocation to track

Profiler Selection (per stack)

| Stack | Non-invasive profiler | Invasive (if non-invasive insufficient) |
| --- | --- | --- |
| Python | py-spy, cProfile | time.perf_counter() decorators |
| Node.js | clinic, --prof | console.time() wrappers |
| Go | pprof (built-in) | Usually not needed |
| .NET | dotnet-trace | Stopwatch wrappers |
| Rust | cargo flamegraph | std::time::Instant |

Stack detection: per shared/references/ci_tool_detection.md.


Phase 3: Deep Profile

Profiler Hierarchy (escalate as needed)

| Level | Tool Examples | What It Shows | When to Use |
| --- | --- | --- | --- |
| 1 | py-spy, cProfile, pprof, dotnet-trace | Function-level hotspots | Always (first pass) |
| 2 | line_profiler, per-line timing | Line-level timing in hotspot function | Hotspot function found but cause unclear |
| 3 | tracemalloc, memory_profiler | Per-line memory allocation | Memory metrics abnormal in baseline |

Step 1: Non-Invasive Profiling (preferred)

Run test_command with Level 1 profiler to get per-function breakdown without code changes.
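For a Python target, a Level 1 pass with cProfile might look like the sketch below. It leaves the target code untouched and only wraps the call from outside; the helper name `profile_function` and the choice to sort by cumulative time are assumptions. For a whole test command, running it under `python -m cProfile` or attaching py-spy would serve the same purpose.

```python
import cProfile
import io
import pstats
from typing import Any, Callable, Tuple


def profile_function(func: Callable[..., Any], *args: Any, top: int = 5, **kwargs: Any) -> Tuple[Any, str]:
    """Run func under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Cumulative time surfaces the functions whose call subtrees dominate.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def sum_of_squares(n: int) -> int:
    """Toy workload standing in for the optimization target."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_function(sum_of_squares, 100_000)
    print(report)
```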

Step 2: Escalation Decision

After Level 1 profiler run, evaluate result against suspicion stack from Phase 2:

| Profiler Result | Action |
| --- | --- |
| Hotspot function identified, time breakdown confirms suspicions | DONE: proceed to Phase 4 |
| Hotspot identified but internal cause unclear (CPU vs I/O inside one function) | Escalate to Level 2 (line-level timing) |
| Memory baseline abnormal (peak or delta) | Escalate to Level 3 (memory profiler) |
| Multiple suspicions unresolved, profiler granularity insufficient | Go to Step 3 (targeted instrumentation) |
| Profiler unavailable or overhead > 20% of wall time | Go to Step 3 (targeted instrumentation) |
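The escalation table can be encoded as a decision function. The table does not say which row wins when several apply, so the precedence below (profiler availability first, then memory, then unclear hotspot, then unresolved suspicions) is an assumption, as are the `Level1Outcome` fields and the returned labels.

```python
from dataclasses import dataclass


@dataclass
class Level1Outcome:
    """Hypothetical summary of a Level 1 profiler run."""
    profiler_available: bool
    overhead_pct: float          # profiler overhead as a percentage of wall time
    hotspot_identified: bool
    cause_clear: bool            # time breakdown confirms the suspicions
    memory_abnormal: bool        # peak or delta abnormal in the baseline
    unresolved_suspicions: int


def escalation_action(outcome: Level1Outcome) -> str:
    # Row order is an assumed precedence; the table does not define one.
    if not outcome.profiler_available or outcome.overhead_pct > 20:
        return "targeted_instrumentation"
    if outcome.memory_abnormal:
        return "level_3_memory_profiler"
    if outcome.hotspot_identified and not outcome.cause_clear:
        return "level_2_line_timing"
    if outcome.unresolved_suspicions > 1:
        return "targeted_instrumentation"
    if outcome.hotspot_identified and outcome.cause_clear:
        return "phase_4"
    # No row matches (for example, no hotspot found): fall back to instrumentation.
    return "targeted_instrumentation"
```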

Step 3: Targeted Instrumentation (proactive)

Add timing/logging along the call stack at instrumentation points identified in Phase 2 Step 3:

1. FOR each CONFIRMED suspicion without measured data:
     Add timing wrapper around target function/I/O call
     Add counter for I/O round-trips if network/DB suspected
     (cross-service: instrument in the correct service's codebase)
2. Re-run test_command (3 runs, median)
3. Collect per-function measurements from logs
4. Record list of instrumented files (may span multiple services)

| Instrumentation Type | When | Example |
| --- | --- | --- |
| Timing wrapper | Always for unresolved suspicions | time.perf_counter() around function call |
| I/O call counter | Network or DB bottleneck suspected | Count HTTP requests, DB queries in loop |
| Memory snapshot | Memory accumulation suspected | tracemalloc.get_traced_memory() before/after |
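For a Python stack, the three instrumentation types could be sketched as below. The registry names `TIMINGS_MS` and `CALL_COUNTS` and the helper names are hypothetical; a real run would more likely emit structured log lines so that per-function measurements can be collected from logs as described in the steps above.

```python
import functools
import time
import tracemalloc
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

# Hypothetical in-process registries; structured logging is an alternative.
TIMINGS_MS: DefaultDict[str, List[float]] = defaultdict(list)
CALL_COUNTS: DefaultDict[str, int] = defaultdict(int)


def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """Timing wrapper plus call counter for one instrumentation point."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record even when the call raises, so failures are still measured.
            TIMINGS_MS[func.__qualname__].append((time.perf_counter() - start) * 1000.0)
            CALL_COUNTS[func.__qualname__] += 1
    return wrapper


def memory_delta_bytes(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int, int]:
    """Return (result, traced-memory delta, traced peak) around one call."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = func(*args, **kwargs)
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, after - before, peak
```

Decorating an HTTP or DB helper with `timed` gives both its per-call timings and, through `CALL_COUNTS`, the number of round-trips made inside a loop.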

KEEP instrumentation in place. The executor reuses it for post-optimization per-function comparison, then cleans up after strike. Report instrumented_files in output.


Phase 4: Build Performance Map

Standardized format — feeds into .optimization/{slug}/context.md for downstream consumption.

performance_map:
  test_command: "uv run pytest tests/e2e/test_example.py -s"
  baseline:
    wall_time_ms: 7280
    cpu_time_ms: 850
    memory_peak_mb: 256
    memory_delta_mb: 45
    io_read_bytes: 1200000
    io_write_bytes: 500000
    http_round_trips: 13
  steps:                          # service field present only in multi-service topology
    - id: "1"
      function: "process_job"
      location: "app/services/job_processor.py:45"
      service: "api"             # optional — which service owns this step
      wall_time_ms: 7200
      time_share_pct: 99
      type: "function_call"
      children:
        - id: "1.1"
          function: "translate_binary"
          wall_time_ms: 7100
          type: "function_call"
          children:
            - id: "1.1.1"
              function: "tikal_extract"
              service: "tikal"   # cross-service: code traced into submodule
              wall_time_ms: 2800
              type: "http_call"
              http_round_trips: 1
            - id: "1.1.2"
              function: "mt_translate"
              service: "mt-engine"
              wall_time_ms: 3500
              type: "http_call"
              http_round_trips: 13
  bottleneck_classification: "I/O-Network"
  bottleneck_detail: "13 sequential HTTP calls to MT service (3500ms)"
  top_bottlenecks:
    - { step: "1.1.2", type: "I/O-Network", share: "48%" }
    - { step: "1.1.1", type: "I/O-Network", share: "38%" }
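The shares in top_bottlenecks follow from dividing each step's wall time by the baseline wall time. The sketch below reproduces the two figures from the example map; ranking only leaf steps and rounding to whole percentages are assumptions, and the helper names are hypothetical.

```python
from typing import Dict, Iterator, List


def leaf_steps(steps: List[dict]) -> Iterator[dict]:
    """Yield steps that have no children (assumed to be the ranking candidates)."""
    for step in steps:
        children = step.get("children", [])
        if children:
            yield from leaf_steps(children)
        else:
            yield step


def top_bottlenecks(performance_map: Dict, limit: int = 3) -> List[dict]:
    """Rank leaf steps by wall time and express each as a share of the baseline."""
    total_ms = performance_map["baseline"]["wall_time_ms"]
    ranked = sorted(
        leaf_steps(performance_map["steps"]),
        key=lambda step: step["wall_time_ms"],
        reverse=True,
    )
    return [
        {"step": step["id"], "share_pct": round(100 * step["wall_time_ms"] / total_ms)}
        for step in ranked[:limit]
    ]


# Values copied from the example performance map above.
example_map = {
    "baseline": {"wall_time_ms": 7280},
    "steps": [
        {"id": "1", "wall_time_ms": 7200, "children": [
            {"id": "1.1", "wall_time_ms": 7100, "children": [
                {"id": "1.1.1", "wall_time_ms": 2800},
                {"id": "1.1.2", "wall_time_ms": 3500},
            ]},
        ]},
    ],
}

if __name__ == "__main__":
    print(top_bottlenecks(example_map))
```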

Phase 5: Report

Report Structure

profile_result:
  entry_point_info:
    type: <string>                     # "api_endpoint" | "function" | "pipeline"
    location: <string>                 # file:line
    route: <string|null>               # API route (if endpoint)
    function: <string>                 # Entry point function name
  performance_map: <object>            # Full map from Phase 4
  bottleneck_classification: <string>  # Primary bottleneck type
  bottleneck_detail: <string>          # Human-readable description
  top_bottlenecks:
    - step, type, share, description
  optimization_hints:                  # CONFIRMED suspicions only (Phase 2)
    - hint with evidence
  suspicion_stack:                     # Full audit trail (confirmed + dismissed)
    - category: <string>
      location: <string>
      description: <string>
      verdict: <string>               # "confirmed" | "dismissed"
      evidence: <string>
      verification_note: <string>
  e2e_test:
    command: <string|null>             # E2E safety test command (from Phase 0)
    source: <string>                   # user / route / function / module / none
  instrumented_files: [<string>]       # Files with active instrumentation (empty if non-invasive only)
  wrong_tool_indicators: []            # Empty = proceed, non-empty = exit

Wrong Tool Indicators

| Indicator | Condition |
| --- | --- |
| external_service_no_alternative | 90%+ measured time in external service, no batch/cache/parallel path |
| within_industry_norm | Measured time within expected range for operation type |
| infrastructure_bound | Bottleneck is hardware (measured via system metrics) |
| already_optimized | Code already uses best patterns (confirmed by suspicion scan) |
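Of the four indicators, only external_service_no_alternative has a numeric threshold, so it is the one that lends itself to a mechanical check. The sketch below assumes measured time has already been summed per step type and that available batch/cache/parallel paths are passed in as a list; both inputs and the function name are hypothetical.

```python
from typing import Dict, List


def evaluate_wrong_tool_indicators(
    time_ms_by_type: Dict[str, float],
    alternative_paths: List[str],
) -> List[str]:
    """Return the wrong_tool_indicators list; empty means proceed."""
    indicators: List[str] = []
    total_ms = sum(time_ms_by_type.values())
    if total_ms > 0:
        external_share = time_ms_by_type.get("External", 0.0) / total_ms
        # 90%+ of measured time in an external service with no batch/cache/parallel path.
        if external_share >= 0.90 and not alternative_paths:
            indicators.append("external_service_no_alternative")
    return indicators
```

The other three indicators depend on judgment (expected ranges, hardware limits, pattern quality) and are left out of this sketch.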

Error Handling

| Error | Recovery |
| --- | --- |
| Cannot resolve entry point | Block: "file/function not found at {path}" |
| Test command fails on unmodified code | Block: "test fails before profiling — fix test first" |
| Profiler not available for stack | Fall back to invasive instrumentation (Phase 3 Step 3) |
| Instrumentation breaks tests | Revert immediately: git checkout -- . |
| Call chain too deep (> 5 levels) | Stop at depth 5, note truncation |
| Cannot classify step type | Default to "Unknown", use measured time |
| No I/O detected (pure CPU) | Classify as CPU, focus on algorithm profiling |

References

  • bottleneck_classification.md — classification taxonomy
  • latency_estimation.md — latency heuristics (fallback for static-only mode)
  • shared/references/ci_tool_detection.md — stack/tool detection
  • shared/references/benchmark_generation.md — benchmark templates per stack

Definition of Done

  • Test command discovered or created for optimization target
  • E2E safety test discovered (or documented as unavailable)
  • Baseline measured: wall time, CPU, memory (3 runs, median)
  • Call graph traced and function bodies read
  • Suspicion stack built: each suspicion verified and mapped to instrumentation point
  • Deep profile completed (non-invasive preferred, invasive if needed)
  • Instrumented files reported (cleanup deferred to executor)
  • Performance map built in standardized format (real measurements)
  • Top 3 bottlenecks identified from measured data
  • Wrong tool indicators evaluated from real metrics
  • optimization_hints contain only CONFIRMED suspicions with measurement evidence
  • Report returned to coordinator

Version: 3.0.0
Last Updated: 2026-03-15
