building-with-llms

Produce an LLM Build Pack (prompt+tool contract, data/eval plan, architecture+safety, launch checklist). Use for building with LLMs, GPT/Claude apps, prompt engineering, RAG, and tool-using agents. Category: AI & Technology.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "building-with-llms" with this command: npx skills add liqiongyu/lenny_skills_plus/liqiongyu-lenny-skills-plus-building-with-llms

Building with LLMs

Scope

Covers

  • Building and shipping LLM-powered features/apps (assistant, copilot, light agent workflows)
  • Prompt + tool contract design (instructions, schemas, examples, guardrails)
  • Data quality + evaluation (test sets, rubrics, red teaming, iteration loop)
  • Production readiness (latency/cost budgets, logging, fallbacks, safety/security checks)
  • Using coding agents (Codex/Claude Code) to accelerate engineering safely

When to use

  • “Turn this LLM feature idea into a build plan with prompts, evals, and launch checks.”
  • “We need a system prompt + tool definitions + output schema for our LLM workflow.”
  • “Our LLM is flaky—design an eval plan and iteration loop to stabilize quality.”
  • “Design a RAG/tool-using agent approach with safety and monitoring.”
  • “We want to use an AI coding agent to implement this—set constraints and review gates.”

When NOT to use

  • You need product/portfolio strategy and positioning (use ai-product-strategy).
  • You need a full PRD/spec set for cross-functional alignment (use writing-prds / writing-specs-designs).
  • You need primary user research (use conducting-user-interviews / usability-testing).
  • You are doing model training/research, infra architecture, or bespoke model tuning (delegate to ML/eng; this skill assumes API models).
  • You only want “which model/provider should we pick?” (treat as an input; if it dominates, do a separate evaluation doc).

Inputs

Minimum required

  • Use case + target user + what “good” looks like (success metrics + failure modes)
  • The LLM’s job: generate text, transform data, classify, extract, plan, or take actions via tools
  • Constraints: privacy/compliance, data sensitivity, latency, cost, reliability, supported regions
  • Integration surface: UI/workflow, downstream systems/APIs/tools, and any required output schema

Missing-info strategy

  • Ask up to 5 questions from references/INTAKE.md (3–5 at a time).
  • If details remain missing, proceed with explicit assumptions and provide 2–3 options (prompting vs RAG vs tool use; autonomy level).
  • If asked to write code or run commands, request confirmation and use least privilege (no secrets; avoid destructive changes).

Outputs (deliverables)

Produce an LLM Build Pack (in chat; or as files if requested), in this order:

  1. Feature brief (goal, users, non-goals, constraints, success + guardrails)
  2. System design sketch (pattern + architecture, context strategy, budgets, failure handling)
  3. Prompt + tool contract (system prompt, tool schemas, output schema, examples, refusal/guardrails)
  4. Data + evaluation plan (test set, rubrics, automated checks, red-team suite, acceptance thresholds)
  5. Build + iteration plan (prototype slice, instrumentation, debugging loop, how to use coding agents safely)
  6. Launch + monitoring plan (logging, dashboards/alerts, fallback/rollback, incident playbook hooks)
  7. Risks / Open questions / Next steps (always included)

Templates: references/TEMPLATES.md

Workflow (8 steps)

1) Frame the job, boundary, and “good”

  • Inputs: Use case, target user, constraints.
  • Actions: Write a crisp job statement (“The LLM must…”) + 3–5 non-goals. Define success metrics and guardrails (quality, safety, cost, latency).
  • Outputs: Draft Feature brief.
  • Checks: A stakeholder can restate what the LLM does and does not do, and how success is measured.

2) Choose the minimum viable autonomy pattern

  • Inputs: Workflow + risk tolerance.
  • Actions: Decide assistant vs copilot vs agent-like tool use. Identify “human control points” (review/approve moments) and what the model is never allowed to do.
  • Outputs: Autonomy decisions captured in Feature brief.
  • Checks: Any action-taking behavior has explicit permissions, confirmations, and an undo/rollback story.

3) Design the context strategy (prompting → RAG → tools)

  • Inputs: Data sources, integration points, constraints.
  • Actions: Decide how the model gets reliable context: instruction hierarchy, retrieval strategy, tool calls, structured inputs. Define the “source of truth” and how conflicts are handled.
  • Outputs: Draft System design sketch.
  • Checks: You can explain (a) what data is used, (b) where it comes from, (c) how freshness/authority is enforced.

4) Draft the prompt + tool contract (make the system legible)

  • Inputs: Job statement + context strategy + output schema needs.
  • Actions: Write the system prompt, tool descriptions, and output schema. Add examples and explicit DO/DO NOT rules. Include safe failure behavior (ask clarifying questions, abstain, cite sources).
  • Outputs: Prompt + tool contract.
  • Checks: A reviewer can predict behavior for 5–10 representative inputs; contract includes at least 3 hard constraints and examples.

5) Build the eval set + rubric (debug like software)

  • Inputs: Expected behaviors + failure modes + edge cases.
  • Actions: Create a test set covering normal cases, tricky cases, and red-team cases. Define a scoring rubric and acceptance thresholds. Add automated checks where possible (schema validity, citation presence, forbidden content).
  • Outputs: Data + evaluation plan.
  • Checks: You can run the same prompts repeatedly and measure improvement/regression; evals cover the top failure modes.

6) Prototype a thin slice, using coding agents safely

  • Inputs: System sketch + prompt contract + eval plan.
  • Actions: Implement the smallest end-to-end slice. Use coding agents for “lower hanging fruit” tasks, but keep tight constraints: small diffs, tests, code review, no secret handling.
  • Outputs: Build + iteration plan (and optionally a prototype plan/checklist).
  • Checks: You can explain what the agent changed, why, and how it was validated (tests, evals, manual review).

7) Production readiness: budgets, monitoring, and failure handling

  • Inputs: Prototype learnings + constraints.
  • Actions: Define cost/latency budgets, fallbacks, rate limits, logging fields, and alert thresholds. Address prompt injection/tool misuse risks; add safeguards and review processes.
  • Outputs: Launch + monitoring plan.
  • Checks: There is a clear path to detect regressions, cap cost, and safely degrade when the model misbehaves.

8) Quality gate + finalize

  • Inputs: Full draft pack.
  • Actions: Run references/CHECKLISTS.md and score with references/RUBRIC.md. Tighten unclear contracts, add missing tests, and always include Risks / Open questions / Next steps.
  • Outputs: Final LLM Build Pack.
  • Checks: A team can execute the plan without a meeting; unknowns are explicit and owned.

Quality gate (required)

Examples

Example 1 (RAG copilot): “Use building-with-llms to plan a support-response copilot that drafts replies using our internal KB. Constraints: no PII leakage; must cite sources; p95 latency < 3s; cost < $0.10/ticket.”
Expected: LLM Build Pack with prompt/tool contract, eval set (including privacy red-team cases), and monitoring/rollback plan.

Example 2 (tool-using workflow): “Use building-with-llms to design an LLM workflow that turns meeting notes into action items and Jira tickets (human review required). Output must be valid JSON.”
Expected: output schema + tool contract + eval plan for structured extraction + guardrails against over-creation.

Boundary example: “Fine-tune/train a new LLM from scratch.”
Response: out of scope; propose an API-model approach and highlight what ML/infra work is required if training is truly needed.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

designing-growth-loops

No summary provided by upstream source.

Repository SourceNeeds Review
General

giving-presentations

No summary provided by upstream source.

Repository SourceNeeds Review
General

writing-north-star-metrics

No summary provided by upstream source.

Repository SourceNeeds Review
Web3

defining-product-vision

No summary provided by upstream source.

Repository SourceNeeds Review