Pipeline Builder
Boundaries
- Build NEW pipelines. Do not diagnose broken pipelines — that belongs to
/turbo-doctor. - Do not serve as a YAML reference. If the user only needs to look up a field or syntax, use the
/turbo-pipelinesskill instead. - For dataset lookups, use
/datasets.
Walk the user through building a complete pipeline from scratch, step by step. Generate a valid YAML configuration, validate it, and deploy it.
Mode Detection
Before running any commands, check if you have the Bash tool available:
- If Bash is available (CLI mode): Execute commands, validate YAML, and deploy directly.
- If Bash is NOT available (reference mode): Generate the complete YAML configuration and provide copy-paste instructions for the user to validate and deploy manually.
Builder Workflow
Step 1: Verify Authentication
Run goldsky project list 2>&1 to check login status.
- If logged in: Note the current project and continue.
- If not logged in: Use the
/auth-setupskill for guidance.
Step 2: Understand the Goal
Ask the user what they want to index. Good questions:
- What blockchain/chain? (Ethereum, Base, Polygon, Solana, etc.)
- What data? (transfers, swaps, events from a specific contract, all transactions, etc.)
- Where should the data go? (PostgreSQL, ClickHouse, Kafka, S3, etc.)
- Do they need transforms? (filtering, aggregation, enrichment)
- One-time backfill or continuous streaming?
If the user already described their goal, extract answers from their description.
Step 3: Choose the Dataset
Use the /datasets skill to find the right dataset.
Key points:
- Common datasets:
<chain>.decoded_logs,<chain>.raw_transactions,<chain>.erc20_transfers,<chain>.raw_traces - For decoded contract events, use
<chain>.decoded_logswith a filter onaddressandtopic0 - For Solana: use
solana.transactions,solana.token_transfers, etc.
Present the dataset choice to the user for confirmation.
Step 4: Configure the Source
Build the source section of the YAML:
sources:
my_source:
type: dataset
dataset_name: <chain>.<dataset>
version: 1.0.0
start_at: earliest # or a specific block number
Ask about:
- Start block:
earliest(from genesis),latest(from now), or a specific block number - End block: Only for job-mode/backfill pipelines. Omit for streaming.
- Source-level filter: Optional filter to reduce data at the source (e.g., specific contract address)
Step 5: Configure Transforms (if needed)
If the user needs transforms, use the /turbo-transforms skill to help:
- SQL transforms — filter, aggregate, join, or reshape data using DataFusion SQL
- TypeScript transforms — custom logic, external API calls, complex processing
- Dynamic tables — join with a PostgreSQL table or in-memory allowlist
Build the transforms section:
transforms:
my_transform:
type: sql
primary_key: id
sql: |
SELECT * FROM my_source
WHERE <conditions>
Step 6: Configure the Sink
Ask where the data should go. Use the /turbo-pipelines skill for sink configuration:
| Sink | Key config |
|---|---|
| PostgreSQL | secret_name, schema, table, primary_key |
| ClickHouse | secret_name, table, order_by |
| Kafka | secret_name, topic |
| S3 | bucket, region, prefix, format |
| Webhook | url, format |
For sinks requiring secret_name, check if the secret exists:
goldsky secret list
If it doesn't exist, help create it using the /secrets skill.
Step 7: Choose Mode
Use the /turbo-architecture skill to decide:
- Streaming (default) — continuous processing, no
end_block, runs indefinitely - Job mode — one-time backfill, set
job: trueandend_block
Step 8: Generate, Validate, and Present
Assemble the complete pipeline YAML. Use a descriptive name following the convention: <chain>-<data>-<sink> (e.g., base-erc20-transfers-postgres).
CLI mode (Bash available):
- Write the YAML file to disk (e.g.,
<pipeline-name>.yaml). - Run validation BEFORE showing the YAML to the user:
goldsky turbo validate -f <pipeline-name>.yaml
-
If validation fails, fix the issues and re-validate. Do NOT present the YAML until validation passes. Common fixes:
- Missing
versionfield on dataset source - Invalid dataset name (check chain prefix)
- Missing
secret_namefor database sinks - SQL syntax errors in transforms
- Missing
-
Once validation passes, present the full YAML to the user for review.
Reference mode (no Bash):
- Perform the structural self-check from
turbo-pipelines/references/validation-checklist.md. - Present the YAML with the checklist results.
- Instruct the user to run
goldsky turbo validate -f <file>.yamlbefore deploying.
Step 9: Deploy
After user confirms the YAML looks good:
goldsky turbo apply <pipeline-name>.yaml
Step 10: Verify
After deployment:
goldsky turbo list
Suggest running inspect to verify data flow:
goldsky turbo inspect <pipeline-name>
Present a summary:
## Pipeline Deployed
**Name:** [name]
**Chain:** [chain]
**Dataset:** [dataset]
**Sink:** [sink type]
**Mode:** [streaming/job]
**Next steps:**
- Monitor with `goldsky turbo inspect <name>`
- Check logs with `goldsky turbo logs <name>`
- Use /turbo-doctor if you run into issues
Important Rules
- Always validate before presenting complete YAML to the user. Never show unvalidated complete pipeline YAML.
- Always validate before deploying.
- Always show the user the complete YAML before deploying.
- For job-mode pipelines, remind the user they auto-cleanup ~1hr after completion.
- Use
blackholesink for testing pipelines without writing to a real destination. - If the user wants to modify an existing pipeline, check if it's streaming (update in place) or job-mode (must delete first).
- Default to
start_at: earliestunless the user specifies otherwise. - Always include
version: 1.0.0on dataset sources.
Related
/turbo-pipelines— YAML syntax reference for sources, transforms, and sinks/turbo-doctor— Diagnose and fix pipeline issues/turbo-architecture— Pipeline design patterns and architecture decisions/turbo-transforms— SQL and TypeScript transform reference/datasets— Dataset names and chain prefixes/secrets— Sink credential management