Table Enrichment
Add research-powered enrichment columns to Extruct company tables.
Environment
| Variable | Service |
|---|---|
| `EXTRUCT_API_TOKEN` | Extruct API |
Before making API calls, check that `EXTRUCT_API_TOKEN` is set by running `test -n "$EXTRUCT_API_TOKEN" && echo "set" || echo "missing"`. If it is missing, ask the user to provide their Extruct API token and set it via `export EXTRUCT_API_TOKEN=<value>`. Do not proceed until the token is confirmed.
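The same check can be done from a script. This is a minimal sketch; the helper name `require_token` is hypothetical and not part of any Extruct tooling.

```python
import os


def require_token(env=None):
    """Return the Extruct API token, or raise if it is unset or blank."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    token = env.get("EXTRUCT_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "EXTRUCT_API_TOKEN is missing; ask the user to export it first."
        )
    return token
```

Calling `require_token()` with no arguments reads the real environment and stops the workflow early when the variable is absent.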
Base URL: https://api.extruct.ai/v1
Official API Reference
Workflow
Step 0: Verify API reference
- Read local reference: references/api_reference.md
- Fetch live docs: https://www.extruct.ai/docs
- Compare endpoints, params, and response fields
- If discrepancies are found:
  - Update the local reference file
  - Flag changes to the user before proceeding
- Proceed with the skill workflow
1. Confirm the table
Get the table ID from the user (URL or ID). Fetch table metadata via `GET /tables/{table_id}`. Show the user the table name, row count, and existing columns.
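A minimal sketch of the metadata request, using only the standard library. The bearer-token `Authorization` header is an assumption, so confirm the real auth scheme and the response field names in references/api_reference.md. `build_table_request` is a hypothetical helper.

```python
import urllib.request

BASE_URL = "https://api.extruct.ai/v1"


def build_table_request(table_id, token):
    """Build (but do not send) the GET request for table metadata."""
    return urllib.request.Request(
        f"{BASE_URL}/tables/{table_id}",
        headers={
            # Assumption: bearer-token auth; verify against the API reference.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` sends the request. The JSON body can then be decoded with `json.load` and the table name, row count, and columns read from whichever fields the reference documents.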
2. Get column configs
Two paths:
Path A: From enrichment-design — User has column_configs ready. Confirm and proceed.
Path B: Design on the fly — Confirm with the user:
- What data point? — what to research (e.g. "funding stage", "primary vertical", "tech stack")
- Output format — pick the right format:
| Format | When to use | Extra params |
|---|---|---|
| `text` | Free-form research output | none |
| `number` / `money` | Numeric data (revenue, headcount) | none |
| `select` | Single choice from known categories | `labels: [...]` |
| `multiselect` | Multiple tags from known categories | `labels: [...]` |
| `json` | Structured multi-field data | `output_schema: {...}` |
| `grade` | 1-5 score | none |
| `label` | Single tag from list | `labels: [...]` |
| `date` | Date values | none |
| `url` / `email` / `phone` | Contact info | none |
- Agent type: default `research_pro`. Use `llm` when no web research is needed (for example, classification from existing profile data).
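The choices above can be collected into a single column config. This is a sketch under assumptions: the key names `name`, `prompt`, `output_format`, and `agent_type` are hypothetical placeholders, and only `labels` and `output_schema` come from the format table. Check references/api_reference.md for the real field names.

```python
# Formats that need a list of labels, per the format table.
LABEL_FORMATS = {"select", "multiselect", "label"}
ALL_FORMATS = LABEL_FORMATS | {
    "text", "number", "money", "json", "grade",
    "date", "url", "email", "phone",
}


def make_column_config(name, prompt, output_format,
                       agent_type="research_pro",
                       labels=None, output_schema=None):
    """Assemble one column config and enforce the format-specific extras."""
    if output_format not in ALL_FORMATS:
        raise ValueError(f"unknown output format: {output_format}")
    # Key names below are assumptions, not the confirmed API schema.
    config = {
        "name": name,
        "prompt": prompt,
        "output_format": output_format,
        "agent_type": agent_type,
    }
    if output_format in LABEL_FORMATS:
        if not labels:
            raise ValueError(f"{output_format} columns need labels")
        config["labels"] = list(labels)
    if output_format == "json":
        if not output_schema:
            raise ValueError("json columns need an output_schema")
        config["output_schema"] = output_schema
    return config
```

Validating locally catches a missing `labels` list before any request is sent.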
3. Write the prompt
Craft a clear prompt using {input} for the row's domain value. Prompt guidelines:
- Be specific about what to find
- Specify the exact output format in the prompt (e.g. "Return ONLY a number in millions USD")
- Include fallback instruction (e.g. "If not found, return N/A")
- For `select`/`multiselect`, the labels constrain the output, so the prompt should guide which label to pick
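As an illustration of these guidelines, the sketch below pairs an example prompt with a local preview helper. The prompt wording and the `preview_prompt` helper are illustrative assumptions. In a real run the `{input}` substitution is presumably done by Extruct itself, so the helper is only for eyeballing the prompt beforehand.

```python
# Example prompt: specific target, exact output format, explicit fallback.
PROMPT = (
    "Find the most recent funding stage for the company at {input}. "
    "Return ONLY one of: Seed, Series A, Series B, Series C+, Public. "
    "If not found, return N/A."
)


def preview_prompt(prompt, domain):
    """Render the prompt for one row's domain so it can be reviewed locally."""
    if "{input}" not in prompt:
        raise ValueError("prompt must reference {input}")
    # str.replace is used instead of str.format so other braces in the
    # prompt (for example a JSON example) are left untouched.
    return prompt.replace("{input}", domain)
```

`preview_prompt(PROMPT, "example.com")` returns the prompt as it would read for a row whose domain is `example.com`.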
4. Create the column(s)
Create columns via `POST /tables/{table_id}/columns` with the `column_configs` array.
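A sketch of the create-columns request follows. Wrapping the array under a top-level `column_configs` key and the bearer-token header are both assumptions, and `build_create_columns_request` is a hypothetical helper. Confirm the body shape in references/api_reference.md before relying on it.

```python
import json
import urllib.request

BASE_URL = "https://api.extruct.ai/v1"


def build_create_columns_request(table_id, column_configs, token):
    """Build (but do not send) the POST request that creates columns."""
    # Assumption: the API expects {"column_configs": [...]} as the JSON body.
    body = json.dumps({"column_configs": list(column_configs)}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/tables/{table_id}/columns",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The response should be kept, because the IDs of the new columns are needed to scope the run in the next step.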
5. Trigger enrichment (only the new columns)
Run via `POST /tables/{table_id}/run` with `{ "mode": "new", "columns": [new_column_ids] }`.
Important: Always scope the run to the newly created column(s) only. Avoid broad or implicit run payloads when you only intend to enrich specific columns.
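The scoping rule can be enforced in code so that an empty column list never turns into a broad run. This is a minimal sketch: `build_run_payload` is a hypothetical helper, and the payload shape is the one given above.

```python
def build_run_payload(new_column_ids):
    """Return the run payload, refusing to build one with no columns."""
    ids = list(new_column_ids)
    if not ids:
        # An empty list could be read as "run everything"; fail loudly instead.
        raise ValueError("refusing to trigger a run without explicit column IDs")
    return {"mode": "new", "columns": ids}
```

For example, `build_run_payload(["col_a", "col_b"])` returns `{"mode": "new", "columns": ["col_a", "col_b"]}`, where the two IDs are placeholders.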
Report: run ID, rows queued, and table URL.
6. Monitor progress
Poll the table data via `GET /tables/{table_id}/data` every 30 seconds. For each row, check the `status` field of the relevant column cells (`done`, `pending`, `failed`).
Show the user:
- Current % complete (done cells / total cells)
- Number of failed cells (if any)
- Estimated time remaining (based on rate so far)
Stop polling when all cells are done or failed.
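The progress figures can be computed from each poll result roughly as follows. The row shape is an assumption: each row is taken to be a dict with a `cells` mapping from column ID to a cell that has a `status` field, which may not match the real response. `summarize_progress` is a hypothetical helper.

```python
from collections import Counter


def summarize_progress(rows, column_ids, elapsed_seconds):
    """Summarize cell statuses for the enriched columns after one poll."""
    counts = Counter()
    for row in rows:
        cells = row.get("cells", {})  # assumed response shape
        for column_id in column_ids:
            # Cells with no status yet are counted as pending.
            counts[cells.get(column_id, {}).get("status", "pending")] += 1

    total = sum(counts.values())
    finished = counts["done"] + counts["failed"]
    remaining = total - finished

    # ETA extrapolates the rate so far: seconds per finished cell times
    # the cells still outstanding. Unknown until at least one cell finishes.
    eta = None
    if remaining == 0:
        eta = 0.0
    elif finished and elapsed_seconds > 0:
        eta = remaining * elapsed_seconds / finished

    return {
        "percent_done": 100.0 * counts["done"] / total if total else 0.0,
        "failed": counts["failed"],
        "eta_seconds": eta,
        "complete": total > 0 and remaining == 0,
    }
```

A polling loop would call this after each fetch, sleep 30 seconds, and exit once `complete` is true.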
7. Quality spot-check
After enrichment completes (or once at least 50% of cells are done), fetch a sample of 5-10 enriched rows and display them for review.
Present the sample to the user as a table, then ask:
- "Does the data quality look right?"
- "Any columns returning garbage or N/A too often?"
- "Should we adjust any prompts and re-run?"
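Drawing the review sample might look like the sketch below. It assumes the same hypothetical row shape as the polling data (a `cells` mapping with a `status` per column), and `sample_done_rows` is an illustrative helper name.

```python
import random


def sample_done_rows(rows, column_id, k=5, seed=None):
    """Pick up to k rows whose cell for column_id has finished successfully."""
    done = [
        row for row in rows
        if row.get("cells", {}).get(column_id, {}).get("status") == "done"
    ]
    # A seeded generator makes the sample reproducible when re-checking.
    rng = random.Random(seed)
    return rng.sample(done, min(k, len(done)))
```

Sampling at random rather than taking the first rows reduces the chance that the spot-check only sees the easiest companies at the top of the table.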
If quality issues are found:
- Delete the problematic column
- Adjust the prompt
- Re-create and re-run
API Reference
See references/api_reference.md for full API spec: all output formats, agent types, prompt variables, and endpoints.