OpenAlex

Use this skill to run reliable OpenAlex API workflows from shell.

IMPORTANT: Always write curl commands on a single line. Multi-line \ continuation breaks argument parsing in agent environments and will cause errors.

Definition of Done

A task is complete when:

Results

The API returns at least one result (or a clear "no results found" message)
Each result shows: title (display_name), year, citation count
Output is readable — not a raw JSON blob

Process

curl written on a single line
api_key included in every request
select= used to limit returned fields
jq used to format output

PDF download (when requested)

If PDF is available: file saved locally, path printed
If PDF is not available: clear message, exit code 2, no crash

Quick Start

Export API key:

export OPENALEX_API_KEY='...'

To verify it is set without printing the value:

[[ -n "${OPENALEX_API_KEY:-}" ]] && echo "key is set" || echo "ERROR: OPENALEX_API_KEY not set"

Run list query (works):

curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'search="data quality" AND "open government data"' --data-urlencode 'filter=type:article,from_publication_date:2023-01-01' --data-urlencode 'sort=relevance_score:desc' --data-urlencode 'per-page=200' --data-urlencode 'select=id,display_name,publication_year,cited_by_count,doi' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, year:.publication_year, cited_by:.cited_by_count, doi}'

Workflow

Define entity endpoint (works, authors, sources, etc.).
Build a search block with boolean logic (AND, OR, NOT, quotes, parentheses).
Add structured filter constraints (type/date/language/OA/citation fields).
Restrict output with select (root-level fields only).
Page results with page or cursor=*.
Extract fields via jq and save/transform as needed.

Iterative Validation Workflow

Use this when building or debugging non-trivial queries.

Start with a toy query (per-page=5 or per-page=10) and minimal select=.
Manually inspect 5-10 records for relevance and field quality (display_name, year, DOI).
Compare a baseline and a variant before scaling:
- baseline: filter=title.search:"..."
- variant: search=... with same filters
Tune one parameter at a time (search, filter, sort, per-page, pagination mode).
Scale only after validation (per-page=200, then cursor=* for deep pagination).
Log each run: command, key parameters, result count, and quick notes.

Avoid jumping directly from a paper/spec to a full extraction script without this short validation loop.

Query Blocks

title.search=: searches only in the title — use this by default for focused results. Must be passed inside filter=, not as a standalone parameter: filter=title.search:"your query".
search=: full-text search across the entire document — use only when title-only matching is too restrictive.
search.semantic=: semantic/conceptual search (costs $0.001/request; requires API key).
filter=: exact/structured constraints; comma means AND.
sort=: relevance_score:desc, cited_by_count:desc, publication_date:desc, etc.
per-page=: 1..200. Default is 25 — always set per-page=200 for bulk queries (8× fewer API calls).
page=: page number for standard pagination.
cursor=*: deep pagination beyond first 10k records.
select=: reduce payload; nested paths are not allowed in select.
group_by=: aggregate results by a field (e.g. group_by=publication_year, group_by=topics.id).
sample=: random sample of N results (e.g. sample=20). Add seed=42 for reproducibility.

Filter Syntax

Filters are comma-separated AND conditions. Within a single attribute:

Logic	Syntax	Example
AND (comma)	`filter=a:x,b:y`	`filter=type:article,is_oa:true`
OR (pipe)	`filter=type:article\|book`	multiple values for same field
NOT (exclamation)	`filter=type:!journal-article`	negation
Greater than	`filter=cited_by_count:>100`	comparison
Less than	`filter=publication_year:<2020`	comparison
Range	`filter=publication_year:2020-2023`	inclusive range

Batch Lookup

Combine up to 50 IDs in one request using the pipe operator — avoid sequential calls:

# Batch DOI lookup (up to 50 per request)
curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'filter=doi:https://doi.org/10.1/abc|https://doi.org/10.2/def' --data-urlencode 'per-page=50' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, doi}'

Two-Step Entity Lookup

Names are ambiguous; always resolve to an OpenAlex ID first, then filter.

Step 1 — find the entity ID:

curl -sS --get 'https://api.openalex.org/authors' --data-urlencode 'search=Heather Piwowar' --data-urlencode 'per-page=5' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {id, display_name}'

Step 2 — use the ID in a filter:

curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'filter=authorships.author.id:A5023888391' --data-urlencode 'per-page=200' --data-urlencode 'select=id,display_name,publication_year,cited_by_count' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, year:.publication_year}'

Applies to: authors (authorships.author.id), institutions (authorships.institutions.id), sources/journals (primary_location.source.id). External IDs are also accepted: ORCID, ROR, ISSN, DOI.

PDF Retrieval

For a work ID:

Fetch work metadata.
Resolve PDF URL in this order:
- .content_urls.pdf
- .best_oa_location.pdf_url
- .primary_location.pdf_url
- first non-null .locations[].pdf_url
Download with api_key query parameter when source is content.openalex.org.

Output Format

When displaying results, always show display_name as the title — never use doi or id in its place.

Minimal jq for a results table:

| jq -r '.results[] | [.display_name, .publication_year, .cited_by_count, .doi] | @tsv'

Or as structured objects:

| jq '.results[] | {title: .display_name, year: .publication_year, cited_by: .cited_by_count, doi}'

CSV Export

To save results as a CSV file, use jq with @csv and include a header row:

curl -sS --get 'https://api.openalex.org/works' ... --data-urlencode "api_key=$OPENALEX_API_KEY" | jq -r '["title","year","cited_by","doi"], (.results[] | [.display_name, .publication_year, .cited_by_count, (.doi // "")]) | @csv' > results.csv

Rules:

Use // "" for fields that may be null (e.g. doi) — @csv fails on null values.
The header array and data array must have the same number of columns.
Use -r (raw output) so @csv produces plain text, not JSON strings.

Error Handling

Implement exponential backoff on 403 (rate limit) and 500 (server error):

attempt 1 → wait 1s → attempt 2 → wait 2s → attempt 3 → wait 4s → attempt 4 → wait 8s

HTTP codes:

200 — success
400 — invalid parameter or filter syntax; fix the query
403 — rate limit exceeded; back off and retry
404 — entity not found
500 — temporary server error; retry with backoff

Endpoint Costs

With the free $1/day budget:

Request type	Cost	Daily limit
Singleton (`/works/W123`)	free	unlimited
List / filter	$0.0001	~10,000 requests
Search (full-text or semantic)	$0.001	~1,000 requests
PDF download (`content.openalex.org`)	$0.01	~100 downloads

Use select= and per-page=200 to minimize request count.

Common Pitfalls

Do not sort by relevance_score without a search query.
Do not use nested fields in select (example: use open_access, then parse .open_access.is_oa with jq).
Do not filter by entity names directly — use the two-step entity lookup to get the ID first.
Do not use sequential calls for batch ID lookups — batch up to 50 with the pipe operator.
Do not use per-page=25 (default) for bulk extraction — always set per-page=200.
Expect some records to have no downloadable PDF.
search= searches full text and can return loosely related results. Use title.search= when the topic must appear in the title.
Always write curl commands on a single line — multi-line \ continuation breaks argument parsing in agent environments.
title.search is NOT a valid standalone parameter — always pass it inside filter=: filter=title.search:"your query".
Always include api_key=$OPENALEX_API_KEY in every request.
Never print or echo $OPENALEX_API_KEY to verify it is set — use [[ -n "${OPENALEX_API_KEY:-}" ]] instead.

Resources

Query recipes and jq snippets: references/query-recipes.md
Generic query helper: scripts/openalex_query.sh
PDF downloader for work IDs: scripts/openalex_download_pdf.sh

openalex

Safety Notice

Copy this and send it to your AI assistant to learn

OpenAlex

Definition of Done

Quick Start

Workflow

Iterative Validation Workflow

Query Blocks

Filter Syntax

Batch Lookup

Two-Step Entity Lookup

PDF Retrieval

Output Format

CSV Export

Error Handling

Endpoint Costs

Common Pitfalls

Resources

Source Transparency

Related Skills

research

research

research

research