Academic Paper Search (OpenAlex)

Search 240M+ scholarly works using the OpenAlex API -- completely free, no API key required, no SDK needed. Just curl or bash with URL construction.

Full docs: https://docs.openalex.org

Quick Start

OpenAlex is a REST API. You query it by constructing URLs and fetching them with curl. All responses are JSON.

Search for papers about "transformer architecture"

curl -s "https://api.openalex.org/works?search=transformer+architecture&per_page=5&mailto=agent@kortix.ai" | python3 -m json.tool

Important: Always include mailto=agent@kortix.ai (or any valid email) in every request. Without it, you're limited to 1 request/second. With it, you get 10 requests/second (the "polite pool").
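Since every request needs the same base URL and mailto parameter, it can help to build URLs in one place. The following is a minimal sketch using only the Python standard library; the names build_url, BASE_URL and MAILTO are illustrative and not part of OpenAlex.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"
MAILTO = "agent@kortix.ai"  # any valid email works, per the note above


def build_url(endpoint, **params):
    """Return an OpenAlex URL for `endpoint` with `mailto` always attached."""
    params.setdefault("mailto", MAILTO)
    # Keep filter punctuation (: , | ! < > @ *) unescaped so the result
    # looks like the hand-written URLs in this document.
    query = urlencode(params, safe=":,|!<>@*")
    return f"{BASE_URL}/{endpoint.strip('/')}?{query}"


print(build_url("works", search="transformer architecture", per_page=5))
# https://api.openalex.org/works?search=transformer+architecture&per_page=5&mailto=agent@kortix.ai
```

The printed URL is the same one used in the curl command above, so it can be passed to curl or any HTTP client unchanged.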

Core Concepts

Entities

OpenAlex has these entity types (all queryable):

Entity        Endpoint       Count   Description
Works         /works         240M+   Papers, articles, books, datasets, theses
Authors       /authors       90M+    People who create works
Sources       /sources       250K+   Journals, repositories, conferences
Institutions  /institutions  110K+   Universities, research orgs
Topics        /topics        4K+     Research topics (hierarchical)

Work Object -- Key Fields

When you fetch a work, these are the most useful fields:

id                       OpenAlex ID (e.g., "https://openalex.org/W2741809807")
doi                      DOI URL
title / display_name     Paper title
publication_year         Year published
publication_date         Full date (YYYY-MM-DD)
cited_by_count           Number of incoming citations
fwci                     Field-Weighted Citation Impact (normalized)
type                     article, preprint, review, book, dataset, etc.
language                 ISO 639-1 code (e.g., "en")
is_retracted             Boolean
open_access.is_oa        Boolean -- is it freely accessible?
open_access.oa_url       Direct URL to free version
authorships              List of authors with names, institutions, ORCIDs
abstract_inverted_index  Abstract as inverted index (needs reconstruction)
referenced_works         List of OpenAlex IDs this work cites (outgoing)
related_works            Algorithmically related works
cited_by_api_url         API URL to get works that cite this one (incoming)
topics                   Assigned research topics with scores
keywords                 Extracted keywords with scores
primary_location         Where the work is published (journal, repo)
best_oa_location         Best open access location with PDF link

Reconstructing Abstracts

OpenAlex stores abstracts as inverted indexes for legal reasons. To get plaintext, reconstruct:

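import json, sys

# Load a single work object (for example, piped in from curl)
work = json.load(sys.stdin)

# Read the abstract_inverted_index from the work object
inv_idx = work.get("abstract_inverted_index")
if inv_idx:
    # The highest position tells us how many words the abstract has
    words = [""] * (max(max(positions) for positions in inv_idx.values()) + 1)
    for word, positions in inv_idx.items():
        for pos in positions:
            words[pos] = word
    abstract = " ".join(words)
    print(abstract)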

Or in bash with python3 -c:

Pipe a work JSON into this to extract the abstract

echo "$WORK_JSON" | python3 -c "
import json,sys
w=json.load(sys.stdin)
idx=w.get('abstract_inverted_index',{})
if idx:
    words=['']*(max(max(p) for p in idx.values())+1)
    for word,positions in idx.items():
        for pos in positions:
            words[pos]=word
    print(' '.join(words))
"
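To make the inverted-index format concrete, here is a small worked example. The sample index is made up for illustration (it is not a real OpenAlex response), and reconstruct_abstract is a hypothetical helper name. It sorts (position, word) pairs rather than pre-allocating a list, which gives the same result when positions are contiguous.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from a {word: [positions]} mapping."""
    if not inverted_index:
        return ""
    # Flatten to (position, word) pairs, then order them by position.
    pairs = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(pairs))


# Made-up sample: "the" occurs at positions 0 and 3.
sample = {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]}
print(reconstruct_abstract(sample))  # the cat chased the mouse
```

A work with no abstract has a null abstract_inverted_index, which this helper turns into an empty string.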

Searching for Papers

Basic Keyword Search

Searches across titles, abstracts, and fulltext. Uses stemming and stop-word removal.

Simple search

curl -s "https://api.openalex.org/works?search=large+language+models&mailto=agent@kortix.ai"

With per_page limit

curl -s "https://api.openalex.org/works?search=CRISPR+gene+editing&per_page=10&mailto=agent@kortix.ai"

Boolean Search

Use uppercase AND, OR, NOT with parentheses and quoted phrases:

Complex boolean query

curl -s "https://api.openalex.org/works?search=(reinforcement+learning+AND+%22robot+control%22)+NOT+simulation&mailto=agent@kortix.ai"

Exact phrase match (use double quotes, URL-encoded as %22)

curl -s "https://api.openalex.org/works?search=%22attention+is+all+you+need%22&mailto=agent@kortix.ai"
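Hand-encoding quotes and spaces is error-prone, so the query can be written in plain form and encoded programmatically. This sketch uses quote_plus from the Python standard library; encode_search is an illustrative name, and keeping parentheses unescaped is a choice made here to match the URLs above.

```python
from urllib.parse import quote_plus


def encode_search(query):
    """URL-encode a search string: spaces become '+', quotes become %22."""
    return quote_plus(query, safe="()")


q = '(reinforcement learning AND "robot control") NOT simulation'
print(encode_search(q))
# (reinforcement+learning+AND+%22robot+control%22)+NOT+simulation
```

The encoded string goes after search= in the request URL, exactly as in the complex boolean query above.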

Search Specific Fields

Title only

curl -s "https://api.openalex.org/works?filter=title.search:transformer&mailto=agent@kortix.ai"

Abstract only

curl -s "https://api.openalex.org/works?filter=abstract.search:protein+folding&mailto=agent@kortix.ai"

Title and abstract combined

curl -s "https://api.openalex.org/works?filter=title_and_abstract.search:neural+scaling+laws&mailto=agent@kortix.ai"

Fulltext search (subset of works)

curl -s "https://api.openalex.org/works?filter=fulltext.search:climate+tipping+points&mailto=agent@kortix.ai"

Filtering

Filters are the most powerful feature. Combine them with commas (AND) or pipes (OR).

Most Useful Filters

By publication year

?filter=publication_year:2024
?filter=publication_year:2020-2024
?filter=publication_year:>2022

By citation count

?filter=cited_by_count:>100    # highly cited
?filter=cited_by_count:>1000   # landmark papers

By open access

?filter=is_oa:true       # only open access
?filter=oa_status:gold   # gold OA only

By type

?filter=type:article    # journal articles
?filter=type:preprint   # preprints
?filter=type:review     # review articles

By language

?filter=language:en # English only

Not retracted

?filter=is_retracted:false

Has abstract

?filter=has_abstract:true

Has downloadable PDF

?filter=has_content.pdf:true

By author (OpenAlex ID)

?filter=author.id:A5023888391

By institution (OpenAlex ID)

?filter=institutions.id:I27837315 # e.g., University of Michigan

By DOI

?filter=doi:https://doi.org/10.1038/s41586-021-03819-2

By indexed source

?filter=indexed_in:arxiv      # arXiv papers
?filter=indexed_in:pubmed     # PubMed papers
?filter=indexed_in:crossref   # Crossref papers

Combining Filters

AND: comma-separated

?filter=publication_year:>2022,cited_by_count:>50,is_oa:true,type:article

OR: pipe-separated within a filter

?filter=publication_year:2023|2024

NOT: prefix with !

?filter=type:!preprint

Combined example: highly-cited OA articles from 2023-2024, not preprints

curl -s "https://api.openalex.org/works?filter=publication_year:2023-2024,cited_by_count:>50,is_oa:true,type:!preprint&search=machine+learning&per_page=10&mailto=agent@kortix.ai"
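The comma, pipe and ! rules can be captured in a small helper that assembles the filter string from a dictionary. This is only a sketch: build_filter is a hypothetical name, and treating a list as an OR group is a convention chosen here, not something OpenAlex defines.

```python
def build_filter(conditions):
    """Join conditions into an OpenAlex filter string.

    `conditions` maps a filter name to a value or a list of values.
    A list becomes an OR group (pipes); separate keys are ANDed (commas).
    Negation is written by the caller as a leading "!" in the value.
    """
    parts = []
    for name, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        elif isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{name}:{value}")
    return ",".join(parts)


print(build_filter({
    "publication_year": "2023-2024",
    "cited_by_count": ">50",
    "is_oa": True,
    "type": "!preprint",
}))
# publication_year:2023-2024,cited_by_count:>50,is_oa:true,type:!preprint

print(build_filter({"publication_year": [2023, 2024]}))
# publication_year:2023|2024
```

The first call reproduces the filter used in the combined example above.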

Sorting

Most cited first

?sort=cited_by_count:desc

Most recent first

?sort=publication_date:desc

Most relevant first (only when using search)

?sort=relevance_score:desc

Multiple sort keys

?sort=publication_year:desc,cited_by_count:desc

Pagination

Two modes: basic paging (for browsing) and cursor paging (for collecting all results).

Basic paging (limited to 10,000 results)

?page=1&per_page=25
?page=2&per_page=25

Cursor paging (unlimited, for collecting everything)

?per_page=100&cursor=*                # first page
?per_page=100&cursor=IlsxNjk0ODc...   # next page (cursor from previous response meta)

The cursor for the next page is in response.meta.next_cursor. When it's null, you've reached the end.
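The cursor loop can be sketched as a generator that keeps requesting pages until next_cursor is null. The helper names fetch_json and iter_all_results are illustrative. The fetch function is passed in as a parameter so the loop can be exercised without network access, and base_url is assumed to already contain a query string.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a real HTTP request)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_all_results(base_url, fetch=fetch_json, per_page=100):
    """Yield every result for `base_url`, following meta.next_cursor.

    `base_url` must already include a query string,
    e.g. "https://api.openalex.org/works?search=x&mailto=agent@kortix.ai".
    """
    cursor = "*"  # "*" asks for the first page
    while cursor:
        url = f"{base_url}&per_page={per_page}&cursor={quote(cursor, safe='*')}"
        page = fetch(url)
        yield from page.get("results", [])
        # A null next_cursor (None in Python) means there are no more pages.
        cursor = page.get("meta", {}).get("next_cursor")
```

Because this is a generator, a caller can stop early (for example after the first 500 works) without fetching the remaining pages.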

Select Fields

Reduce response size by selecting only the fields you need:

Only get IDs, titles, citation counts, and DOIs

?select=id,display_name,cited_by_count,doi,publication_year

Minimal metadata for scanning

?select=id,display_name,publication_year,cited_by_count,open_access

Citation Graph Traversal

Find what a paper cites (outgoing references)

Get works cited BY a specific paper

curl -s "https://api.openalex.org/works?filter=cited_by:W2741809807&per_page=25&mailto=agent@kortix.ai"

Find what cites a paper (incoming citations)

Get works that CITE a specific paper

curl -s "https://api.openalex.org/works?filter=cites:W2741809807&sort=cited_by_count:desc&per_page=25&mailto=agent@kortix.ai"

Find related works

Get related works (algorithmic, based on shared concepts)

curl -s "https://api.openalex.org/works?filter=related_to:W2741809807&per_page=25&mailto=agent@kortix.ai"

Citation chain: follow the references

1. Get a seminal paper by DOI
2. Find its referenced_works (what it cites)
3. Find who cites it (filter=cites:WORK_ID)
4. For the most cited citers, repeat

This is how you build a literature graph around a topic.
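The "find citers, then repeat" part of this procedure can be sketched as a bounded breadth-first expansion. Everything here is illustrative: expand_citations, top_n and depth are hypothetical names, and fetch is a caller-supplied function that takes a URL and returns the decoded JSON page.

```python
def expand_citations(work_id, fetch, top_n=3, depth=2):
    """Collect (cited, citer) edges by following the most-cited citers.

    Each round asks for the `top_n` most-cited works that cite every work
    in the current frontier, then uses the new works as the next frontier.
    """
    edges, frontier, seen = [], [work_id], {work_id}
    for _ in range(depth):
        next_frontier = []
        for wid in frontier:
            url = (
                "https://api.openalex.org/works"
                f"?filter=cites:{wid}&sort=cited_by_count:desc"
                f"&per_page={top_n}&select=id,display_name,cited_by_count"
                "&mailto=agent@kortix.ai"
            )
            for citer in fetch(url).get("results", []):
                # IDs come back as full URLs; keep only the trailing W... part.
                citer_id = citer["id"].rsplit("/", 1)[-1]
                edges.append((wid, citer_id))
                if citer_id not in seen:
                    seen.add(citer_id)
                    next_frontier.append(citer_id)
        frontier = next_frontier
    return edges
```

Keeping top_n and depth small matters: the number of requests grows roughly as top_n to the power of depth, which is easy to underestimate against the rate limits.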

Author Lookup

Search for an author

curl -s "https://api.openalex.org/authors?search=Yann+LeCun&mailto=agent@kortix.ai"

Get an author's works (by OpenAlex author ID)

curl -s "https://api.openalex.org/works?filter=author.id:A5064850633&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"

Get an author by ORCID

curl -s "https://api.openalex.org/authors/orcid:0000-0001-6187-6610?mailto=agent@kortix.ai"

Lookup by External ID

By DOI

curl -s "https://api.openalex.org/works/doi:10.1038/s41586-021-03819-2?mailto=agent@kortix.ai"

By PubMed ID

curl -s "https://api.openalex.org/works/pmid:14907713?mailto=agent@kortix.ai"

By arXiv ID (via DOI)

curl -s "https://api.openalex.org/works/doi:10.48550/arXiv.2303.08774?mailto=agent@kortix.ai"

Batch lookup: up to 50 IDs at once

curl -s "https://api.openalex.org/works?filter=doi:https://doi.org/10.1234/a|https://doi.org/10.1234/b|https://doi.org/10.1234/c&mailto=agent@kortix.ai"
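When there are more DOIs than fit in one request, they need to be split into batches first. A minimal sketch, assuming the 50-ID limit stated above; doi_batch_filters is an illustrative name, and bare DOIs are normalized to the https://doi.org/ form used in the example.

```python
def doi_batch_filters(dois, batch_size=50):
    """Split DOIs into OR-joined `doi:` filter strings of at most `batch_size`."""
    prefix = "https://doi.org/"
    urls = [d if d.startswith(prefix) else prefix + d for d in dois]
    return [
        "doi:" + "|".join(urls[i:i + batch_size])
        for i in range(0, len(urls), batch_size)
    ]


print(doi_batch_filters(["10.1234/a", "https://doi.org/10.1234/b"]))
# ['doi:https://doi.org/10.1234/a|https://doi.org/10.1234/b']
```

Each returned string is the value for one filter= parameter, so 120 DOIs become three requests.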

Open Access & PDF Access

Find OA papers with direct PDF links

curl -s "https://api.openalex.org/works?search=quantum+computing&filter=is_oa:true,has_content.pdf:true&select=id,display_name,open_access,best_oa_location&per_page=5&mailto=agent@kortix.ai"

The best_oa_location.pdf_url field gives a direct PDF link when available. The open_access.oa_url gives the best available OA landing page or PDF.
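In practice either field may be missing or null, so a fallback order is useful. The sketch below prefers the direct PDF link and falls back to oa_url; pick_pdf_url is a hypothetical helper name and the sample dictionaries are made up.

```python
def pick_pdf_url(work):
    """Return the most direct free link for a work dict, or None."""
    # Both objects can be null in the API response, hence the `or {}`.
    best = work.get("best_oa_location") or {}
    open_access = work.get("open_access") or {}
    return best.get("pdf_url") or open_access.get("oa_url")


sample = {
    "best_oa_location": {"pdf_url": None},
    "open_access": {"is_oa": True, "oa_url": "https://example.org/paper"},
}
print(pick_pdf_url(sample))  # https://example.org/paper
```

Note that the fallback may be a landing page rather than a PDF, so check the content type before saving the response as a .pdf file.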

Practical Workflows

Literature Survey on a Topic

1. Find the most-cited papers on a topic

curl -s "https://api.openalex.org/works?search=retrieval+augmented+generation&sort=cited_by_count:desc&filter=publication_year:>2020,type:article,has_abstract:true&per_page=20&select=id,display_name,publication_year,cited_by_count,doi,authorships,abstract_inverted_index&mailto=agent@kortix.ai"

2. For the top papers, explore their citation graphs

curl -s "https://api.openalex.org/works?filter=cites:W4285719527&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"

3. Find recent papers building on this work

curl -s "https://api.openalex.org/works?filter=cites:W4285719527,publication_year:>2023&sort=publication_date:desc&per_page=10&mailto=agent@kortix.ai"

Find Landmark/Seminal Papers

Highly cited + search term

curl -s "https://api.openalex.org/works?search=attention+mechanism+neural+networks&filter=cited_by_count:>500,type:article&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"

Find Recent Preprints

Latest preprints on a topic

curl -s "https://api.openalex.org/works?search=multimodal+large+language+models&filter=type:preprint,publication_year:2025&sort=publication_date:desc&per_page=15&mailto=agent@kortix.ai"

Find Review Articles

Review/survey papers on a topic

curl -s "https://api.openalex.org/works?search=federated+learning&filter=type:review,cited_by_count:>20&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"

Author Analysis

1. Find the author

curl -s "https://api.openalex.org/authors?search=Geoffrey+Hinton&select=id,display_name,works_count,cited_by_count,last_known_institutions&mailto=agent@kortix.ai"

2. Get their most influential papers

curl -s "https://api.openalex.org/works?filter=author.id:A5068082743&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"

3. Get their recent work

curl -s "https://api.openalex.org/works?filter=author.id:A5068082743,publication_year:>2023&sort=publication_date:desc&per_page=10&mailto=agent@kortix.ai"

Saving Results to Disk

When doing deep research, save paper data to disk for later processing:

Save search results as JSON

curl -s "https://api.openalex.org/works?search=topic&per_page=50&mailto=agent@kortix.ai" > research/papers/topic-search.json

Extract and save a clean summary

curl -s "https://api.openalex.org/works?search=topic&per_page=50&select=id,display_name,publication_year,cited_by_count,doi,authorships&mailto=agent@kortix.ai" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for w in data.get('results', []):
    authorships = w.get('authorships', [])
    # Show at most three author names, then 'et al.'
    authors = ', '.join(a['author']['display_name'] for a in authorships[:3])
    if len(authorships) > 3:
        authors += ' et al.'
    cites = w.get('cited_by_count', 0)
    year = w.get('publication_year', '?')
    # Single-quoted strings only: a double quote would end the shell string
    print('[{} cites] {} ({}) - {}'.format(cites, w['display_name'], year, authors))
    if w.get('doi'):
        print('  DOI: ' + w['doi'])
    print()
" > research/papers/topic-summary.txt

For deep research, save individual paper metadata to your sources-index.md and raw data to sources/:

Save a paper's full metadata

curl -s "https://api.openalex.org/works/W2741809807?mailto=agent@kortix.ai" > research/sources/001-paper-title.json

Rate Limits

Pool     Rate        How to get it
Common   1 req/sec   No email provided
Polite   10 req/sec  Add mailto=your@email.com to requests
Premium  Higher      Paid API key via api_key param

Always use the polite pool. Add &mailto=agent@kortix.ai to every request.
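When scripting many requests, a small client-side throttle helps stay under the limit. This is a sketch, not an official client: Throttle is a hypothetical class, and the clock and sleep parameters exist only so the timing logic can be checked without real waiting.

```python
import time


class Throttle:
    """Space out calls so they never exceed `rate` per second."""

    def __init__(self, rate=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock, self.sleep = clock, sleep
        self.last_call = None

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# Usage: call throttle.wait() immediately before each HTTP request.
throttle = Throttle(rate=10)  # polite pool; use rate=1 without mailto
```

The throttle only spaces out requests made by one process; parallel workers sharing one email address would need a shared limiter.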

Tips

Use select aggressively to reduce response size and speed up requests
Use per_page=100 (max) when collecting lots of results to minimize request count
Use cursor paging (cursor=*) when you need more than 10,000 results
Batch DOI lookups with OR syntax: filter=doi:DOI1|DOI2|DOI3 (up to 50)
Reconstruct abstracts using the inverted index -- don't skip this, abstracts are gold
Follow citation chains to find seminal works and recent developments
Filter by has_abstract:true when you need abstracts (not all works have them)
Filter by indexed_in:arxiv or indexed_in:pubmed to target specific repositories
Sort by cited_by_count:desc to find the most influential papers first
Combine search + filters for precise results: search gives relevance, filters give precision
