Academic Paper Search (OpenAlex)
Search 240M+ scholarly works using the OpenAlex API -- completely free, no API key required, no SDK needed. Just curl or bash with URL construction.
Full docs: https://docs.openalex.org
Quick Start
OpenAlex is a REST API. You query it by constructing URLs and fetching them with curl. All responses are JSON.
Search for papers about "transformer architecture"
curl -s "https://api.openalex.org/works?search=transformer+architecture&per_page=5&mailto=agent@kortix.ai" | python3 -m json.tool
Important: Always include mailto=agent@kortix.ai (or any valid email) in every request. Without it, you're limited to 1 request/second. With it, you get 10 requests/second (the "polite pool").
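The same request can be built from Python when curl is not convenient. This is a minimal sketch using only the standard library; the helper names `openalex_url` and `fetch_json` are illustrative and not part of any OpenAlex client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.openalex.org"
MAILTO = "agent@kortix.ai"  # any valid email works


def openalex_url(endpoint, **params):
    """Build a request URL, always appending mailto (hypothetical helper)."""
    params.setdefault("mailto", MAILTO)
    # Keep filter punctuation readable; spaces are encoded as '+'
    return f"{BASE_URL}/{endpoint}?" + urlencode(params, safe=":,|!*@<>/")


def fetch_json(url, timeout=30):
    """Fetch a URL and parse the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


url = openalex_url("works", search="transformer architecture", per_page=5)
print(url)
```

Calling `fetch_json(url)["results"]` would then return the list of work objects for the query.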
Core Concepts
Entities
OpenAlex has these entity types (all queryable):
Entity        Endpoint       Count   Description
Works         /works         240M+   Papers, articles, books, datasets, theses
Authors       /authors       90M+    People who create works
Sources       /sources       250K+   Journals, repositories, conferences
Institutions  /institutions  110K+   Universities, research orgs
Topics        /topics        4K+     Research topics (hierarchical)
Work Object -- Key Fields
When you fetch a work, these are the most useful fields:
id                       OpenAlex ID (e.g., "https://openalex.org/W2741809807")
doi                      DOI URL
title / display_name     Paper title
publication_year         Year published
publication_date         Full date (YYYY-MM-DD)
cited_by_count           Number of incoming citations
fwci                     Field-Weighted Citation Impact (normalized)
type                     article, preprint, review, book, dataset, etc.
language                 ISO 639-1 code (e.g., "en")
is_retracted             Boolean
open_access.is_oa        Boolean -- is it freely accessible?
open_access.oa_url       Direct URL to free version
authorships              List of authors with names, institutions, ORCIDs
abstract_inverted_index  Abstract as inverted index (needs reconstruction)
referenced_works         List of OpenAlex IDs this work cites (outgoing)
related_works            Algorithmically related works
cited_by_api_url         API URL to get works that cite this one (incoming)
topics                   Assigned research topics with scores
keywords                 Extracted keywords with scores
primary_location         Where the work is published (journal, repo)
best_oa_location         Best open access location with PDF link
Reconstructing Abstracts
OpenAlex stores abstracts as inverted indexes for legal reasons. To get plaintext, reconstruct:
# Read the abstract_inverted_index from a work object
# (assumes `work` is a dict parsed from the API's JSON response)
inv_idx = work.get("abstract_inverted_index")
abstract = ""
if inv_idx:
    # Size the word list by the highest position found in the index
    words = [""] * (max(max(positions) for positions in inv_idx.values()) + 1)
    for word, positions in inv_idx.items():
        for pos in positions:
            words[pos] = word
    abstract = " ".join(words)
Or in bash with python3 -c:
Pipe a work JSON into this to extract the abstract
echo "$WORK_JSON" | python3 -c "
import json, sys
w = json.load(sys.stdin)
idx = w.get('abstract_inverted_index') or {}
if idx:
    words = [''] * (max(max(p) for p in idx.values()) + 1)
    for word, positions in idx.items():
        for pos in positions:
            words[pos] = word
    print(' '.join(words))
"
Searching for Papers
Basic Keyword Search
Searches across titles, abstracts, and fulltext. Uses stemming and stop-word removal.
Simple search
curl -s "https://api.openalex.org/works?search=large+language+models&mailto=agent@kortix.ai"
With per_page limit
Boolean Search
Use uppercase AND, OR, NOT with parentheses and quoted phrases:
Complex boolean query
Exact phrase match (use double quotes, URL-encoded as %22)
curl -s "https://api.openalex.org/works?search=%22attention+is+all+you+need%22&mailto=agent@kortix.ai"
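Boolean queries contain quotes, spaces, and parentheses that must be URL-encoded. The sketch below lets Python's `urlencode` do that; the search terms and the `boolean_search_url` helper are illustrative assumptions, not taken from the original examples.

```python
from urllib.parse import urlencode


def boolean_search_url(query, mailto="agent@kortix.ai"):
    """Build a /works search URL; urlencode escapes quotes and parentheses."""
    params = {"search": query, "mailto": mailto}
    return "https://api.openalex.org/works?" + urlencode(params, safe="@")


# Uppercase operators, parentheses for grouping, double quotes for phrases
query = '("machine learning" OR "deep learning") AND climate NOT survey'
url = boolean_search_url(query)
print(url)
```

Double quotes come out as %22 and spaces as +, which matches the hand-encoded exact-phrase example above.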
Search Specific Fields
Title only
curl -s "https://api.openalex.org/works?filter=title.search:transformer&mailto=agent@kortix.ai"
Abstract only
curl -s "https://api.openalex.org/works?filter=abstract.search:protein+folding&mailto=agent@kortix.ai"
Title and abstract combined
Fulltext search (subset of works)
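The combined and fulltext searches follow the same `<field>.search` filter pattern as the title and abstract examples. This sketch assumes the filter names `title_and_abstract.search` and `fulltext.search`; confirm them against the OpenAlex docs before relying on them. The `field_search_url` helper is hypothetical.

```python
from urllib.parse import urlencode


def field_search_url(field, terms, mailto="agent@kortix.ai"):
    """Build a /works URL that searches one field via a `<field>.search` filter."""
    params = {"filter": f"{field}.search:{terms}", "mailto": mailto}
    return "https://api.openalex.org/works?" + urlencode(params, safe=":@")


# Assumed filter names, following the title.search / abstract.search pattern
title_abstract_url = field_search_url("title_and_abstract", "protein folding")
fulltext_url = field_search_url("fulltext", "protein folding")
print(title_abstract_url)
print(fulltext_url)
```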
Filtering
Filters are the most powerful feature. Combine them with commas (AND) or pipes (OR).
Most Useful Filters
By publication year
?filter=publication_year:2024
?filter=publication_year:2020-2024
?filter=publication_year:>2022
By citation count
?filter=cited_by_count:>100    # highly cited
?filter=cited_by_count:>1000   # landmark papers
By open access
?filter=is_oa:true        # only open access
?filter=oa_status:gold    # gold OA only
By type
?filter=type:article     # journal articles
?filter=type:preprint    # preprints
?filter=type:review      # review articles
By language
?filter=language:en # English only
Not retracted
?filter=is_retracted:false
Has abstract
?filter=has_abstract:true
Has downloadable PDF
?filter=has_content.pdf:true
By author (OpenAlex ID)
?filter=author.id:A5023888391
By institution (OpenAlex ID)
?filter=institutions.id:I27837315 # e.g., University of Michigan
By DOI
?filter=doi:https://doi.org/10.1038/s41586-021-03819-2
By indexed source
?filter=indexed_in:arxiv       # arXiv papers
?filter=indexed_in:pubmed      # PubMed papers
?filter=indexed_in:crossref    # Crossref papers
Combining Filters
AND: comma-separated
?filter=publication_year:>2022,cited_by_count:>50,is_oa:true,type:article
OR: pipe-separated within a filter
?filter=publication_year:2023|2024
NOT: prefix with !
?filter=type:!preprint
Combined example: highly-cited OA articles from 2023-2024, not preprints
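A filter string like the combined example can be assembled from a dictionary, which keeps the comma (AND), pipe (OR), and `!` (NOT) rules in one place. This is a sketch: `build_filter` is a hypothetical helper, and the `>100` threshold reuses the "highly cited" cutoff from the citation-count filters.

```python
from urllib.parse import urlencode


def build_filter(conditions):
    """Join {field: value} pairs with commas (AND); a list value becomes an OR group."""
    parts = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        parts.append(f"{field}:{value}")
    return ",".join(parts)


flt = build_filter({
    "publication_year": [2023, 2024],  # OR: 2023|2024
    "cited_by_count": ">100",          # highly cited
    "is_oa": "true",                   # open access only
    "type": "!preprint",               # NOT preprint
})
url = "https://api.openalex.org/works?" + urlencode(
    {"filter": flt, "sort": "cited_by_count:desc", "mailto": "agent@kortix.ai"},
    safe=":,|!<>@",
)
print(url)
```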
Sorting
Most cited first
?sort=cited_by_count:desc
Most recent first
?sort=publication_date:desc
Most relevant first (only when using search)
?sort=relevance_score:desc
Multiple sort keys
?sort=publication_year:desc,cited_by_count:desc
Pagination
Two modes: basic paging (for browsing) and cursor paging (for collecting all results).
Basic paging (limited to 10,000 results)
?page=1&per_page=25
?page=2&per_page=25
Cursor paging (unlimited, for collecting everything)
?per_page=100&cursor=*               # first page
?per_page=100&cursor=IlsxNjk0ODc...  # next page (cursor from previous response meta)
The cursor for the next page is in response.meta.next_cursor. When it's null, you've reached the end.
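Cursor paging amounts to a loop that starts at `cursor=*` and stops when `meta.next_cursor` is null. This is a minimal sketch of that loop. The `iter_all_results` name and the injectable `fetch` argument are illustrative choices that make the loop usable without network access in tests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and parse the JSON response body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_all_results(params, fetch=fetch_json, max_pages=None):
    """Yield every result of a /works query, following next_cursor until it is null."""
    cursor, pages = "*", 0
    while cursor and (max_pages is None or pages < max_pages):
        query = dict(params, per_page=100, cursor=cursor, mailto="agent@kortix.ai")
        url = "https://api.openalex.org/works?" + urlencode(query, safe=":,|!*@<>")
        page = fetch(url)
        yield from page.get("results", [])
        cursor = page.get("meta", {}).get("next_cursor")  # None on the last page
        pages += 1
```

For example, `for work in iter_all_results({"search": "protein folding", "select": "id,display_name"}, max_pages=3)` would stop after at most 300 works.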
Select Fields
Reduce response size by selecting only the fields you need:
Only get IDs, titles, citation counts, and DOIs
?select=id,display_name,cited_by_count,doi,publication_year
Minimal metadata for scanning
?select=id,display_name,publication_year,cited_by_count,open_access
Citation Graph Traversal
Find what a paper cites (outgoing references)
Get works cited BY a specific paper
Find what cites a paper (incoming citations)
Get works that CITE a specific paper
Find related works
Get related works (algorithmic, based on shared concepts)
Citation chain: follow the references
- Get a seminal paper by DOI
- Find its referenced_works (what it cites)
- Find who cites it (filter=cites:WORK_ID)
- For the most cited citers, repeat
This is how you build a literature graph around a topic.
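The traversals in this section can be sketched as three URL builders plus a small loop over the most-cited citers. The `cites` filter is the one named above. The `cited_by` and `related_to` filter names are assumptions to verify against the OpenAlex docs, and all function names are hypothetical.

```python
from urllib.parse import urlencode


def _works_url(**params):
    params.setdefault("mailto", "agent@kortix.ai")
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,|!@<>")


def references_url(work_id):
    """Works cited BY work_id (outgoing); assumes the cited_by filter."""
    return _works_url(filter=f"cited_by:{work_id}")


def citations_url(work_id):
    """Works that CITE work_id (incoming), most cited first."""
    return _works_url(filter=f"cites:{work_id}", sort="cited_by_count:desc")


def related_url(work_id):
    """Works related to work_id; assumes the related_to filter."""
    return _works_url(filter=f"related_to:{work_id}")


def top_citers(work_id, fetch, depth=2, per_level=3):
    """Follow the most-cited citers for `depth` levels; returns {work_id: title}."""
    seen, frontier = {}, [work_id]
    for _ in range(depth):
        next_frontier = []
        for wid in frontier:
            for w in fetch(citations_url(wid)).get("results", [])[:per_level]:
                short_id = w["id"].rsplit("/", 1)[-1]  # strip https://openalex.org/
                if short_id not in seen:
                    seen[short_id] = w.get("display_name")
                    next_frontier.append(short_id)
        frontier = next_frontier
    return seen
```

Here `fetch` is any callable that maps a URL to parsed JSON, so a throttled or cached fetcher can be passed in.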
Author Lookup
Search for an author
curl -s "https://api.openalex.org/authors?search=Yann+LeCun&mailto=agent@kortix.ai"
Get an author's works (by OpenAlex author ID)
Get an author by ORCID
curl -s "https://api.openalex.org/authors/orcid:0000-0001-6187-6610?mailto=agent@kortix.ai"
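An author's works can be listed by filtering /works on the author's OpenAlex ID. This sketch reuses the `author.id` filter and the example ID from the Filtering section; `author_works_url` is a hypothetical helper, and sorting by citations is one reasonable default.

```python
from urllib.parse import urlencode


def author_works_url(author_id, per_page=25):
    """List an author's works, most cited first, using the author.id filter."""
    params = {
        "filter": f"author.id:{author_id}",
        "sort": "cited_by_count:desc",
        "per_page": per_page,
        "mailto": "agent@kortix.ai",
    }
    return "https://api.openalex.org/works?" + urlencode(params, safe=":@")


print(author_works_url("A5023888391"))
```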
Lookup by External ID
By DOI
curl -s "https://api.openalex.org/works/doi:10.1038/s41586-021-03819-2?mailto=agent@kortix.ai"
By PubMed ID
curl -s "https://api.openalex.org/works/pmid:14907713?mailto=agent@kortix.ai"
By arXiv ID (via DOI)
curl -s "https://api.openalex.org/works/doi:10.48550/arXiv.2303.08774?mailto=agent@kortix.ai"
Batch lookup: up to 50 IDs at once
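Batch lookup joins IDs with pipes inside a single filter, so longer lists have to be split into chunks of at most 50. This is a sketch assuming DOIs are given as full https://doi.org/ URLs, as in the DOI filter example; `batch_doi_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode


def batch_doi_urls(dois, batch_size=50):
    """Yield one /works URL per batch of up to `batch_size` DOIs (pipe = OR)."""
    for start in range(0, len(dois), batch_size):
        batch = dois[start:start + batch_size]
        params = {
            "filter": "doi:" + "|".join(batch),
            "per_page": len(batch),  # one page holds the whole batch
            "mailto": "agent@kortix.ai",
        }
        yield "https://api.openalex.org/works?" + urlencode(params, safe=":|/@")


dois = [
    "https://doi.org/10.1038/s41586-021-03819-2",
    "https://doi.org/10.48550/arXiv.2303.08774",
]
for url in batch_doi_urls(dois):
    print(url)
```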
Open Access & PDF Access
Find OA papers with direct PDF links
The best_oa_location.pdf_url field gives a direct PDF link when available. The open_access.oa_url gives the best available OA landing page or PDF.
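The fallback described above (direct PDF first, then the best OA URL) can be sketched as a small function over a work object. The query URL combines the `is_oa` and `has_content.pdf` filters from the Filtering section. The search topic and the `best_pdf_link` name are illustrative.

```python
from urllib.parse import urlencode

# Open-access works with a downloadable PDF; the topic is only an example
OA_PDF_URL = "https://api.openalex.org/works?" + urlencode(
    {
        "search": "protein folding",
        "filter": "is_oa:true,has_content.pdf:true",
        "select": "id,display_name,open_access,best_oa_location",
        "mailto": "agent@kortix.ai",
    },
    safe=":,@",
)


def best_pdf_link(work):
    """Return (url, kind): the direct PDF if present, else the OA URL, else (None, None)."""
    best = work.get("best_oa_location") or {}
    if best.get("pdf_url"):
        return best["pdf_url"], "pdf"
    oa_url = (work.get("open_access") or {}).get("oa_url")
    if oa_url:
        return oa_url, "oa_url"
    return None, None
```

The `or {}` guards matter because both fields can be null in a work object.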
Practical Workflows
Literature Survey on a Topic
1. Find the most-cited papers on a topic
2. For the top papers, explore their citation graphs
3. Find recent papers building on this work
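The three survey steps can be chained in one function. This is a sketch under stated assumptions: `literature_survey` and its parameters are hypothetical, "citation graph" is narrowed to incoming citations via the `cites` filter, and `fetch` is any callable that maps a URL to parsed JSON.

```python
from urllib.parse import urlencode

FIELDS = "id,display_name,publication_year,cited_by_count"


def _works_url(**params):
    params.setdefault("mailto", "agent@kortix.ai")
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,|!@<>")


def literature_survey(topic, fetch, top_n=3, since_year=2023):
    """Run the three survey steps and return one record per top paper."""
    # 1. Most-cited papers on the topic
    top = fetch(_works_url(search=topic, sort="cited_by_count:desc",
                           per_page=top_n, select=FIELDS)).get("results", [])
    survey = []
    for work in top:
        wid = work["id"].rsplit("/", 1)[-1]  # strip https://openalex.org/
        # 2. Citation graph: the most-cited works that cite this paper
        citers = fetch(_works_url(filter=f"cites:{wid}", sort="cited_by_count:desc",
                                  per_page=top_n, select=FIELDS)).get("results", [])
        # 3. Recent papers building on it (published in since_year or later)
        recent = fetch(_works_url(filter=f"cites:{wid},publication_year:>{since_year - 1}",
                                  sort="publication_date:desc", per_page=top_n,
                                  select=FIELDS)).get("results", [])
        survey.append({"work": work, "top_citers": citers, "recent_citers": recent})
    return survey
```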
Find Landmark/Seminal Papers
Highly cited + search term
Find Recent Preprints
Latest preprints on a topic
Find Review Articles
Review/survey papers on a topic
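The landmark, preprint, and review recipes each combine a search term with one filter and a sort order. This sketch expresses them as three URL builders. The function names are hypothetical, and the `>1000` cutoff reuses the "landmark papers" threshold from the citation-count filters.

```python
from urllib.parse import urlencode


def _search_url(topic, filters, sort):
    params = {"search": topic, "filter": ",".join(filters), "sort": sort,
              "mailto": "agent@kortix.ai"}
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,|!@<>")


def landmark_url(topic, min_citations=1000):
    """Highly cited works on a topic, most cited first."""
    return _search_url(topic, [f"cited_by_count:>{min_citations}"], "cited_by_count:desc")


def recent_preprints_url(topic):
    """Latest preprints on a topic, newest first."""
    return _search_url(topic, ["type:preprint"], "publication_date:desc")


def reviews_url(topic):
    """Review articles on a topic, most cited first."""
    return _search_url(topic, ["type:review"], "cited_by_count:desc")


print(landmark_url("protein folding"))
```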
Author Analysis
1. Find the author
2. Get their most influential papers
3. Get their recent work
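The author analysis steps can be chained the same way. This sketch takes the top hit of the author search as the match, which is an assumption that may pick the wrong person for common names. `author_analysis` is a hypothetical helper, and `fetch` maps a URL to parsed JSON.

```python
from urllib.parse import urlencode


def _url(endpoint, **params):
    params.setdefault("mailto", "agent@kortix.ai")
    return f"https://api.openalex.org/{endpoint}?" + urlencode(params, safe=":,@<>")


def author_analysis(name, fetch, since_year=2023, limit=10):
    """Find an author, then fetch their most-cited and most recent works."""
    # 1. Find the author (top search hit; verify it is the right person)
    hits = fetch(_url("authors", search=name, per_page=1)).get("results", [])
    if not hits:
        return None
    author = hits[0]
    by_author = "author.id:" + author["id"].rsplit("/", 1)[-1]
    # 2. Most influential papers
    top = fetch(_url("works", filter=by_author, sort="cited_by_count:desc",
                     per_page=limit)).get("results", [])
    # 3. Recent work (published in since_year or later)
    recent = fetch(_url("works", filter=f"{by_author},publication_year:>{since_year - 1}",
                        sort="publication_date:desc", per_page=limit)).get("results", [])
    return {"author": author, "most_cited": top, "recent": recent}
```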
Saving Results to Disk
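When doing deep research, save paper data to disk for later processing. Create the target directories first (e.g., mkdir -p research/papers research/sources), since shell redirection does not create them: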
Save search results as JSON
curl -s "https://api.openalex.org/works?search=topic&per_page=50&mailto=agent@kortix.ai" > research/papers/topic-search.json
Extract and save a clean summary
curl -s "https://api.openalex.org/works?search=topic&per_page=50&select=id,display_name,publication_year,cited_by_count,doi,authorships&mailto=agent@kortix.ai" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for w in data.get('results', []):
    authorships = w.get('authorships', [])
    authors = ', '.join(a['author']['display_name'] for a in authorships[:3])
    if len(authorships) > 3:
        authors += ' et al.'
    # Single-quoted str.format: a double-quoted f-string would end the shell string early
    print('[{} cites] {} ({}) - {}'.format(w.get('cited_by_count', 0), w['display_name'], w.get('publication_year', '?'), authors))
    if w.get('doi'):
        print('  DOI: {}'.format(w['doi']))
    print()
" > research/papers/topic-summary.txt
For deep research, save individual paper metadata to your sources-index.md and raw data to sources/:
Save a paper's full metadata
curl -s "https://api.openalex.org/works/W2741809807?mailto=agent@kortix.ai" > research/sources/001-paper-title.json
Rate Limits
Pool Rate How to get it
Common 1 req/sec No email provided
Polite 10 req/sec Add mailto=your@email.com to requests
Premium Higher Paid API key via api_key param
Always use the polite pool. Add &mailto=agent@kortix.ai to every request.
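Staying under the polite-pool rate is easiest with a client-side throttle that spaces requests out. This is a minimal sketch, assuming a single-threaded caller; the `Throttle` class is illustrative, and its injectable clock and sleep functions exist only so the timing logic can be checked without real delays.

```python
import time


class Throttle:
    """Space calls at least 1 / rate_per_sec seconds apart (client-side sketch)."""

    def __init__(self, rate_per_sec=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

Call `throttle.wait()` immediately before each request; with the default of 10 requests per second, back-to-back calls are spaced about 0.1 s apart.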
Tips
- Use select aggressively to reduce response size and speed up requests
- Use per_page=100 (max) when collecting lots of results to minimize request count
- Use cursor paging (cursor=*) when you need more than 10,000 results
- Batch DOI lookups with OR syntax: filter=doi:DOI1|DOI2|DOI3 (up to 50)
- Reconstruct abstracts using the inverted index -- don't skip this, abstracts are gold
- Follow citation chains to find seminal works and recent developments
- Filter by has_abstract:true when you need abstracts (not all works have them)
- Filter by indexed_in:arxiv or indexed_in:pubmed to target specific repositories
- Sort by cited_by_count:desc to find the most influential papers first
- Combine search + filters for precise results: search gives relevance, filters give precision