biorxiv-database

Search and retrieve preprints from bioRxiv. Use when asked to "search bioRxiv", "find preprints", "look up bioRxiv papers", or retrieve life sciences literature.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "biorxiv-database" with this command: npx skills add aminoanalytica/amina-skills/aminoanalytica-amina-skills-biorxiv-database

bioRxiv Database

A Python toolkit for programmatic access to bioRxiv preprints. It retrieves preprint metadata (DOI, title, authors, category, license, abstract, and links) as structured JSON that can be fed into research workflows.

Use Cases

  • Query recent preprints by topic or research domain
  • Monitor publications from specific researchers
  • Perform systematic literature reviews
  • Analyze publication trends across time periods
  • Retrieve citation metadata and DOIs
  • Download preprint PDFs for text analysis
  • Filter results by subject category

Quick Start

# Install dependencies
pip install requests

# Search by keywords
python scripts/biorxiv_client.py --terms "protein folding" --recent 30 --out results.json

# Search by author
python scripts/biorxiv_client.py --author "Chen" --recent 180

# Get specific paper by DOI
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321"

# Download PDF
python scripts/biorxiv_client.py --doi "10.1101/2024.05.22.594321" --fetch-pdf paper.pdf

Command-Line Options

Option           Description
-t, --terms      Search keywords (multiple allowed)
-a, --author     Author name to search
--doi            Specific DOI to retrieve
--since          Start date (YYYY-MM-DD)
--until          End date (YYYY-MM-DD)
--recent         Search last N days
-s, --subject    Subject category filter
--fields         Fields to search: title, abstract, authors
-o, --out        Output file (default: stdout)
--max            Maximum results to return
--fetch-pdf      Download PDF (requires --doi)
-v, --verbose    Enable debug output
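
To show how the options in the table fit together, here is a minimal argparse sketch that mirrors them. It is not the parser used by scripts/biorxiv_client.py: the helper names build_parser and parse_cli, the argument types, and the way --terms and --fields accept multiple values are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real script)."""
    parser = argparse.ArgumentParser(prog="biorxiv_client.py")
    # Assumption: multiple keywords are passed space-separated after one flag.
    parser.add_argument("-t", "--terms", nargs="+", help="Search keywords")
    parser.add_argument("-a", "--author", help="Author name to search")
    parser.add_argument("--doi", help="Specific DOI to retrieve")
    parser.add_argument("--since", help="Start date (YYYY-MM-DD)")
    parser.add_argument("--until", help="End date (YYYY-MM-DD)")
    parser.add_argument("--recent", type=int, help="Search last N days")
    parser.add_argument("-s", "--subject", help="Subject category filter")
    # Assumption: fields are given as separate values, e.g. --fields title abstract.
    parser.add_argument("--fields", nargs="+", help="Fields to search")
    parser.add_argument("-o", "--out", help="Output file (default: stdout)")
    parser.add_argument("--max", type=int, dest="max_results",
                        help="Maximum results to return")
    parser.add_argument("--fetch-pdf", metavar="PATH",
                        help="Download PDF (requires --doi)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable debug output")
    return parser


def parse_cli(argv):
    """Parse argv and enforce the documented rule that --fetch-pdf needs --doi."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.fetch_pdf and not args.doi:
        parser.error("--fetch-pdf requires --doi")
    return args
```

The only cross-option rule stated in the table is that --fetch-pdf requires --doi, so that is the only check the sketch enforces.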

Programmatic API

from scripts.biorxiv_client import PreprintClient

client = PreprintClient(debug=True)

# Search by keywords
results = client.find_by_terms(
    terms=["enzyme engineering"],
    since="2024-01-01",
    until="2024-12-31",
    subject="biochemistry"
)

# Search by author
papers = client.find_by_author(name="Garcia", since="2023-01-01")

# Get paper by DOI
metadata = client.get_by_doi("10.1101/2024.05.22.594321")

# Download PDF
client.fetch_pdf(doi="10.1101/2024.05.22.594321", destination="paper.pdf")

# Normalize output
formatted = client.normalize(metadata, include_abstract=True)
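
Each client call above goes to the network, so repeated queries can be served from saved JSON instead. The following is a minimal sketch of a file cache, assuming the results are JSON-serializable. The function cached_query, the cache directory layout, and the hashing scheme are hypothetical and not part of the toolkit; the function takes any zero-argument callable, so it does not depend on PreprintClient internals.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def cached_query(cache_dir: str, params: dict, fetch: Callable[[], Any]) -> Any:
    """Return the cached result for `params` if present, else call `fetch` and store it."""
    # Hash the sorted parameters so identical queries map to the same file.
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(blob).hexdigest()[:16]
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

A call might look like cached_query(".cache", {"terms": ["enzyme engineering"], "since": "2024-01-01"}, lambda: client.find_by_terms(terms=["enzyme engineering"], since="2024-01-01")), with the params dictionary kept in sync with the arguments passed to the client.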

Subject Categories

Category                         Category
animal-behavior-and-cognition    molecular-biology
biochemistry                     neuroscience
bioengineering                   paleontology
bioinformatics                   pathology
biophysics                       pharmacology-and-toxicology
cancer-biology                   physiology
cell-biology                     plant-biology
clinical-trials                  scientific-communication-and-education
developmental-biology            synthetic-biology
ecology                          systems-biology
epidemiology                     zoology
evolutionary-biology
genetics
genomics
immunology
microbiology
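
A subject filter can be checked against the list above before a request is sent. This is a small sketch: the SUBJECTS set is copied from the table, while normalize_subject and its handling of spaces and capitalization are hypothetical conveniences, not behavior of the toolkit.

```python
# Category slugs copied from the table above.
SUBJECTS = frozenset({
    "animal-behavior-and-cognition", "biochemistry", "bioengineering",
    "bioinformatics", "biophysics", "cancer-biology", "cell-biology",
    "clinical-trials", "developmental-biology", "ecology", "epidemiology",
    "evolutionary-biology", "genetics", "genomics", "immunology",
    "microbiology", "molecular-biology", "neuroscience", "paleontology",
    "pathology", "pharmacology-and-toxicology", "physiology", "plant-biology",
    "scientific-communication-and-education", "synthetic-biology",
    "systems-biology", "zoology",
})


def normalize_subject(label: str) -> str:
    """Turn a label such as 'Cell Biology' into a slug and verify it is listed."""
    slug = "-".join(label.strip().lower().split())
    if slug not in SUBJECTS:
        raise ValueError(f"unknown subject category: {label!r}")
    return slug
```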

Response Structure

{
  "query": {
    "terms": ["protein folding"],
    "since": "2024-03-01",
    "until": "2024-09-30",
    "subject": "biophysics"
  },
  "count": 87,
  "papers": [
    {
      "doi": "10.1101/2024.05.22.594321",
      "title": "Example Preprint Title",
      "authors": "Chen L, Patel R, Kim S",
      "corresponding_author": "Chen L",
      "institution": "Research Institute",
      "posted": "2024-05-22",
      "revision": "1",
      "category": "biophysics",
      "license": "cc_by",
      "paper_type": "new results",
      "abstract": "Abstract content here...",
      "pdf_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1.full.pdf",
      "web_link": "https://www.biorxiv.org/content/10.1101/2024.05.22.594321v1",
      "journal_ref": ""
    }
  ]
}
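
A saved response with this shape can be summarized in a few lines. The sketch below assumes only the field names shown in the example above; summarize and version_from_link are hypothetical helpers, and the version pattern is inferred from the two example links rather than from any documented URL scheme.

```python
import re
from collections import Counter
from typing import Optional


def summarize(payload: dict) -> dict:
    """Collect DOIs and per-category counts from a response dictionary."""
    papers = payload.get("papers", [])
    return {
        "reported_count": payload.get("count"),
        "papers_in_payload": len(papers),
        "by_category": dict(Counter(p.get("category", "unknown") for p in papers)),
        "dois": [p["doi"] for p in papers if p.get("doi")],
    }


def version_from_link(link: str) -> Optional[int]:
    """Extract the trailing version number (v1, v2, ...) from a pdf_link or web_link."""
    match = re.search(r"v(\d+)(?:\.full\.pdf)?$", link)
    return int(match.group(1)) if match else None
```

For a file written with --out, the payload can be loaded with json.load and passed straight to summarize.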

Best Practices

Recommendation       Details
Date ranges          Narrow ranges improve response time. Split large queries into chunks.
Category filters     Use --subject to reduce bandwidth and improve precision.
Rate limiting        Built-in 0.5s delay between requests. Add more for bulk operations.
Result caching       Save JSON outputs to avoid redundant API calls.
Version awareness    Preprints may have multiple versions. PDF URLs encode version numbers.
Error checking       Verify count in outputs. Zero results may indicate date or connectivity issues.
Debug mode           Use --verbose for detailed request/response logging.
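
Splitting a large date range into chunks can be sketched as follows. The helper split_date_range is hypothetical, the 30-day default is arbitrary, and the sketch assumes that both the start and end dates are treated as inclusive, which the source does not state.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(since: str, until: str, days: int = 30) -> List[Tuple[str, str]]:
    """Split [since, until] into consecutive, non-overlapping windows of at most `days` days."""
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    if days < 1:
        raise ValueError("days must be at least 1")
    if start > end:
        raise ValueError("since must not be later than until")
    chunks = []
    while start <= end:
        # Each window covers `days` calendar days, clipped to the overall end date.
        stop = min(start + timedelta(days=days - 1), end)
        chunks.append((start.isoformat(), stop.isoformat()))
        start = stop + timedelta(days=1)
    return chunks
```

Each (since, until) pair can then be passed to a separate query, with an extra time.sleep between calls for bulk runs, and the per-chunk JSON saved so that a failed chunk can be retried on its own.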

Reference Files

File                Contents
api-reference.md    Complete bioRxiv REST API documentation
examples.md         Extended code examples and workflow patterns

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
