bio-entrez-link

Navigate between NCBI databases using Biopython's Entrez module (ELink utility).

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "bio-entrez-link" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-entrez-link

Entrez Link

Required Setup

from Bio import Entrez

Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'  # Optional; raises the rate limit from 3 to 10 requests per second

Core Function

Entrez.elink() - Cross-Database Links

Find related records in the same or different databases.

# Find proteins linked to a gene
handle = Entrez.elink(dbfrom='gene', db='protein', id='672')
record = Entrez.read(handle)
handle.close()

# Extract linked IDs
linkset = record[0]
if linkset['LinkSetDb']:
    links = linkset['LinkSetDb'][0]['Link']
    protein_ids = [link['Id'] for link in links]
    print(f"Found {len(protein_ids)} linked proteins")

Key Parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| dbfrom | Source database | 'gene' |
| db | Target database | 'protein' |
| id | Source record ID(s) | '672' or '672,675' |
| linkname | Specific link type | 'gene_protein_refseq' |
| cmd | Link command | 'neighbor', 'neighbor_score' |
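As a rough illustration of how these parameters combine, the helper below assembles keyword arguments for `Entrez.elink()`. The name `build_elink_kwargs` and its validation rules are hypothetical conveniences, not part of Biopython; the function only builds a dictionary and makes no network call.

```python
def build_elink_kwargs(dbfrom, ids, db=None, linkname=None, cmd=None):
    """Assemble keyword arguments for Entrez.elink() (hypothetical helper)."""
    if not ids:
        raise ValueError("at least one source ID is required")
    # Accept a single ID or any iterable of IDs; normalise to a list of strings.
    # Biopython's elink sends a list as separate id= parameters, which yields
    # one linkset per source ID rather than one merged linkset.
    if isinstance(ids, (str, int)):
        ids = [ids]
    kwargs = {"dbfrom": dbfrom, "id": [str(i) for i in ids]}
    # Optional parameters are only included when set, so NCBI defaults apply.
    for key, value in (("db", db), ("linkname", linkname), ("cmd", cmd)):
        if value is not None:
            kwargs[key] = value
    return kwargs


kwargs = build_elink_kwargs("gene", 672, db="protein", linkname="gene_protein_refseq")
print(kwargs)
```

The resulting dictionary could then be passed on as `Entrez.elink(**kwargs)`.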

ELink Result Structure

record[0]                                    # First linkset
record[0]['DbFrom']                          # Source database
record[0]['IdList']                          # Input IDs
record[0]['LinkSetDb']                       # List of link results
record[0]['LinkSetDb'][0]['DbTo']            # Target database
record[0]['LinkSetDb'][0]['LinkName']        # Link name
record[0]['LinkSetDb'][0]['Link']            # List of linked records
record[0]['LinkSetDb'][0]['Link'][0]['Id']   # Linked ID
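A query without a `linkname` can return several `LinkSetDb` entries, one per link type, so indexing `[0]` only sees the first. The sketch below walks the whole structure and groups linked IDs by link name. The helper name `links_by_name` is hypothetical, and the sample record is a hand-written stand-in for `Entrez.read(handle)` with made-up IDs.

```python
def links_by_name(record):
    """Group linked IDs by link name across every linkset in a parsed ELink result."""
    grouped = {}
    for linkset in record:
        # 'LinkSetDb' may be missing or empty when the source record has no links.
        for linkset_db in linkset.get("LinkSetDb", []):
            ids = [link["Id"] for link in linkset_db.get("Link", [])]
            grouped.setdefault(linkset_db["LinkName"], []).extend(ids)
    return grouped


# Hand-written stand-in for Entrez.read(handle); the IDs are made up.
sample_record = [
    {
        "DbFrom": "gene",
        "IdList": ["672"],
        "LinkSetDb": [
            {"DbTo": "protein", "LinkName": "gene_protein",
             "Link": [{"Id": "111"}, {"Id": "222"}]},
            {"DbTo": "protein", "LinkName": "gene_protein_refseq",
             "Link": [{"Id": "222"}]},
        ],
    }
]
print(links_by_name(sample_record))
```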

Common Link Paths

Gene to Other Databases

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| gene | protein | gene_protein | All proteins |
| gene | protein | gene_protein_refseq | RefSeq proteins only |
| gene | nucleotide | gene_nuccore | Nucleotide sequences |
| gene | nucleotide | gene_nuccore_refseqrna | RefSeq mRNA |
| gene | pubmed | gene_pubmed | Related publications |
| gene | homologene | gene_homologene | Homologs |
| gene | snp | gene_snp | SNPs in gene |
| gene | clinvar | gene_clinvar | Clinical variants |

Nucleotide to Other Databases

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| nucleotide | protein | nuccore_protein | Encoded proteins |
| nucleotide | gene | nuccore_gene | Gene records |
| nucleotide | pubmed | nuccore_pubmed | Publications |
| nucleotide | taxonomy | nuccore_taxonomy | Organism taxonomy |
| nucleotide | biosample | nuccore_biosample | Sample info |
| nucleotide | sra | nuccore_sra | Related SRA data |

Protein to Other Databases

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| protein | nucleotide | protein_nuccore | Coding sequences |
| protein | gene | protein_gene | Gene records |
| protein | pubmed | protein_pubmed | Publications |
| protein | structure | protein_structure | 3D structures |
| protein | cdd | protein_cdd | Conserved domains |

PubMed Links

| From | To | Link Name | Description |
| --- | --- | --- | --- |
| pubmed | pubmed | pubmed_pubmed | Related articles |
| pubmed | gene | pubmed_gene | Mentioned genes |
| pubmed | protein | pubmed_protein | Mentioned proteins |
| pubmed | nucleotide | pubmed_nuccore | Mentioned sequences |
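One way to keep these link names out of scattered string literals is a small lookup keyed by database pair. The sketch below copies a subset of the names from the tables above; the constant `LINK_NAMES` and the function `candidate_linknames` are hypothetical, and the list is not exhaustive.

```python
# Link names copied from the tables above, keyed by (source db, target db).
LINK_NAMES = {
    ("gene", "protein"): ["gene_protein", "gene_protein_refseq"],
    ("gene", "nucleotide"): ["gene_nuccore", "gene_nuccore_refseqrna"],
    ("gene", "pubmed"): ["gene_pubmed"],
    ("nucleotide", "protein"): ["nuccore_protein"],
    ("nucleotide", "gene"): ["nuccore_gene"],
    ("protein", "gene"): ["protein_gene"],
    ("protein", "structure"): ["protein_structure"],
    ("pubmed", "pubmed"): ["pubmed_pubmed"],
    ("pubmed", "gene"): ["pubmed_gene"],
}


def candidate_linknames(dbfrom, db):
    """Return the known link names for a database pair, or an empty list."""
    return list(LINK_NAMES.get((dbfrom, db), []))


print(candidate_linknames("gene", "protein"))
```

An empty result only means the pair is missing from this local table, not that NCBI offers no such link.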

Code Patterns

Gene to Protein

from Bio import Entrez

Entrez.email = 'your.email@example.com'

def get_proteins_for_gene(gene_id):
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

protein_ids = get_proteins_for_gene('672')  # BRCA1
print(f"RefSeq proteins: {protein_ids[:5]}")

Nucleotide to Gene

def get_gene_for_nucleotide(nuc_id):
    handle = Entrez.elink(dbfrom='nucleotide', db='gene', id=nuc_id)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return None
    return record[0]['LinkSetDb'][0]['Link'][0]['Id']

gene_id = get_gene_for_nucleotide('NM_007294')
print(f"Gene ID: {gene_id}")

Find Related PubMed Articles

def get_related_articles(pmid, max_results=10):
    handle = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid, linkname='pubmed_pubmed')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    links = record[0]['LinkSetDb'][0]['Link']
    return [link['Id'] for link in links[:max_results]]

related = get_related_articles('35412348')
print(f"Related articles: {related}")

Get All Available Links

def discover_links(db, record_id):
    handle = Entrez.elink(dbfrom=db, id=record_id, cmd='acheck')
    record = Entrez.read(handle)
    handle.close()

    # acheck results are reported under IdCheckList -> IdLinkSet -> LinkInfo, not LinkSetDb
    links = {}
    for id_link_set in record[0].get('IdCheckList', {}).get('IdLinkSet', []):
        for info in id_link_set.get('LinkInfo', []):
            links[info['LinkName']] = info['DbTo']
    return links

available = discover_links('gene', '672')
for name, target in available.items():
    print(f"{name} -> {target}")

Navigate Gene -> Protein -> Structure

def gene_to_structures(gene_id):
    # Gene to protein
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    protein_ids = [link['Id'] for link in record[0]['LinkSetDb'][0]['Link'][:5]]

    # Protein to structure
    handle = Entrez.elink(dbfrom='protein', db='structure', id=','.join(protein_ids))
    record = Entrez.read(handle)
    handle.close()

    structure_ids = []
    for linkset in record:
        if linkset['LinkSetDb']:
            structure_ids.extend([link['Id'] for link in linkset['LinkSetDb'][0]['Link']])
    return structure_ids

structures = gene_to_structures('672')
print(f"Structure IDs: {structures[:5]}")
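The same two-hop pattern generalises to any number of databases. The sketch below is one possible shape for such a chain: `follow_chain` and its `link_fn` parameter are hypothetical names, and `link_fn` stands for any callable that takes `(dbfrom, db, ids)` and returns linked IDs, for example a thin wrapper around `Entrez.elink`. A fake lookup with made-up IDs is used here so the example runs offline.

```python
def follow_chain(start_ids, steps, link_fn, max_ids=5):
    """Walk a sequence of (dbfrom, db) hops, feeding each hop's IDs into the next."""
    current = list(start_ids)
    for dbfrom, db in steps:
        if not current:
            break  # nothing left to link from
        linked = link_fn(dbfrom, db, current[:max_ids])
        # Drop duplicates while preserving order.
        current = list(dict.fromkeys(linked))
    return current


# Offline stand-in for a function that wraps Entrez.elink; the IDs are made up.
FAKE_LINKS = {
    ("gene", "protein"): {"672": ["p1", "p2"]},
    ("protein", "structure"): {"p1": ["s1"], "p2": ["s1", "s2"]},
}


def fake_link_fn(dbfrom, db, ids):
    table = FAKE_LINKS.get((dbfrom, db), {})
    return [target for i in ids for target in table.get(i, [])]


chain = [("gene", "protein"), ("protein", "structure")]
print(follow_chain(["672"], chain, fake_link_fn))
```

Capping each hop with `max_ids` mirrors the `[:5]` slice in `gene_to_structures` and keeps fan-out from growing at every step.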

Link Multiple IDs at Once

def batch_link(dbfrom, db, ids):
    # Pass a list, not a comma-joined string: Biopython then sends one id= parameter
    # per ID, and ELink keeps the links for each source ID separate
    if isinstance(ids, str):
        ids = ids.split(',')

    handle = Entrez.elink(dbfrom=dbfrom, db=db, id=ids)
    record = Entrez.read(handle)
    handle.close()

    # Returns one linkset per input ID
    results = {}
    for linkset in record:
        source_id = linkset['IdList'][0]
        linked_ids = []
        if linkset['LinkSetDb']:
            linked_ids = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']]
        results[source_id] = linked_ids
    return results

results = batch_link('gene', 'protein', ['672', '675', '7157'])
for gene, proteins in results.items():
    print(f"Gene {gene}: {len(proteins)} proteins")
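For long ID lists it can help to split the input into smaller requests. The sketch below shows one way to do that; the helper name `chunked` is hypothetical and the default size of 200 is an assumption chosen for illustration, not a limit taken from this document.

```python
def chunked(ids, size=200):
    """Yield successive lists of at most `size` IDs (200 is an arbitrary default)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    ids = list(ids)
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


gene_ids = ["672", "675", "7157", "1956", "5290"]
for batch in chunked(gene_ids, size=2):
    print(batch)
```

Each batch can then be handed to a function such as `batch_link`, merging the returned dictionaries as they arrive.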

Get Publications for a Sequence

def get_sequence_publications(accession):
    # First get the GI/UID
    handle = Entrez.esearch(db='nucleotide', term=f'{accession}[accn]')
    search = Entrez.read(handle)
    handle.close()

    if not search['IdList']:
        return []
    uid = search['IdList'][0]

    # Link to PubMed
    handle = Entrez.elink(dbfrom='nucleotide', db='pubmed', id=uid)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

pmids = get_sequence_publications('NM_007294')
print(f"PubMed IDs: {pmids[:5]}")

Link Commands

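| Command | Description |
| --- | --- |
| neighbor | Default - get linked records |
| neighbor_score | Include relevance scores |
| neighbor_history | Store results on the History server |
| acheck | List all available links |
| ncheck | Check whether links exist within the same database |
| lcheck | Check whether external (LinkOut) links exist |
| llinks | List LinkOut URLs and attributes (non-library providers) |
| prlinks | List the primary LinkOut provider for each ID |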
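With `cmd='neighbor_score'`, each entry in `Link` carries a `Score` alongside its `Id`. The sketch below ranks links by that score; the helper name `top_scored_links` is hypothetical, the sample record is hand-written with made-up values, and scores from different link names are pooled together for simplicity.

```python
def top_scored_links(record, limit=5):
    """Return (id, score) pairs from a cmd='neighbor_score' result, best first."""
    scored = []
    for linkset in record:
        for linkset_db in linkset.get("LinkSetDb", []):
            for link in linkset_db.get("Link", []):
                # Scores are parsed as strings; links without one count as 0.
                scored.append((link["Id"], int(link.get("Score", 0))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hand-written stand-in for Entrez.read(handle); IDs and scores are made up.
scored_record = [
    {
        "LinkSetDb": [
            {
                "LinkName": "pubmed_pubmed",
                "Link": [
                    {"Id": "1", "Score": "50"},
                    {"Id": "2", "Score": "90"},
                    {"Id": "3"},
                ],
            }
        ]
    }
]
print(top_scored_links(scored_record, limit=2))
```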

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| Empty LinkSetDb | No links exist | Check if record has linked data |
| HTTPError 400 | Invalid ID or database | Verify ID exists in source database |
| KeyError | Missing expected field | Check if LinkSetDb is empty first |
| Single linkset expected, got list | Multiple input IDs | Iterate through record list |
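An `HTTPError 400` points at the request itself, so retrying it is unlikely to help, whereas transient server-side failures may succeed on a second attempt. The sketch below separates the two cases. The wrapper `call_with_retry`, its defaults, and the choice of which status codes count as retryable are all assumptions for illustration, not behaviour defined by Biopython or NCBI.

```python
import time
from urllib.error import HTTPError


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on HTTP 429/5xx with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except HTTPError as err:
            retryable = err.code == 429 or 500 <= err.code < 600
            # A 400 means the request itself is wrong, so retrying cannot help.
            if not retryable or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call might then look like `call_with_retry(lambda: Entrez.read(Entrez.elink(dbfrom='gene', db='protein', id='672')))`.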

Decision Tree

Need to find related records?
├── Know what link you want?
│   └── Use elink with specific linkname
├── Discover what links exist?
│   └── Use elink with cmd='acheck'
├── Navigate to target database?
│   └── Use elink(dbfrom=X, db=Y, id=Z)
├── Find related records in same database?
│   └── Use elink(dbfrom=X, db=X) with neighbor
├── Chain multiple databases?
│   └── Call elink multiple times
└── Need the actual records?
    └── Use elink first, then efetch with IDs

Related Skills

  • entrez-search - Search databases before linking

  • entrez-fetch - Retrieve records after finding linked IDs

  • batch-downloads - Download many linked records efficiently
