bio-proteomics-spectral-libraries

Spectral Library Management

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-proteomics-spectral-libraries" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-proteomics-spectral-libraries

Spectral Library Management

Build Library from DDA Data

SpectraST (TPP)

Build library from search results

spectrast -cNlibrary.splib -cAC search_results.pep.xml

Filter library for quality

spectrast -cNfiltered.splib -cAQ library.splib

Convert to other formats

spectrast -cNlibrary.tsv -cM library.splib

EasyPQP (Skyline/OpenMS)

Build library from search results

easypqp library
--in psm_results.tsv
--out library.pqp
--psmtsv
--rt_reference irt.tsv

Convert to TSV format

easypqp convert
--in library.pqp
--out library.tsv
--format openswath

EncyclopeDIA (Walnut)

Build chromatogram library from DIA

EncyclopeDIA
-i sample1.mzML
-i sample2.mzML
-l wide_window_library.dlib
-f uniprot.fasta
-o results

Search with narrow-window DIA

EncyclopeDIA
-i narrow_sample.mzML
-l narrow_library.elib
-f uniprot.fasta
-o search_results

Predicted Libraries

Prosit (Deep Learning)

Generate predictions via Prosit API

import requests import pandas as pd

peptides = pd.DataFrame({ 'modified_sequence': ['PEPTIDEK', 'ANOTHERPEPTIDER'], 'collision_energy': [30, 30], 'precursor_charge': [2, 2] })

Submit to Prosit server

response = requests.post( 'https://www.proteomicsdb.org/prosit/api/predict', json=peptides.to_dict(orient='records') )

Parse response to library format

predictions = response.json()

DeepLC Retention Time Prediction

from deeplc import DeepLC

Initialize predictor

dlc = DeepLC()

Predict retention times

peptides = ['PEPTIDEK', 'ANOTHERPEPTIDER'] calibration_peptides = ['GAGSSEPVTGLDAK', 'VEATFGVDESNAK'] calibration_rts = [22.4, 33.1]

Calibrate and predict

dlc.calibrate_preds( seq_df=pd.DataFrame({'seq': calibration_peptides, 'rt': calibration_rts}) ) predicted_rts = dlc.make_preds(seq_df=pd.DataFrame({'seq': peptides}))

MS2PIP Fragmentation Prediction

from ms2pip import Predictor

Initialize predictor

predictor = Predictor(model='HCD2021')

Predict fragmentation

peptide_df = pd.DataFrame({ 'peptide': ['PEPTIDEK', 'ANOTHERPEPTIDER'], 'charge': [2, 2], 'modifications': ['', ''] })

predictions = predictor.predict(peptide_df)

Library Formats

DIA-NN TSV Format

Required columns

PrecursorMz ProductMz Annotation ProteinId GeneName PeptideSequence ModifiedSequence PrecursorCharge FragmentCharge FragmentType FragmentSeriesNumber NormalizedRetentionTime LibraryIntensity

OpenSWATH TSV Format

import pandas as pd

Convert to OpenSWATH format

library = pd.DataFrame({ 'PrecursorMz': precursor_mz, 'ProductMz': product_mz, 'LibraryIntensity': intensity, 'NormalizedRetentionTime': rt, 'PrecursorCharge': charge, 'ProductCharge': 1, 'FragmentType': ion_type, # 'b' or 'y' 'FragmentSeriesNumber': ion_num, 'ModifiedPeptideSequence': mod_seq, 'PeptideSequence': sequence, 'ProteinId': protein, 'GeneName': gene, 'Decoy': 0 })

library.to_csv('library_openswath.tsv', sep='\t', index=False)

Spectronaut Library Format

Key columns for Spectronaut

ModifiedPeptide StrippedPeptide PrecursorCharge PrecursorMz iRT FragmentLossType FragmentCharge FragmentType FragmentNumber RelativeIntensity FragmentMz ProteinGroups Genes ProteinIds

Library QC

import pandas as pd

library = pd.read_csv('library.tsv', sep='\t')

Basic statistics

print(f"Precursors: {library['ModifiedSequence'].nunique()}") print(f"Proteins: {library['ProteinId'].nunique()}") print(f"Transitions per precursor: {len(library) / library['ModifiedSequence'].nunique():.1f}")

RT distribution

import matplotlib.pyplot as plt rts = library.groupby('ModifiedSequence')['NormalizedRetentionTime'].first() plt.hist(rts, bins=50) plt.xlabel('Normalized RT') plt.ylabel('Precursors') plt.savefig('rt_distribution.png')

Charge state distribution

charges = library.groupby('ModifiedSequence')['PrecursorCharge'].first() print(charges.value_counts())

Merge Libraries

import pandas as pd

Load libraries

lib1 = pd.read_csv('library1.tsv', sep='\t') lib2 = pd.read_csv('library2.tsv', sep='\t')

Concatenate and remove duplicates

Keep entry with highest total intensity per precursor

combined = pd.concat([lib1, lib2])

Calculate total intensity per precursor

precursor_intensity = combined.groupby('ModifiedSequence')['LibraryIntensity'].sum()

Keep best precursor entries

combined['total_int'] = combined['ModifiedSequence'].map(precursor_intensity) combined = combined.sort_values('total_int', ascending=False) combined = combined.drop_duplicates(subset=['ModifiedSequence', 'FragmentType', 'FragmentSeriesNumber']) combined = combined.drop('total_int', axis=1)

combined.to_csv('merged_library.tsv', sep='\t', index=False)

iRT Calibration

Biognosys iRT peptides for retention time calibration

IRT_PEPTIDES = { 'LGGNEQVTR': -24.92, 'GAGSSEPVTGLDAK': 0.00, # Reference 'VEATFGVDESNAK': 12.39, 'YILAGVENSK': 19.79, 'TPVISGGPYEYR': 28.71, 'TPVITGAPYEYR': 33.38, 'DGLDAASYYAPVR': 42.26, 'ADVTPADFSEWSK': 54.62, 'GTFIIDPGGVIR': 70.52, 'GTFIIDPAAVIR': 87.23, 'LFLQFGAQGSPFLK': 100.00 }

Convert iRT to normalized RT

def irt_to_nrt(irt, gradient_length=60): '''Convert iRT to normalized RT (0-1 scale)''' return (irt + 24.92) / 124.92 # Scale to 0-1

Related Skills

  • dia-analysis - Use libraries in DIA workflows

  • peptide-identification - Generate search results for library building

  • data-import - Load MS data for library generation

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

bioskills

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-genome-tracks

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-epitranscriptomics-merip-preprocessing

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-multipanel-figures

No summary provided by upstream source.

Repository SourceNeeds Review