bio-population-genetics-plink-basics

File formats, conversion, and quality control filtering with PLINK 1.9 and 2.0.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "bio-population-genetics-plink-basics" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-population-genetics-plink-basics

PLINK Basics


File Formats

Binary Format (Recommended)

| File | Contents |
| --- | --- |
| .bed | Binary genotype data |
| .bim | Variant information (chr, ID, cM, pos, A1, A2) |
| .fam | Sample information (FID, IID, father, mother, sex, pheno) |
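The column layouts above can be illustrated with a small parser. This is a minimal sketch, assuming whitespace-delimited, six-column .bim and .fam lines as listed in the table; the names `BimRecord`, `FamRecord`, `parse_bim_line`, and `parse_fam_line` are hypothetical helpers, not part of PLINK, and the example lines are invented.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    chrom: str
    variant_id: str
    cm: float      # genetic distance in centimorgans (often 0)
    pos: int       # base-pair position
    a1: str
    a2: str


class FamRecord(NamedTuple):
    fid: str
    iid: str
    father: str
    mother: str
    sex: int       # 1=male, 2=female, 0=unknown
    pheno: str     # kept as text: may be 1/2/-9 or a quantitative value


def parse_bim_line(line: str) -> BimRecord:
    """Split one .bim line into its six columns."""
    chrom, vid, cm, pos, a1, a2 = line.split()
    return BimRecord(chrom, vid, float(cm), int(pos), a1, a2)


def parse_fam_line(line: str) -> FamRecord:
    """Split one .fam line into its six columns."""
    fid, iid, father, mother, sex, pheno = line.split()
    return FamRecord(fid, iid, father, mother, int(sex), pheno)


# Invented example lines, for illustration only.
bim = parse_bim_line("1\trs123\t0\t752566\tA\tG")
fam = parse_fam_line("FAM1 S1 0 0 2 -9")
print(bim.variant_id, bim.pos, fam.iid, fam.sex)
```

The .bed file itself is binary and is not meant to be parsed line by line; only the two companion text files are handled here.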

PLINK 2.0 Format

| File | Contents |
| --- | --- |
| .pgen | Binary genotype data (compressed) |
| .pvar | Variant information |
| .psam | Sample information |

Text Format (Legacy)

| File | Contents |
| --- | --- |
| .ped | Genotypes (FID, IID, father, mother, sex, pheno, genotypes) |
| .map | Variant positions (chr, ID, cM, pos) |

Format Conversion

VCF to PLINK Binary

PLINK 1.9

plink --vcf input.vcf.gz --make-bed --out output

PLINK 2.0

plink2 --vcf input.vcf.gz --make-bed --out output

With sample ID handling

plink2 --vcf input.vcf.gz --double-id --make-bed --out output

PLINK Binary to VCF

PLINK 1.9

plink --bfile input --recode vcf --out output

PLINK 2.0

plink2 --bfile input --export vcf --out output

Compressed VCF

plink2 --bfile input --export vcf bgz --out output

PED/MAP to Binary (PLINK 1.9 Only)

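PLINK 1.9 reads .ped/.map with --file. Older PLINK 2.0 builds cannot read .ped/.map at all; recent builds may accept --pedmap (check your build's --help), so converting through 1.9 is the portable route.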

plink --file input --make-bed --out output

Binary to PED/MAP

PLINK 1.9

plink --bfile input --recode --out output

PLINK 2.0

plink2 --bfile input --export ped --out output

PLINK 1.9 to 2.0 Format

Convert to PGEN format

plink2 --bfile input --make-pgen --out output

Convert back to BED

plink2 --pfile input --make-bed --out output

Quality Control Filtering

MAF Filter (Minor Allele Frequency)

PLINK 1.9 (remove variants with MAF < 0.01)

plink --bfile input --maf 0.01 --make-bed --out output

PLINK 2.0

plink2 --bfile input --maf 0.01 --make-bed --out output

Stricter threshold (remove variants with MAF < 0.05)

plink2 --bfile input --maf 0.05 --make-bed --out output
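To make the threshold concrete, the following sketch computes a minor allele frequency from a handful of genotype calls. It assumes genotypes are coded as the count of one allele (0, 1, or 2) with `None` for a missing call; the function name and the toy data are hypothetical, and PLINK's own calculation additionally handles details such as founder status that are ignored here.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Return the MAF over non-missing calls, or None if every call is missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each diploid sample contributes two alleles.
    allele_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two is less common.
    return min(allele_freq, 1 - allele_freq)


# Toy data: five called samples and one missing call.
genos = [0, 0, 1, 2, None, 0]
maf = minor_allele_frequency(genos)
print(maf)
```

A variant like this one would pass both `--maf 0.01` and `--maf 0.05`; a variant seen in only one heterozygous carrier out of several hundred samples would not.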

Genotyping Rate Filters

Per-variant missing rate (remove if >5% missing)

plink2 --bfile input --geno 0.05 --make-bed --out output

Per-sample missing rate (remove if >5% missing)

plink2 --bfile input --mind 0.05 --make-bed --out output
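The two missingness filters look at the same genotype matrix from different directions: `--geno` works per variant (column) and `--mind` per sample (row). A minimal sketch on a toy matrix, assuming rows are samples, columns are variants, and `None` marks a missing call; `missing_rates` is a hypothetical helper:

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[int]]]


def missing_rates(matrix: Matrix) -> Tuple[List[float], List[float]]:
    """Return (per-sample, per-variant) fractions of missing calls."""
    n_samples = len(matrix)
    n_variants = len(matrix[0])
    per_sample = [sum(g is None for g in row) / n_variants for row in matrix]
    per_variant = [
        sum(matrix[s][v] is None for s in range(n_samples)) / n_samples
        for v in range(n_variants)
    ]
    return per_sample, per_variant


# Toy data: 4 samples (rows) x 4 variants (columns).
matrix = [
    [0, 1, None, 2],
    [0, None, None, 1],
    [1, 1, 0, 2],
    [2, 0, None, 0],
]
per_sample, per_variant = missing_rates(matrix)

# Indices that would exceed a 5% threshold in this toy example.
failing_samples = [i for i, r in enumerate(per_sample) if r > 0.05]
failing_variants = [j for j, r in enumerate(per_variant) if r > 0.05]
print(per_sample, per_variant)
```

Removing the worst variants first changes the per-sample rates, and vice versa, which is one reason the two filters are sometimes run as separate steps.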

Hardy-Weinberg Equilibrium Filter

Remove variants with HWE p-value < 1e-6

plink2 --bfile input --hwe 1e-6 --make-bed --out output

Include cases as well as controls in the test (PLINK 1.9; with a case/control phenotype, --hwe tests controls only by default)

plink --bfile input --hwe 1e-6 include-nonctrl --make-bed --out output
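As an illustration of what an HWE filter measures, the sketch below compares observed genotype counts with the counts expected under Hardy-Weinberg proportions. Note the substitution: PLINK uses an exact test, whereas this example uses the simpler 1-degree-of-freedom chi-square approximation, so its p-values will not match PLINK's, especially for small counts. The function name and the example counts are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square HWE statistic and approximate p-value (1 df).

    This is an approximation for illustration, not PLINK's exact test.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts in perfect Hardy-Weinberg proportions.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
# No heterozygotes at all: a strong departure from HWE.
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
print(chi2_ok, p_ok, chi2_bad, p_bad)
```

A variant like the second example would be removed by `--hwe 1e-6`; the first would be kept.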

Combined QC Pipeline

Standard QC filtering

plink2 --bfile input \
  --maf 0.01 \
  --geno 0.05 \
  --mind 0.05 \
  --hwe 1e-6 \
  --make-bed --out qc_filtered
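When the QC step is driven from a script, building the command as an argument list avoids shell quoting and line-continuation problems. A minimal sketch, assuming `plink2` is on the PATH; `build_qc_command` is a hypothetical helper and the defaults simply mirror the thresholds used above.

```python
import shlex
from typing import List


def build_qc_command(
    bfile: str,
    out: str,
    maf: float = 0.01,
    geno: float = 0.05,
    mind: float = 0.05,
    hwe: float = 1e-6,
) -> List[str]:
    """Assemble the plink2 QC command as an argv list (nothing is executed)."""
    return [
        "plink2", "--bfile", bfile,
        "--maf", str(maf),
        "--geno", str(geno),
        "--mind", str(mind),
        "--hwe", str(hwe),
        "--make-bed", "--out", out,
    ]


cmd = build_qc_command("input", "qc_filtered")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if PLINK exits with a non-zero status.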

Sample and Variant Selection

Keep/Remove Samples

Keep specific samples (samples.txt: FID IID per line)

plink2 --bfile input --keep samples.txt --make-bed --out output

Remove specific samples

plink2 --bfile input --remove samples.txt --make-bed --out output

Keep whole families (families.txt: one FID per line)

plink2 --bfile input --keep-fam families.txt --make-bed --out output

Extract/Exclude Variants

Extract specific variants (variants.txt: variant IDs)

plink2 --bfile input --extract variants.txt --make-bed --out output

Exclude specific variants

plink2 --bfile input --exclude variants.txt --make-bed --out output

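Extract by position range (chromosome 1, 1,000,000-2,000,000 bp; --extract range expects a file of ranges, not an inline region)

plink2 --bfile input --chr 1 --from-bp 1000000 --to-bp 2000000 --make-bed --out output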

Chromosome Selection

Single chromosome

plink2 --bfile input --chr 22 --make-bed --out chr22

Multiple chromosomes

plink2 --bfile input --chr 1-22 --make-bed --out autosomes

Exclude non-autosomal chromosomes (codes 23-26 = X, Y, XY, MT)

plink2 --bfile input --not-chr 23,24,25,26 --make-bed --out autosomes

Allele Frequency

PLINK 1.9 (MAF-based)

plink --bfile input --freq --out output

PLINK 2.0 (ALT allele frequency - not MAF!)

plink2 --bfile input --freq --out output

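PLINK 2.0 with MAF

plink2 --freq reports ALT allele frequencies (ALT_FREQS column in output.afreq) and has no dedicated MAF column. For a biallelic variant, MAF = min(ALT_FREQS, 1 - ALT_FREQS).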
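For biallelic variants, MAF can be derived from the ALT frequency that `plink2 --freq` writes to the .afreq file. The sketch below assumes a tab-delimited file with a header containing `ID` and `ALT_FREQS` columns; `maf_from_afreq` is a hypothetical helper, the example rows are invented, and multiallelic rows (comma-separated frequencies) are skipped rather than handled.

```python
import csv
import io
from typing import Dict, TextIO


def maf_from_afreq(handle: TextIO) -> Dict[str, float]:
    """Map variant ID to MAF for biallelic rows of a .afreq-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    mafs = {}
    for row in reader:
        alt_freqs = row["ALT_FREQS"]
        if "," in alt_freqs:
            continue  # multiallelic: more than one ALT frequency, skipped here
        freq = float(alt_freqs)
        mafs[row["ID"]] = min(freq, 1 - freq)
    return mafs


# Invented example content standing in for output.afreq.
example = io.StringIO(
    "#CHROM\tID\tREF\tALT\tALT_FREQS\tOBS_CT\n"
    "1\trs1\tA\tG\t0.9\t200\n"
    "1\trs2\tC\tT\t0.25\t198\n"
    "1\trs3\tC\tT,G\t0.1,0.2\t198\n"
)
mafs = maf_from_afreq(example)
print(mafs)
```

With a real file, the same function can be called as `maf_from_afreq(open("output.afreq"))`.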

Missing Data Statistics

Per-sample and per-variant missing rates

plink2 --bfile input --missing --out output

Output files:

output.smiss - sample missing rates

output.vmiss - variant missing rates

Sex Check

Verify that each sample's reported sex matches the sex inferred from X chromosome heterozygosity (the F statistic).

PLINK 1.9

plink --bfile input --check-sex --out sex_check

PLINK 2.0

plink2 --bfile input --split-par hg38 --check-sex --out sex_check

Interpret Results

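import pandas as pd

# .sexcheck is whitespace-delimited; use a raw string for the regex separator
sex = pd.read_csv('sex_check.sexcheck', sep=r'\s+')

# Rows flagged PROBLEM have a mismatched or undetermined sex
problems = sex[sex['STATUS'] == 'PROBLEM']
print(f'Sex mismatches: {len(problems)}')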

F statistic: <0.2 = female, >0.8 = male, between = ambiguous

PEDSEX: reported sex (1=male, 2=female, 0=unknown)

SNPSEX: inferred sex (1=male, 2=female, 0=undetermined)
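The thresholds and codes above can be combined into a small classifier. This is a sketch under stated assumptions: the 0.2 and 0.8 cutoffs are the ones quoted above, a sample is treated as a PROBLEM whenever the reported and inferred sex are not the same known value, and both function names are hypothetical.

```python
from typing import Tuple


def infer_sex_from_f(f_stat: float, female_max: float = 0.2, male_min: float = 0.8) -> int:
    """Return 2 (female), 1 (male), or 0 (ambiguous) from the X-chromosome F statistic."""
    if f_stat < female_max:
        return 2
    if f_stat > male_min:
        return 1
    return 0


def sex_status(pedsex: int, f_stat: float) -> Tuple[int, str]:
    """Return (SNPSEX, STATUS) for a reported sex code and an F statistic."""
    snpsex = infer_sex_from_f(f_stat)
    # OK only when both codes are known and agree.
    ok = pedsex != 0 and pedsex == snpsex
    return snpsex, "OK" if ok else "PROBLEM"


print(sex_status(2, 0.05))   # reported female, low F
print(sex_status(1, 0.50))   # reported male, ambiguous F
print(sex_status(2, 0.95))   # reported female, high F
```

Samples in the ambiguous band are worth inspecting by hand before removal, since low-coverage or heavily filtered X chromosome data can push F toward the middle.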

Update or Remove

Update sex from check results

plink2 --bfile input --update-sex sex_check.sexcheck col-num=4 --make-bed --out updated

Remove sex mismatches

awk '$5 == "PROBLEM" {print $1, $2}' sex_check.sexcheck > sex_problems.txt

plink2 --bfile input --remove sex_problems.txt --make-bed --out output

Sample Information

Update Phenotypes

phenotypes.txt: FID IID pheno (1=control, 2=case, -9=missing)

plink2 --bfile input --pheno phenotypes.txt --make-bed --out output

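Quantitative phenotype (same flag; a phenotype column with values other than the case/control and missing codes is treated as quantitative)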

plink2 --bfile input --pheno phenotypes.txt --make-bed --out output

Update Sex

sex.txt: FID IID sex (1=male, 2=female, 0=unknown)

plink2 --bfile input --update-sex sex.txt --make-bed --out output

Update Sample IDs

ids.txt: old_FID old_IID new_FID new_IID

plink2 --bfile input --update-ids ids.txt --make-bed --out output

Merging Datasets

Merge two datasets (PLINK 1.9)

plink --bfile data1 --bmerge data2.bed data2.bim data2.fam --make-bed --out merged

Merge list of datasets

plink --bfile data1 --merge-list merge_list.txt --make-bed --out merged

merge_list.txt contains: data2.bed data2.bim data2.fam (one set per line)
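A merge list in the layout described above can be generated from dataset prefixes rather than typed by hand. A minimal sketch; `write_merge_list` is a hypothetical helper, the prefixes `data2` and `data3` are placeholders, and a temporary directory is used so the example leaves no files behind.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_merge_list(prefixes: Iterable[str], path: Path) -> List[str]:
    """Write one '<prefix>.bed <prefix>.bim <prefix>.fam' line per dataset."""
    lines = [f"{p}.bed {p}.bim {p}.fam" for p in prefixes]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "merge_list.txt"
    write_merge_list(["data2", "data3"], target)
    content = target.read_text()

print(content)
```

The base dataset passed to `--bfile` (data1 above) is not repeated in the list.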

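Handle strand flips

plink --bfile data1 --bmerge data2 --make-bed --out merged

If the merge fails on strand-inconsistent variants, PLINK 1.9 lists them in merged-merge.missnp. Flip them in data2, then merge again:

plink --bfile data2 --flip merged-merge.missnp --make-bed --out data2_flipped

plink --bfile data1 --bmerge data2_flipped --make-bed --out merged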
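Flipping a variant means replacing each allele with its complement on the opposite strand. The sketch below shows that operation and also flags A/T and C/G variants, which look identical on both strands and therefore cannot be fixed or detected by flipping alone. The helper names are hypothetical and only single-base alleles are considered.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_alleles(a1: str, a2: str) -> Tuple[str, str]:
    """Return the alleles as read from the opposite strand."""
    # Anything that is not a single A/C/G/T base is passed through unchanged.
    return COMPLEMENT.get(a1, a1), COMPLEMENT.get(a2, a2)


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """True for A/T and C/G variants, whose alleles are each other's complement."""
    return COMPLEMENT.get(a1) == a2


print(flip_alleles("A", "G"))
print(is_strand_ambiguous("A", "T"), is_strand_ambiguous("A", "G"))
```

An A/G variant in one dataset and T/C in the other is reconciled by a flip; a variant that still mismatches after flipping is more likely a genuine allele conflict and is usually excluded instead.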

Variant Information

Set Variant IDs

Set ID based on position

plink2 --bfile input --set-all-var-ids '@:#:$r:$a' --make-bed --out output

Format: chr:pos:ref:alt
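To see how the template placeholders map to the chr:pos:ref:alt format, the sketch below expands the same template string in Python. It only illustrates the substitution (`@` for chromosome, `#` for position, `$r` and `$a` for the reference and alternate alleles); `render_variant_id` is a hypothetical helper and is not how PLINK implements it.

```python
def render_variant_id(template: str, chrom: str, pos: int, ref: str, alt: str) -> str:
    """Expand @, #, $r and $a in an ID template."""
    return (
        template.replace("$r", ref)
        .replace("$a", alt)
        .replace("@", chrom)
        .replace("#", str(pos))
    )


variant_id = render_variant_id("@:#:$r:$a", "1", 752566, "G", "A")
print(variant_id)
```

On the command line the template contains `$r` and `$a`, which an unquoted shell would try to expand as variables, so it is safest to wrap the template in single quotes.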

Update Variant Names

update.txt: old_id new_id

plink2 --bfile input --update-name update.txt --make-bed --out output

PLINK 2.0 vs 1.9 Summary

| Feature | PLINK 2.0 | PLINK 1.9 |
| --- | --- | --- |
| Status | Current | Legacy |
| Command | plink2 | plink |
| Format | .pgen/.pvar/.psam | .bed/.bim/.fam |
| Speed | Faster | Baseline |
| Memory | More efficient | Higher for large data |
| Export VCF | --export vcf | --recode vcf |
| Frequency output | ALT frequency | MAF |
| Missing output | .smiss/.vmiss | .imiss/.lmiss |
| PED/MAP support | No (convert via 1.9) | Yes (--file) |

Related Skills

  • association-testing - GWAS with filtered data

  • population-structure - PCA after QC

  • variant-calling/vcf-basics - VCF format before conversion

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
