bio-population-genetics-linkage-disequilibrium

Linkage Disequilibrium

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-population-genetics-linkage-disequilibrium" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-population-genetics-linkage-disequilibrium

Linkage Disequilibrium

Calculate LD statistics, prune correlated variants, and identify haplotype blocks.

PLINK LD Calculations

Pairwise r²

All pairs within window

plink2 --bfile data --r2 --ld-window-kb 1000 --ld-window-r2 0.2 --out ld_results

With SNP names in output

plink2 --bfile data --r2 inter-chr --ld-window-r2 0 --out all_pairs

Squared correlation matrix

plink2 --bfile data --r2-phased square --out ld_matrix

Output Format

ld_results.ld contains:

CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2

PLINK 1.9 Options

r² with D' statistics

plink --bfile data --r2 dprime --ld-window-kb 500 --out ld_with_dprime

Inter-chromosome LD

plink --bfile data --r2 inter-chr --ld-snp-list target_snps.txt --out target_ld

LD Pruning

Standard Pruning

Calculate pruning list

plink2 --bfile data --indep-pairwise 50 10 0.1 --out prune

Output files:

prune.prune.in - Variants to keep

prune.prune.out - Variants to remove

Extract pruned set

plink2 --bfile data --extract prune.prune.in --make-bed --out data_pruned

Pruning Parameters

Parameter Description Common Values

Window (50) Variants per window 50-200

Step (10) Variants to shift 5-50

r² threshold (0.1) Max LD allowed 0.1-0.5

Use Cases

Strict pruning for PCA/Admixture

plink2 --bfile data --indep-pairwise 50 10 0.1 --out strict_prune

Moderate pruning for polygenic scores

plink2 --bfile data --indep-pairwise 200 50 0.5 --out moderate_prune

Region-based pruning

plink2 --bfile data --indep-pairwise 50 10 0.2 --chr 6 --from-mb 25 --to-mb 35 --out mhc_prune

scikit-allel LD

Pairwise r²

import allel import numpy as np

callset = allel.read_vcf('data.vcf.gz') gt = allel.GenotypeArray(callset['calldata/GT']) pos = callset['variants/POS']

gn = gt.to_n_alt()

r2 = allel.rogers_huff_r(gn[:100]) ** 2

LD Decay

import allel import numpy as np import matplotlib.pyplot as plt

gn = gt.to_n_alt()

r2, dist = [], [] n_variants = min(1000, gn.shape[0])

for i in range(n_variants): for j in range(i + 1, min(i + 100, n_variants)): r = allel.rogers_huff_r(gn[[i, j]])[0, 1] ** 2 d = pos[j] - pos[i] r2.append(r) dist.append(d)

r2 = np.array(r2) dist = np.array(dist)

bins = np.arange(0, 100001, 1000) bin_means = [] for i in range(len(bins) - 1): mask = (dist >= bins[i]) & (dist < bins[i + 1]) if mask.sum() > 0: bin_means.append(np.mean(r2[mask])) else: bin_means.append(np.nan)

plt.figure(figsize=(10, 6)) plt.plot(bins[:-1] / 1000, bin_means) plt.xlabel('Distance (kb)') plt.ylabel('Mean r²') plt.title('LD Decay') plt.savefig('ld_decay.png')

Haplotype Blocks

PLINK

Identify haplotype blocks (Gabriel et al.)

plink --bfile data --blocks no-pheno-req --out blocks

Output: blocks.blocks (block boundaries)

Output: blocks.blocks.det (block details)

Block Statistics

import pandas as pd

blocks = pd.read_csv('blocks.blocks.det', sep='\s+')

print(f'Number of blocks: {len(blocks)}') print(f'Mean block size: {blocks["KB"].mean():.1f} kb') print(f'Mean SNPs per block: {blocks["NSNPS"].mean():.1f}')

LD Matrix Visualization

import allel import numpy as np import matplotlib.pyplot as plt

gn = gt.to_n_alt()[:200]

r = allel.rogers_huff_r(gn) r2_matrix = r ** 2

plt.figure(figsize=(10, 10)) plt.imshow(r2_matrix, cmap='hot', vmin=0, vmax=1) plt.colorbar(label='r²') plt.xlabel('Variant index') plt.ylabel('Variant index') plt.title('LD Matrix') plt.savefig('ld_matrix.png', dpi=150)

LD-based Clumping (GWAS)

Clump GWAS results by LD

plink --bfile data
--clump gwas_results.txt
--clump-p1 5e-8
--clump-p2 1e-5
--clump-r2 0.1
--clump-kb 250
--out clumped

Output: clumped.clumped (independent signals)

Clump Parameters

Parameter Description

--clump-p1 Index SNP p-value threshold

--clump-p2 Clumped SNP p-value threshold

--clump-r2 LD threshold for clumping

--clump-kb Physical distance threshold

vcftools LD

Pairwise LD for region

vcftools --vcf data.vcf --geno-r2 --ld-window-bp 100000 --out ld_results

Output: ld_results.geno.ld

Haplotype-based r²

vcftools --vcf data.vcf --hap-r2 --ld-window-bp 100000 --out hap_ld

Complete Workflow

1. Calculate genome-wide LD

plink2 --bfile data --r2 --ld-window-kb 500 --ld-window-r2 0.2 --out ld_genome

2. Generate pruned set for PCA

plink2 --bfile data --indep-pairwise 50 10 0.1 --out prune plink2 --bfile data --extract prune.prune.in --make-bed --out pruned

3. Identify haplotype blocks

plink --bfile data --blocks no-pheno-req --out blocks

4. Visualize LD for specific region

plink --bfile data --r2 dprime --chr 6 --from-mb 28 --to-mb 34 --out mhc_ld

Related Skills

  • plink-basics - File format handling

  • population-structure - Use pruned data for PCA

  • association-testing - LD clumping for GWAS

  • selection-statistics - LD affects EHH statistics

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

bioskills

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-genome-tracks

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-epitranscriptomics-merip-preprocessing

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-multipanel-figures

No summary provided by upstream source.

Repository SourceNeeds Review