HISAT2 RNA-seq Alignment
Build Index
Basic index (no annotation)
hisat2-build -p 8 reference.fa hisat2_index
Index with splice sites and exons (recommended)
hisat2_extract_splice_sites.py annotation.gtf > splice_sites.txt
hisat2_extract_exons.py annotation.gtf > exons.txt
hisat2-build -p 8 \
  --ss splice_sites.txt \
  --exon exons.txt \
  reference.fa hisat2_index
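A quick sanity check after building can catch a truncated index before any alignment starts. The sketch below assumes a standard (small) index, which HISAT2 writes as eight files named `<basename>.1.ht2` through `<basename>.8.ht2`; large indexes use the `.ht2l` suffix and would need the pattern adjusted. `index_complete` is a hypothetical helper name, not part of HISAT2.

```shell
# Hypothetical helper: succeed only if all eight index files exist and are non-empty.
index_complete() {
    prefix=$1
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "${prefix}.${i}.ht2" ]; then
            echo "missing or empty: ${prefix}.${i}.ht2" >&2
            return 1
        fi
    done
    return 0
}

# Example use:
#   index_complete hisat2_index && echo "index looks complete"
```

The check only confirms that the files are present; it does not validate their contents.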
Basic Alignment
Paired-end reads
hisat2 -p 8 -x hisat2_index \
  -1 reads_1.fq.gz -2 reads_2.fq.gz \
  -S aligned.sam
Single-end reads
hisat2 -p 8 -x hisat2_index \
  -U reads.fq.gz \
  -S aligned.sam
Direct to Sorted BAM
Pipe to samtools
hisat2 -p 8 -x hisat2_index \
  -1 r1.fq.gz -2 r2.fq.gz |
  samtools sort -@ 4 -o aligned.sorted.bam -
samtools index aligned.sorted.bam
Stranded Libraries
Forward stranded (e.g., Ligation)
hisat2 -p 8 -x hisat2_index \
  --rna-strandness FR \
  -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
Reverse stranded (e.g., dUTP, TruSeq - most common)
hisat2 -p 8 -x hisat2_index \
  --rna-strandness RF \
  -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
Single-end stranded (F for forward, R for reverse)
hisat2 -p 8 -x hisat2_index \
  --rna-strandness F \
  -U reads.fq.gz -S aligned.sam
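The four `--rna-strandness` values above follow from two choices: the library protocol and the read layout. A minimal sketch of that mapping is shown here; the function name `strandness_value` and the labels `forward`, `reverse`, `unstranded`, `paired` and `single` are assumptions made for illustration and are not HISAT2 options.

```shell
# Hypothetical helper: print the --rna-strandness value for a library.
#   $1 = protocol: forward | reverse | unstranded
#   $2 = layout:   paired | single
strandness_value() {
    protocol=$1
    layout=$2
    case "$protocol:$layout" in
        forward:paired) echo FR ;;
        reverse:paired) echo RF ;;
        forward:single) echo F ;;
        reverse:single) echo R ;;
        unstranded:*)   echo "" ;;   # omit --rna-strandness entirely
        *)
            echo "unknown protocol/layout: $protocol $layout" >&2
            return 1
            ;;
    esac
}

# Example use:
#   s=$(strandness_value reverse paired)
#   hisat2 -p 8 -x hisat2_index ${s:+--rna-strandness "$s"} \
#     -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
```

The `${s:+...}` expansion adds the flag only when a value was returned, so unstranded libraries run with HISAT2's default.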
Novel Splice Junction Discovery
Output novel splice junctions
hisat2 -p 8 -x hisat2_index \
  --novel-splicesite-outfile novel_splices.txt \
  -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
Use known + novel junctions for subsequent alignments
hisat2 -p 8 -x hisat2_index \
  --novel-splicesite-infile novel_splices.txt \
  -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
Two-Pass Alignment (Manual)
Pass 1: Discover junctions from all samples
for r1 in *_R1.fq.gz; do
  r2=${r1/_R1/_R2}
  base=$(basename "$r1" _R1.fq.gz)
  hisat2 -p 8 -x hisat2_index \
    --novel-splicesite-outfile "${base}_splices.txt" \
    -1 "$r1" -2 "$r2" -S /dev/null
done
Combine and deduplicate junctions
cat *_splices.txt | sort -u > combined_splices.txt
Pass 2: Realign with all junctions
for r1 in *_R1.fq.gz; do
  r2=${r1/_R1/_R2}
  base=$(basename "$r1" _R1.fq.gz)
  hisat2 -p 8 -x hisat2_index \
    --novel-splicesite-infile combined_splices.txt \
    -1 "$r1" -2 "$r2" |
    samtools sort -@ 4 -o "${base}.sorted.bam" -
done
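The combine step in this workflow keeps every junction seen in any sample. A stricter option is to keep only junctions reported by at least a minimum number of samples. The sketch below assumes each per-sample file is in the tab-separated format written by `--novel-splicesite-outfile` (chromosome, left flanking position, right flanking position, strand). `junctions_in_min_samples` is a hypothetical helper, and the threshold is an illustrative choice rather than a HISAT2 recommendation.

```shell
# Hypothetical helper: print junctions present in at least N per-sample files.
#   $1 = minimum number of samples, remaining args = per-sample junction files
junctions_in_min_samples() {
    min_samples=$1
    shift
    for f in "$@"; do
        sort -u "$f"              # count each junction at most once per sample
    done |
        sort |
        uniq -c |
        awk -v n="$min_samples" '$1 >= n { sub(/^ *[0-9]+ /, ""); print }'
}

# Example use (write the result outside the *_splices.txt glob):
#   junctions_in_min_samples 2 *_splices.txt > filtered.junctions.txt
```

Writing the output to a name that does not match `*_splices.txt` keeps the result from being picked up as an input on a rerun.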
Read Group Information
hisat2 -p 8 -x hisat2_index \
  --rg-id sample1 \
  --rg SM:sample1 \
  --rg PL:ILLUMINA \
  --rg LB:lib1 \
  -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
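When many samples are aligned in a loop, the read group flags can be generated per sample instead of typed by hand. The following is a minimal sketch that reuses the same four fields as the command above; `rg_flags` is a hypothetical helper, and it assumes the sample and library names contain no whitespace.

```shell
# Hypothetical helper: print the read group flags for one sample.
#   $1 = sample name (used for both ID and SM), $2 = library name
rg_flags() {
    sample=$1
    library=$2
    printf -- '--rg-id %s --rg SM:%s --rg PL:ILLUMINA --rg LB:%s\n' \
        "$sample" "$sample" "$library"
}

# Example use (unquoted on purpose so the flags split into separate words):
#   hisat2 -p 8 -x hisat2_index $(rg_flags sample1 lib1) \
#     -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
```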
Downstream Quantification
Output name-sorted BAM for htseq-count
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz |
samtools sort -n -@ 4 -o aligned.namesorted.bam -
Or coordinate-sorted for featureCounts
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz |
samtools sort -@ 4 -o aligned.sorted.bam -
Key Parameters
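| Parameter | Default | Description |
|-----------|---------|-------------|
| -p | 1 | Number of threads |
| -x | none (required) | Index basename |
| --rna-strandness | unstranded | FR/RF (paired-end) or F/R (single-end) |
| --dta | off | Report alignments tailored for transcript assemblers such as StringTie |
| --dta-cufflinks | off | Report alignments tailored for Cufflinks |
| --min-intronlen | 20 | Minimum intron length |
| --max-intronlen | 500000 | Maximum intron length |
| -k | 5 | Max alignments to report per read |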
For StringTie/Cufflinks
Use --dta for StringTie (or --dta-cufflinks for Cufflinks)
hisat2 -p 8 -x hisat2_index \
  --dta \
  -1 r1.fq.gz -2 r2.fq.gz |
  samtools sort -@ 4 -o aligned.sorted.bam -
Alignment Summary
HISAT2 prints the alignment summary to stderr; redirect it to keep a copy
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam 2> summary.txt
Example:
50000000 reads; of these:
  50000000 (100.00%) were paired; of these:
    2500000 (5.00%) aligned concordantly 0 times
    45000000 (90.00%) aligned concordantly exactly 1 time
    2500000 (5.00%) aligned concordantly >1 times
95.00% overall alignment rate
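For batch runs it is often enough to pull the final rate out of each saved summary. The sketch below assumes the summary file was written with `2> summary.txt` and that its last line has the form `<percent> overall alignment rate`, as in the example above. `overall_alignment_rate` is a hypothetical helper name.

```shell
# Hypothetical helper: print the overall alignment rate (e.g. "95.00%")
# from a saved HISAT2 summary file.
overall_alignment_rate() {
    awk '/overall alignment rate/ { print $1 }' "$1"
}

# Example use:
#   for f in *.summary.txt; do
#       printf '%s\t%s\n' "$f" "$(overall_alignment_rate "$f")"
#   done
```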
Memory Comparison
| Aligner | Human Genome Memory |
|---------|---------------------|
| STAR | ~30GB |
| HISAT2 | ~8GB |
Related Skills
- read-alignment/star-alignment - Alternative with more features
- rna-quantification/featurecounts-counting - Count aligned reads
- rna-quantification/alignment-free-quant - Skip alignment entirely
- differential-expression/deseq2-basics - Downstream DE analysis