GATK: The GATK4 toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping. The content on this page is borrowed from GATK webpages and courses, which are also a good starting point for getting familiar with the GATK tools. To pull the latest GATK Docker image:
docker pull broadinstitute/gatk:latest
or pull a specific version by tag:
docker pull broadinstitute/gatk:4.0.11.0
docker run -it broadinstitute/gatk:latest
./gatk --list
gatk ToolName [tool args]
To detach from the container without stopping it, press Ctrl+P then Ctrl+Q.
You can download a toy example dataset from CSC's Allas object storage:
wget https://a3s.fi/Softwares/data.zip
docker run -v /path/data:/gatk/data -it broadinstitute/gatk:latest
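A mistyped host path in the -v option can leave /gatk/data empty inside the container. As a minimal sketch, the helper below reports which expected inputs are missing. Note that check_inputs is a hypothetical name, and the file list is an assumption taken from the paths used in the commands on this page.

```shell
# Report which of the expected input files are missing under a data directory.
# Prints one "missing: <path>" line per absent file and returns non-zero
# if any file is absent. The file list is assumed from this page's commands.
check_inputs() {
    dir=$1
    status=0
    for f in ref/ref.fasta bams/mother.bam bams/father.bam bams/son.bam; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
        fi
    done
    return $status
}

# Example (inside the container):
#   check_inputs /gatk/data
```

If the function prints nothing, all four files were found and the mount is likely correct.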
gatk HaplotypeCaller --help
gatk HaplotypeCaller -R /gatk/data/ref/ref.fasta -I /gatk/data/bams/mother.bam \
-O /gatk/data/sandbox/variants.vcf
Note: If you run into memory issues, pass JVM options with --java-options, for example to set the maximum heap size to 4 GB:
gatk --java-options "-Xmx4G" HaplotypeCaller \
-R /gatk/data/ref/ref.fasta -I /gatk/data/bams/mother.bam \
-O /gatk/data/sandbox/variants.vcf
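One quick way to sanity-check the output is to count its variant records, that is, the lines that are not header lines starting with '#'. The sketch below assumes an uncompressed VCF. The name count_variants is a hypothetical helper, not a GATK tool.

```shell
# Count the non-header records in an uncompressed VCF file.
# Header lines start with '#'; every other line is one variant record.
count_variants() {
    # grep exits non-zero when the count is 0, so "|| true" keeps the
    # function usable in scripts that run with "set -e".
    grep -vc '^#' "$1" || true
}

# Example (inside the container):
#   count_variants /gatk/data/sandbox/variants.vcf
```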
gatk HaplotypeCaller -R /gatk/data/ref/ref.fasta -I /gatk/data/bams/mother.bam -O /gatk/data/sandbox/mother.g.vcf -ERC GVCF
gatk HaplotypeCaller -R /gatk/data/ref/ref.fasta -I /gatk/data/bams/father.bam -O /gatk/data/sandbox/father.g.vcf -ERC GVCF
gatk HaplotypeCaller -R /gatk/data/ref/ref.fasta -I /gatk/data/bams/son.bam -O /gatk/data/sandbox/son.g.vcf -ERC GVCF
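The three per-sample commands above differ only in the sample name, so they can be generated from a list. The following is a dry-run sketch: print_gvcf_commands is a hypothetical helper that only prints the commands, and DATA is an assumed variable holding the in-container data directory.

```shell
# In-container data directory used throughout this page (assumed variable name).
DATA=/gatk/data

# Print one HaplotypeCaller GVCF command per sample name given as an argument.
# This is a dry run: the commands are printed, not executed.
print_gvcf_commands() {
    for sample in "$@"; do
        echo "gatk HaplotypeCaller -R $DATA/ref/ref.fasta -I $DATA/bams/$sample.bam -O $DATA/sandbox/$sample.g.vcf -ERC GVCF"
    done
}

print_gvcf_commands mother father son
```

After inspecting the printed commands, piping the output to sh inside the container would execute them one after another.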
gatk GenomicsDBImport -V /gatk/data/sandbox/mother.g.vcf \
-V /gatk/data/sandbox/father.g.vcf \
-V /gatk/data/sandbox/son.g.vcf --genomicsdb-workspace-path \
/gatk/data/sandbox/trio.gdb_workspace --intervals 20
gatk CombineGVCFs -R /gatk/data/ref/ref.fasta \
-V /gatk/data/sandbox/father.g.vcf \
-V /gatk/data/sandbox/mother.g.vcf -V /gatk/data/sandbox/son.g.vcf \
-O /gatk/data/sandbox/combine_trio_variants.vcf
gatk GenotypeGVCFs -R /gatk/data/ref/ref.fasta \
-V gendb:///gatk/data/sandbox/trio.gdb_workspace \
-G StandardAnnotation -O /gatk/data/sandbox/trio_variants.vcf