gatk HaplotypeCaller generates Excessive Heterozygous SNPs in inbred lines
I am using gatk Haplotypecaller to call SNPs in a maize line called Mo17. We got the raw reads from published data. Interestingly, the HaplotypeCaller yielded more than 80% of SNPs that are heterozygous. This is not possible with Mo17 as it is an inbred line. I used mostly the default parameters to call the SNPs. I used bwa mem for alignment.
I have checked g.vcf file, raw vcf file, and filtered vcf file. All of them containes 80-90% SNPs that are called heterozygous (0/1) but in an inbred line, most of them should be alternate homozygous genotypes (1/1).
Have anyone else faced this kind of problem? What am I doing wrong in the pipeline? This is the first time I am calling SNPs and I have not used any other softwares yet as most of the publications used gatk in their pipeline.
REQUIRED for all errors and issues:
a) GATK version used:4.3.0.0
b) Exact command used:
#Trimmomatic to remove adapter sequence
trimmomatic PE -threads 16 -phred33 ./Mo17_800_1.fastq ./Mo17_800_2.fastq ./trimmed/paried_Mo17_800_1.fastq ./trimmed/unpaired_Mo17_800_1.fastq ./trimmed/paired_Mo17_800_2.fastq ./trimmed/unpaired_Mo17_800_2.fastq ILLUMINACLIP:./trimmomatic-0.39/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:36
#bwa-mem for aligning with the reference genome and processing bam file:
bwa mem -t 16 Zm-B73-REFERENCE-NAM-5.0.fa ./trimmed/paried_Mo17_800_1.fastq ./trimmed/paired_Mo17_800_2.fastq > ./Mo17.sam
picard SortSam -I Mo17.bam -O Mo17_sorted.bam -SO coordinate
picard AddOrReplaceReadGroups -I ./Mo17_sorted.bam -O ./Mo17_rg_sorted.bam -RGID 4 -RGLB Mo17 -RGPL ILLUMINA -RGPU unit1 -RGSM Mo17
picard MarkDuplicates --REMOVE_DUPLICATES true --CREATE_INDEX true -I ./Mo17_rg_sorted.bam -O ./Mo17_dedup_sorted.bam --METRICS_FILE ./Mo17.txt
#Haplotypecaller
gatk HaplotypeCaller -I Mo17_dedup_sorted.bam -R /blue/meixiazhao/m.hasan1/B73_ref_genomeV5/raw_reads/Zm-B73-REFERENCE-NAM-5.0.fa -ERC GVCF -O Mo17.g.vcf
gatk --java-options "-Xmx8G" GenomicsDBImport \
-V ./Mo17.g.vcf \
--genomicsdb-workspace-path ./database/Mo17_chr1 \
--intervals chr1 --tmp-dir ./
#call_genotypes
gatk GenotypeGVCFs --max-alternate-alleles 30 -R Zm-B73-REFERENCE-NAM-5.0.fa
-V gendb://database/Mo17_chr1 -O Mo17_variants_chr1.vcf
#exclude_SNPs
gatk SelectVariants \
-R Zm-B73-REFERENCE-NAM-5.0.fa \
-V Mo17_variants_chr1.vcf \
--select-type-to-include SNP \
-O ./Mo17_SNPs_chr1.vcf
#variant_filtration
gatk VariantFiltration \
-R Zm-B73-REFERENCE-NAM-5.0.fa \
-V ./Mo17_SNPs_chr1.vcf \
-O ./filtered_Mo17_SNPs_chr1.vcf \
--filter-name "FS60" \
--filter-expression "FS > 60.0" \
--filter-name "MQ40" \
--filter-expression "MQ < 20.0" \
--filter-name "MQRankSum-12.5" \
--filter-expression "MQRankSum < -12.5" \
--filter-name "ReadPosRankSum-8" \
--filter-expression "ReadPosRankSum < -8.0" \
--filter-name "QD2" \
--filter-expression "QD <= 2.0"
#
vcftools --vcf ./filtered_Mo17_SNPs_chr1.vcf --maf 0.1 --max-missing 0.2 --minDP 1 --recode --recode-INFO-all --out ./recode_Mo17_SNPs_chr1
Please sign in to leave a comment.
0 comments