Haploytpe caller shows me that 0 read(s) were filtered by: MappingQualityAvailableReadFilter etc.
AnsweredHello,
I am using GATK v4.2.6.1. I am working on a haploid organism. However, I noticed the following program log:
19:19:24.445 INFO HaplotypeCaller - 638650 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
142605 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
781255 total reads filtered
19:19:24.450 INFO ProgressMeter - Pf3D7_MIT_v3:4710 60.2 138386 2300.6
19:19:24.452 INFO ProgressMeter - Traversal complete. Processed 138386 total regions in 60.2 minutes.
19:19:24.541 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 4.655814166
19:19:24.541 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 1571.786011251
19:19:24.541 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 1047.67 sec
19:19:24.542 INFO HaplotypeCaller - Shutting down engine
[July 7, 2022 8:22:28 PM CEST] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 63.02 minutes.
Runtime.totalMemory()=2076049408
I am now wondering if this is normal or is there something wrong with my input bam files?
My command line is the following
module load GATK/4.2.6.1-foss-2018b-Java-1.8
module load Java
mkdir variant_calling_3D7
# command line
gatk HaplotypeCaller -R reference_genome/PlasmoDB-57_Pfalciparum3D7_Genome.fasta -I mapped_files_3D7/WT_paired_sorted_addedRG_md_recal.bam -ERC GVCF -O variant_calling_3D7/WT_paired.g.vcf --minimum-mapping-quality 10
gatk HaplotypeCaller -R reference_genome/PlasmoDB-57_Pfalciparum3D7_Genome.fasta -I mapped_files_3D7/R1_paired_sorted_addedRG_md_recal.bam -ERC GVCF -O variant_calling_3D7/R1_paired.g.vcf --minimum-mapping-quality 10
gatk HaplotypeCaller -R reference_genome/PlasmoDB-57_Pfalciparum3D7_Genome.fasta -I mapped_files_3D7/R2_paired_sorted_addedRG_md_recal.bam -ERC GVCF -O variant_calling_3D7/R2_paired.g.vcf --minimum-mapping-quality 10
gatk HaplotypeCaller -R reference_genome/PlasmoDB-57_Pfalciparum3D7_Genome.fasta -I mapped_files_3D7/R3_paired_sorted_addedRG_md_recal.bam -ERC GVCF -O variant_calling_3D7/R3_paired.g.vcf --minimum-mapping-quality 10
Thanks in advance and I hope anyone could help!
-
Hi Anthony DiCi,
Oh I see, thanks a lot for looking into it! All is clear from my side
Best,
Patricia
-
Hi pb,
Thank you for writing to the GATK forum! We hope that we can help you clarify this question.
High MappingQualityReadFilter results don't necessarily indicate a problem with GATK. Sometimes a lot of poorly mapped reads are related to the reference/organism you are using.
Could you please first run CountReads on your BAM file and compare the result to the filtering result you included above? From there we'll be better equipped to determine the source of your initial result.
Best,
Anthony
-
Hi Anthony,
Thank you very much for the swift response from your side. I ran the CountReads on my BAM file and this was the output (see below).
From this documentations, WellformedReadFilter – GATK (broadinstitute.org), does this then indicate that none of my reads were removed by the filter set?
09:51:25.276 INFO CountReads - ------------------------------------------------------------
09:51:25.277 INFO CountReads - HTSJDK Version: 2.24.1
09:51:25.277 INFO CountReads - Picard Version: 2.27.1
09:51:25.277 INFO CountReads - Built for Spark Version: 2.4.5
09:51:25.277 INFO CountReads - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:51:25.277 INFO CountReads - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:51:25.277 INFO CountReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:51:25.277 INFO CountReads - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:51:25.277 INFO CountReads - Deflater: IntelDeflater
09:51:25.277 INFO CountReads - Inflater: IntelInflater
09:51:25.278 INFO CountReads - GCS max retries/reopens: 20
09:51:25.278 INFO CountReads - Requester pays: disabled
09:51:25.278 INFO CountReads - Initializing engine
09:51:25.679 INFO CountReads - Done initializing engine
09:51:25.679 INFO ProgressMeter - Starting traversal
09:51:25.679 INFO ProgressMeter - Current Locus Elapsed Minutes Reads Processed Reads/Minute
09:51:35.681 INFO ProgressMeter - Pf3D7_08_v3:699293 0.2 4902000 29409059.1
09:51:45.681 INFO ProgressMeter - Pf3D7_13_v3:1479591 0.3 10406000 31216439.2
09:51:50.862 WARN IntelInflater - Zero Bytes Written : 0
09:51:50.865 INFO CountReads - 0 read(s) filtered by: WellformedReadFilter
09:51:50.865 INFO ProgressMeter - Pf3D7_MIT_v3:5934 0.4 13303080 31691606.4
09:51:50.866 INFO ProgressMeter - Traversal complete. Processed 13303080 total reads in 0.4 minutes.
09:51:50.866 INFO CountReads - CountReads counted 13303080 total reads
09:51:50.866 INFO CountReads - Shutting down engine
[July 15, 2022 9:51:50 AM CEST] org.broadinstitute.hellbender.tools.CountReads done. Elapsed time: 0.43 minutes.
Runtime.totalMemory()=2076049408 -
Hi pb,
Thank you for getting back with your CountReads output! All looks perfectly normal to me. A filter output of 600,000 poorly mapped reads out of 13,000,000 is not out of the ordinary in this context.
We generally expect a WellformedReadFilter output of zero, as it only checks for totally illegal/invalid reads that break the spec/fail quality checks.
I hope this helps clear up any doubts. Thank you again for submitting this question to our GATK forum. Please feel free to reach out with any further questions in the future!
Best,
Anthony
Please sign in to leave a comment.
4 comments