DP for each variant bulk is skewed towards 50 and multiples of 50
When using the GATK pipeline to call variants, I end up obtaining a DP for each variant skewed towards 50 and multiples of 50 (see photo below) when my average read depth from my sequencing files should be much less than 50. What could be causing this bias and how might I fix it?
If you are seeing an error, please provide (REQUIRED):
a) GATK version used: gatk-220.127.116.11
b) Exact command used: I am not sure which command in my script is causing the bias.
c) Entire error log: N/A
Hi Lauren Fedenia,
Do you have multiple samples? DP is the unfiltered depth of coverage across all samples.
You can read more about coverage here: https://gatk.broadinstitute.org/hc/en-us/articles/360035532112-Coverage-Read-depth-metrics
And the DP calculation: https://gatk.broadinstitute.org/hc/en-us/articles/360057440391-Coverage
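To check where the spikes sit, the site-level DP values can be tallied directly from the VCF. The following is a minimal sketch, not part of GATK: the function name `info_dp_histogram` and the example records are hypothetical, and it reads only the `DP=` entry of the INFO column, so GVCF reference blocks that carry DP only in the FORMAT column are not counted.

```python
from collections import Counter


def info_dp_histogram(vcf_lines):
    """Count how often each INFO DP value occurs across VCF records."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # The INFO column is the 8th tab-separated field.
        if len(fields) < 8:
            continue
        for entry in fields[7].split(";"):
            if entry.startswith("DP="):
                counts[int(entry[3:])] += 1
                break
    return counts


# Made-up records for illustration only (not taken from the real data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Chr01\t100\t.\tA\tG\t50\t.\tAC=1;DP=50;MQ=60",
    "Chr01\t200\t.\tC\tT\t60\t.\tDP=100",
    "Chr01\t300\t.\tG\tA\t45\t.\tAC=2;DP=50",
    "Chr01\t400\t.\tT\tC\t30\t.\tAC=1;DP=37",
]
hist = info_dp_histogram(example)
for depth, n_sites in sorted(hist.items()):
    print(depth, n_sites)
```

For a real file, the lines could be passed in with `open("your.vcf")` (a placeholder file name), since the function only needs an iterable of text lines.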
Yes, we are concatenating 70 fastq files/samples per bulk pool to perform variant calling on. When we analyze the depth and coverage outside of the GATK pipeline, we do not obtain a read depth skewed towards 50 or multiples of 50. Is there a filtering step in the GATK pipeline that might be causing this?
What steps are you running in your GATK pipeline? Most GATK tools apply several read filters. For example, HaplotypeCaller applies these filters:
java -jar picard.jar FastqToSam F1=Red_sorghum_bulk_FseI_130bp_seqs.txt.trimmed_seqs.fastq O=red-bulk_unaligned_reads.bam READ_GROUP_NAME=@D00572.1 SAMPLE_NAME=RedBulk LIBRARY_NAME=Illumina-111111 PLATFORM_UNIT=D00572:40:CAL35ANXX.1 PLATFORM=Illumina
java -jar picard.jar MarkIlluminaAdapters I=red-bulk_unaligned_reads.bam O=red-bulk_markilluminaadapters.bam M=red-bulk_markilluminaadapters_metrics.txt
java -Xmx48G -jar picard.jar SamToFastq I=red-bulk_markilluminaadapters.bam FASTQ=red-bulk_samtofastq.fq CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION=2 NON_PF=true
../scripts/Read_Mapping_softwares/bwa/bwa mem -M -t 40 ../scripts/Read_Mapping_softwares/ref_bwa/Sbicolor_454_v3.0.1.fa red-bulk_samtofastq.fq > red-bulk_bwa_mem.sam
java -Xmx16G -jar picard.jar MergeBamAlignment R=../scripts/Read_Mapping_softwares/ref_bwa/Sbicolor_454_v3.0.1.fa UNMAPPED_BAM=red-bulk_unaligned_reads.bam ALIGNED_BAM=red-bulk_bwa_mem.sam O=red_bulk_mergebamalignment.bam CREATE_INDEX=true CLIP_ADAPTERS=false INCLUDE_SECONDARY_ALIGNMENTS=true MAX_INSERTIONS_OR_DELETIONS=-1 PRIMARY_ALIGNMENT_STRATEGY=BestMapq ATTRIBUTES_TO_RETAIN=XS PAIRED_RUN=false CLIP_OVERLAPPING_READS=true
./gatk --java-options "-Xmx40G" HaplotypeCaller -DF NotDuplicateReadFilter -R ../../scripts/Read_Mapping_softwares/ref_bwa/Sbicolor_454_v3.0.1.fa -I ../red_bulk_mergebamalignment.bam -ERC GVCF -O red_bulk_output.raw.snps.indels.g.vcf
All steps above are repeated for the other bulk file.
Lauren Fedenia, what is the expected depth of coverage for your samples? Here is the GATK tool DepthOfCoverage if you want to check: https://gatk.broadinstitute.org/hc/en-us/articles/360056970332-DepthOfCoverage-BETA-
The expected depth of coverage is 40 for my samples.
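One way to put a number on the skew is to measure what share of sites have a DP that is an exact multiple of 50. The sketch below is illustrative only: `fraction_at_multiples` is a hypothetical helper and the depth values are made up. As a rough yardstick, if depths were spread smoothly across values, only about one site in fifty would be expected to land on a multiple of 50.

```python
def fraction_at_multiples(depths, step=50):
    """Return the share of non-zero depths that are exact multiples of `step`."""
    nonzero = [d for d in depths if d > 0]
    if not nonzero:
        return 0.0
    hits = sum(1 for d in nonzero if d % step == 0)
    return hits / len(nonzero)


# Made-up DP values for illustration only.
example_depths = [50, 100, 50, 37, 42, 150, 39, 50]
frac = fraction_at_multiples(example_depths)
print(f"{frac:.3f} of sites have DP at a multiple of 50")
```

Running the same calculation on depths measured outside GATK and on the DP values from the VCF would show whether the excess appears only after variant calling.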
Hi Lauren Fedenia,
Could you clarify what is being shown on the X and Y axes of the plot?