Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

DP/AD in SNP greater than coverage in bamout?

0

20 comments

  • Avatar
    mk

    Hello,

    Isn't DP/AD being much greater than the coverage an abnormal result?

    Thank you

    1
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi mk

     

    The reason for my previous post was becuase this has been discussed in the forum in various threads already. For example: https://gatk.broadinstitute.org/hc/en-us/community/posts/360060670091--GATK-v4-1-6-0-Mutect2-bamout-reports-fewer-reads-than-VCF-AD

    0
    Comment actions Permalink
  • Avatar
    mk

    Thank you. In this post, the difference is 1 read. All other posts i have found discuss the AD being less in the VCF than the original BAM, or similar. In my case, the VCF is reporting AD almost two times as much as the coverage in the bamout: 2550 vs 4500. Here is an example of coverage from the bamout file, and AD from the VCF in green.

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi mk

     

    IGV downsamples bam files. So maybe the AD you are seeing in IGV is downsampled. Try running a coverage tool on the bam to get the real depth values.

    0
    Comment actions Permalink
  • Avatar
    mk

    Hi, thank you.


    What is shown in the images is a bigwig (coverage plot) of the bamout file (grey), with the AD from the VCF overlayed in green. There is no bam file here.

    Thank you.

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi mk

     

    Hmmm that is weird. AD VCF is reporting should not be two times as much as the coverage in the bamout.

    1. How is the coverage plot generated?
    2. Can you share a snippet of the bamout and vcf of the problematic region.

    Here you is how you can share you bamout and vcf files with us: https://gatk.zendesk.com/hc/en-us/articles/360035889671

    0
    Comment actions Permalink
  • Avatar
    mk

    Hello,

    1) The coverage plot is generated using deeptools bamCoverage

    2) I have shared the files in a tarball called bamout_vs_vcf_dp.tar.gz, the bamout.bam file was only 2 MB so I shared the entire file, there are a few examples of this happening in this file, the first 4 snps are examples.

    3) Also, all the coverage areas surrounding each snp share the same strange shape shown above, where the coverage increases in a linear fashion (is that normal?)


    Thank you very much for looking into this.

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi mk

    Can you please share the exact command you are using? In particular are you running with contamination downsampling argument? If you are then using this argument should help: --use-filtered-reads-for-annotations

    0
    Comment actions Permalink
  • Avatar
    mk

    Hello, apologies for the delay in my response, please see command below:

    bamCoverage \
    -p max \
    --bam $bam \
    --binSize 1 \
    --ignoreDuplicates \
    --minMappingQuality 20 \

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi mk

     

    Sorry I wasn't clear. I meant can you please share your HaplotypeCaller command you are using? In particular are you running with contamination downsampling argument? If you are then using this argument should help: --use-filtered-reads-for-annotations.

    0
    Comment actions Permalink
  • Avatar
    mk

    Hi, apologies, command is below. I am not running with contamination downsampling argument.

    gatk HaplotypeCaller \
    -R $ref \
    -I $bam \
    -O raw_variants.vcf \
    -bamout haplotypecaller_bamout.bam \
    -ploidy 1

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi mk

    We would like to debug this issue on our end. Can you please share the original bam? 

     

    We think that maybe, regardless of  -use-filtered-reads-for-annotations  some filtered reads contribute to DP and AD without appearing in the bamout. These include reads with length < 10 bases, reads with unmapped mates, and reads with mates mapping to a different contig.

     

    We would like to test out out our theory, so if you could please share you bam we will look into this further.

    0
    Comment actions Permalink
  • Avatar
    mk

    Hi, Thank you for looking into this. I have copied a file called mk_sample.bam to the ftp server.

    0
    Comment actions Permalink
  • Avatar
    Bhanu Gandham

    Hi mk

     

    We are looking into this and will get back to you shortly.

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    mk I'm preparing to investigate your question and I notice that the ploidy is 1.  What species is this and what reference is your bam aligned to?

    0
    Comment actions Permalink
  • Avatar
    mk

    Hello David,

    Thank you for looking into this. It's the NCBI SARS-CoV2 reference genome (NC_045512.2). Please let me know if you need any other info.

    0
    Comment actions Permalink
  • Avatar
    David Benjamin

    @mk Bhanu Gandham Genevieve Brandt (she/her) I am unable to reproduce this behavior in the current 4.1.8.1 release.  With your command line the DP/AD/bamout depth in IGV of the first several SNPs are

    167 G->A 3421/3364/3422

    241 C->T 4696/4638/4644

    1059 C->T 4938/4756/4803

    3037 C->T 3663/3609/3637

    Additionally, I only observe the sloped coverage in the first SNP, which is to be expected since it is within one fragment length of the end of a linear chromosome.  The bamout coverage mirrors that of the raw bam.  All other SNPs had flat coverage.

    @mk Please let us know if the issue persists in the latest release.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    mk were you able to reproduce this issue in the latest release?

    0
    Comment actions Permalink
  • Avatar
    mk

    I have traced the source of both issues: it is the coverage plot which is generated using deeptools bamCoverage that is causing the observed slope in coverage and reporting a different (lower) count for coverage. Apologies for the confusion and thank you very much for your help.

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    mk Thank you for posting your solution so that it can be useful for other GATK users!

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk