Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Question GetPileupSummaries, columns all 0

  • Genevieve Brandt (she/her)

    Hi Brian Wiley, could you give some more information about this issue? What is the expected output? Could you also include the stack trace and an example of the incorrect output?

    https://gatk.broadinstitute.org/hc/en-us/articles/360053424571-How-to-Write-a-Post

  • Brian Wiley

    Hi Genevieve Brandt (she/her),

    Here is an example of the output I got when I ran the command above:

    BAM:

    671723 147 chr17 1352 42 76M = 1272 -156 AAAAAAGTTTGGGGGGATTCCCCTAAGCCCGCCACCCGGAGACAGCGGATTTCCTTAGTTACTTACTATGCTCCTT JJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA MD:Z:76 PG:Z:MarkDuplicates RG:Z:Sample.1 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YS:i:0 YT:Z:CP
    13179887 1171 chr17 1352 42 76M = 1272 -156 AAAAAAGTTTGGGGGGATTCCCCTAAGCCCGCCACCCGGAGACAGCGGATTTCCTTAGTTACTTACTATGCTCCTN <F-JJAFFJJJJJFA-FA7FJFJA-7AJJJJJJJJJFJJJJJJJJJAFJAAFJJJJJJJJJJJAJJFF7FJFFAA# MD:Z:75T0 PG:Z:MarkDuplicates RG:Z:Sample.1 XG:i:0 NM:i:1 XM:i:1 XN:i:0 XO:i:0 AS:i:-1 YS:i:-1 YT:Z:CP

    These reads start at 1352 and are 76 bp long. Shouldn't ref_count be 2 in the pileup, since both CIGARs are 76M and both reads overlap 1389 and 1397?

    Pileup (why are columns 3-5 zero?)

    contig  position  ref_count  alt_count  other_alt_count  allele_frequency
    chr17   1389      0          0          0                0.074
    chr17   1397      0          0          0                0.04

    Maybe I don't understand exactly how GetPileupSummaries works, but I assumed it summarizes the ref and alt counts in the same way samtools mpileup does?
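
As a sanity check on the overlap question above, here is a minimal Python sketch that walks a CIGAR string to find which read base, if any, aligns to a given site. It is not how GetPileupSummaries works internally; `base_at` is a hypothetical helper, and the only inputs are the POS, CIGAR, and SEQ fields of the two records posted above.

```python
import re

# CIGAR operators that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
# CIGAR operators that consume read (query) bases.
QUERY_CONSUMING = set("MIS=X")


def base_at(pos, cigar, seq, site):
    """Return the read base aligned to 1-based reference `site`, or None.

    `pos` is the 1-based leftmost mapping position (SAM POS field).
    None means the read does not cover the site, or covers it only
    with a deletion or reference skip.
    """
    ref, query = pos, 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in REF_CONSUMING and ref <= site < ref + length:
            # The site falls in this operation; only M/=/X carry a read base.
            return seq[query + site - ref] if op in QUERY_CONSUMING else None
        if op in REF_CONSUMING:
            ref += length
        if op in QUERY_CONSUMING:
            query += length
    return None


# POS, CIGAR and SEQ copied from the two records in this thread.
reads = [
    (1352, "76M", "AAAAAAGTTTGGGGGGATTCCCCTAAGCCCGCCACCCGGAGACAGCGGATTTCCTTAGTTACTTACTATGCTCCTT"),
    (1352, "76M", "AAAAAAGTTTGGGGGGATTCCCCTAAGCCCGCCACCCGGAGACAGCGGATTTCCTTAGTTACTTACTATGCTCCTN"),
]

for site in (1389, 1397):
    bases = [base_at(pos, cigar, seq, site) for pos, cigar, seq in reads]
    print(site, bases)
```

With POS 1352 and CIGAR 76M, each read spans 1352 through 1427, so both sites fall inside the alignments. Zero counts at these sites would therefore suggest the reads are being excluded before counting rather than failing to overlap.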

  • Genevieve Brandt (she/her)

    Thanks for that information, it is helpful in figuring this out. 

    Could you check your BAM file for any issues with ValidateSamFile? Here is a tutorial on how to troubleshoot the output.

    Thanks,

    Genevieve

  • Brian Wiley

    Thanks. I just ran it and no errors were found. I guess I'll see later what happens when I run it on other datasets.

  • Genevieve Brandt (she/her)

    Ok, that will be helpful. Let me know what you find and we can keep troubleshooting.

  • Genevieve Brandt (she/her)

    Hi Brian Wiley,

    I just wanted to follow up with some extra feedback about what could be causing this issue. GetPileupSummaries is a straightforward tool with no known bugs, so the most likely cause is a lack of coverage at the sites you are looking at. If you are sure that some input reads fall within your intervals, they may be getting filtered out, because the tool applies some read filters by default. You can check the log output (stack trace) for more information about the read filters applied.

    Hope this helps.

    Best,

    Genevieve
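
To illustrate how default read filters could discard reads that do overlap the sites, the sketch below checks the FLAG and MAPQ values of the two records posted earlier in this thread. The flag bits come from the SAM specification. The set of checks is a simplified assumption rather than the tool's exact filter list, `filter_reasons` is a hypothetical helper, and the threshold of 50 is assumed to match the default of the `--min-mapping-quality` argument of GetPileupSummaries, which should be confirmed against the documentation for your GATK version.

```python
# SAM FLAG bits (from the SAM specification).
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400


def filter_reasons(flag, mapq, min_mapq=50):
    """List the reasons a read would fail this simplified filter set.

    min_mapq=50 is an assumption about the tool's default; adjust it
    to whatever your GATK version reports.
    """
    reasons = []
    if flag & FLAG_UNMAPPED:
        reasons.append("unmapped")
    if flag & FLAG_SECONDARY:
        reasons.append("secondary alignment")
    if flag & FLAG_QC_FAIL:
        reasons.append("failed vendor QC")
    if flag & FLAG_DUPLICATE:
        reasons.append("duplicate")
    if mapq < min_mapq:
        reasons.append(f"MAPQ {mapq} < {min_mapq}")
    return reasons


# FLAG and MAPQ copied from the two records in this thread.
for name, flag, mapq in [("671723", 147, 42), ("13179887", 1171, 42)]:
    print(name, filter_reasons(flag, mapq) or "passes")
```

Under these assumptions, both reads fail the mapping-quality check (MAPQ 42), and the second read (flag 1171) also carries the duplicate bit. If the filter summary in your log shows the same pattern, it may be worth testing a lower `--min-mapping-quality` value.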

  • Steven Strong

    Hi Brian-

    Did you ever resolve this issue? I'm running into the same problem, and I have verified that there is coverage and that no reads are filtered.

