Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

(How to) Call somatic mutations using GATK4 Mutect2

9 comments

  • Nickier

    Excuse me, is there a tool that produces the contamination.table file passed to the --contamination-table parameter in the FilterMutectCalls step?

  • Brian Haas

    Can I please confirm that this is the logic being used in the current PoN creation step:

    ====

    Any candidate PoN site must occur in at least 2 samples (as per the default for --min-sample-count).

    If the site is not in gnomAD (or is present only at negligible frequency), it's automatically treated as a candidate site for PoN inclusion.

    If a site IS in gnomAD at a non-negligible frequency, then it computes the probability of it being a germline variant. If p(germline) < 0.5, then it's a candidate (this 0.5 threshold is manually tunable via the command-line parameter --max-germline-probability).

    ====

    And if this is true, can we update the code documentation at:
    https://github.com/broadinstitute/gatk/blob/master/src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/CreateSomaticPanelOfNormals.java

    to reflect that germline variants are no longer intended to be captured during the PoN creation step?
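    The three conditions above can be written as a small decision function. This is a minimal sketch of the rule exactly as the comment states it, not GATK's CreateSomaticPanelOfNormals implementation; the function name `is_pon_candidate` and its positional inputs are hypothetical, and the defaults 2 and 0.5 only mirror the --min-sample-count and --max-germline-probability values quoted above.

```shell
#!/bin/sh
# Hypothetical helper: encodes the PoN-candidate rule as stated in the comment,
# NOT the actual CreateSomaticPanelOfNormals code.
#   $1 sample_count  number of normals with a call at the site
#   $2 in_gnomad     1 if the site is in gnomAD at non-negligible frequency, else 0
#   $3 p_germline    germline probability (only consulted when in_gnomad=1)
#   $4 min_samples   defaults to 2, mirroring --min-sample-count
#   $5 max_germline  defaults to 0.5, mirroring --max-germline-probability
is_pon_candidate() {
  local sample_count=$1 in_gnomad=$2 p_germline=$3
  local min_samples=${4:-2} max_germline=${5:-0.5}

  # Rule 1: the site must occur in at least min_samples normals.
  if [ "$sample_count" -lt "$min_samples" ]; then
    return 1
  fi

  # Rule 2: absent from gnomAD (or negligible frequency) -> candidate.
  if [ "$in_gnomad" -eq 0 ]; then
    return 0
  fi

  # Rule 3: present in gnomAD -> candidate only if p(germline) < max_germline.
  # awk performs the floating-point comparison that [ ] cannot.
  awk -v p="$p_germline" -v t="$max_germline" 'BEGIN { exit !(p < t) }'
}
```

    For example, `is_pon_candidate 3 1 0.2` succeeds (seen in 3 normals, in gnomAD, low germline probability), while `is_pon_candidate 1 0 0` fails the sample-count check before gnomAD is even considered.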

  • Brian Haas

    Also, can you please confirm that the panel of normals excludes variants based on chromosomal position alone, without requiring allele-specific matching? That is, a G->A variant in the tumor sample may be filtered if there's a G->T entry at that chromosomal position in the panel of normals. I've seen examples of this and just want to confirm that this is the expected behavior. Thanks in advance.
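    To make the distinction in this question concrete, the sketch below contrasts position-only matching with allele-specific matching against a toy panel. It does not describe what GATK actually does; the PON_SITES records and both function names are hypothetical.

```shell
#!/bin/sh
# Made-up PoN entries for illustration, one "CHROM POS REF ALT" record per line.
PON_SITES='chr7 100 G T
chr12 250 C A'

# Position-only matching: any entry at CHROM/POS matches, whatever REF/ALT are.
match_by_position() {
  printf '%s\n' "$PON_SITES" |
    awk -v c="$1" -v p="$2" '$1 == c && $2 == p { found = 1 } END { exit !found }'
}

# Allele-specific matching: CHROM, POS, REF and ALT must all agree.
match_by_allele() {
  printf '%s\n' "$PON_SITES" |
    awk -v c="$1" -v p="$2" -v r="$3" -v a="$4" \
      '$1 == c && $2 == p && $3 == r && $4 == a { found = 1 } END { exit !found }'
}
```

    With this toy panel, a G->A tumor call at chr7:100 is matched by `match_by_position chr7 100` but not by `match_by_allele chr7 100 G A`, which is the position-only behavior the question describes.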

  • Nickier

    Hi, I just ran GenomicsDBImport on my exome samples (17 germline samples). It has been running for 3 days, has created temporary files occupying 1.5 TB, and is still on chr4! Is there any solution?

  • Nickier

    Hi, when I run GetPileupSummaries, I get this error:

    Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.BitSet.initWords(BitSet.java:166)
    at java.util.BitSet.<init>(BitSet.java:161)
    at htsjdk.samtools.GenomicIndexUtil.regionToBins(GenomicIndexUtil.java:164)
    at htsjdk.samtools.BinningIndexContent.getChunksOverlapping(BinningIndexContent.java:121)
    at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:75)
    at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
    at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
    at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:405)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:125)
    at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:66)
    at org.broadinstitute.hellbender.engine.ReadsDataSource.prepareIteratorsForTraversal(ReadsDataSource.java:416)
    at org.broadinstitute.hellbender.engine.ReadsDataSource.iterator(ReadsDataSource.java:342)
    at java.lang.Iterable.spliterator(Iterable.java:101)

    How can I solve this problem?

  • Enrico Cocchi

    Hi Nickier, I had the same problem and found that it is related to how the intervals are passed from the BED file. If the BED file (or interval list, as they call it) contains many small intervals on every contig/chromosome you are analyzing, you'll see a warning about this in the first lines of the GenomicsDBImport output. The solution is to add --merge-input-intervals to the GenomicsDBImport command.
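    One way to check whether an interval file is split into many small intervals per contig is simply to count them. The helper below is a hypothetical sketch, not a GATK tool; it assumes a tab-separated BED or interval_list file with the contig name in the first column, and treats lines starting with #, @, track or browser as headers.

```shell
#!/bin/sh
# Hypothetical helper: prints "<contig>\t<number of intervals>" for each contig.
intervals_per_contig() {
  awk '!/^(#|@|track|browser)/ && NF >= 3 { n[$1]++ }
       END { for (c in n) print c "\t" n[c] }' "$1" |
    sort -k1,1   # awk's "for (c in n)" order is unspecified, so sort the output
}
```

    Usage: `intervals_per_contig targets.bed` (file name hypothetical). If most contigs show hundreds or thousands of entries, that matches the situation described above, and adding --merge-input-intervals to GenomicsDBImport is the fix reported in this thread.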

  • Mateo Kee

    Hello, I have a question regarding the `-tumor-segmentation` option in CalculateContamination:

        gatk CalculateContamination \
            -I getpileupsummaries.table \
            -tumor-segmentation segments.table \
            -O calculatecontamination.table
    

    - The GATK website describes it as an "output table containing segmentation of the tumor by minor allele fraction".

    - Does this mean it models a possible mixture of 2 samples?

    - This option wasn't used in the tutorial for the last version (https://gatk.broadinstitute.org/hc/en-us/articles/360035889791?id=11136), which uses the command below:

        gatk CalculateContamination \
            -I 7_tumor_getpileupsummaries.table \
            -O 8_tumor_calculatecontamination.table

    - Is `-tumor-segmentation` a recommended option now? How does it help in calculating contamination?

    Another related question: what do the square brackets [ ] mean here? Do they mark optional arguments?

        gatk FilterMutectCalls -V unfiltered.vcf \
            [--tumor-segmentation segments.table] \
            [--contamination-table contamination.table] \
            --ob-priors read-orientation-model.tar.gz \
            -O filtered.vcf

    - If I run the FilterMutectCalls command above, does it handle both *cross-sample contamination* filtering and *orientation bias* filtering?
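    Square brackets in a command synopsis conventionally mark optional arguments; assuming that is their meaning here, each bracketed argument can be left out when the corresponding table is not available. The sketch below, a hypothetical POSIX shell function `build_filter_cmd`, assembles the FilterMutectCalls command shown above and includes each bracketed argument only if its file exists. It prints the command rather than running it, so it works without GATK installed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: builds the FilterMutectCalls command from the
# question, adding the bracketed (optional) arguments only when their files exist.
build_filter_cmd() {
  # Inside a function, "set --" only replaces the function's own arguments.
  set -- gatk FilterMutectCalls -V unfiltered.vcf
  if [ -f segments.table ]; then
    set -- "$@" --tumor-segmentation segments.table
  fi
  if [ -f contamination.table ]; then
    set -- "$@" --contamination-table contamination.table
  fi
  set -- "$@" --ob-priors read-orientation-model.tar.gz -O filtered.vcf
  # Print instead of executing; "$*" joins the words with single spaces.
  printf '%s\n' "$*"
}
```

    Run in a directory containing only contamination.table, it prints the command with --contamination-table but without --tumor-segmentation.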

  • Nickier

    Hi Enrico Cocchi, you are right. It works when I add the --merge-input-intervals argument. Thanks a lot.

  • When will GATK support calling complex variants, for example those in EGFR? I don't understand why, after so many GATK releases, this requirement is still not met.
