Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Mutect2 parallelize input

4 comments

  • David Benjamin

    As Bhanu said, interval files in various formats are valid for the -L argument.  The "chr1" naming convention is fine; all that matters is that your reads, reference, and other inputs use the same contig names.

    Your example using --intervals-set-rule INTERSECTION will work, but the standard practice is to use SplitIntervals to subdivide your $CAPTURE_INTERVALS_FILE into multiple intervals files, then run Mutect2 once per subdivided file. Each of these files contains multiple intervals, and the union of the intervals across all the files is the original set of intervals.
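The scatter step described above can be sketched as a small command builder. This is only an illustration under stated assumptions: the paths, the scatter count, and the function names are hypothetical, and the sketch prints the commands rather than running GATK. It uses only the common -R, -I, -L, -O, and --scatter-count arguments.

```python
from pathlib import Path


def split_intervals_cmd(reference, intervals, scatter_count, out_dir):
    """Build the SplitIntervals command that writes the per-shard interval files."""
    return [
        "gatk", "SplitIntervals",
        "-R", str(reference),
        "-L", str(intervals),
        "--scatter-count", str(scatter_count),
        "-O", str(out_dir),
    ]


def mutect2_cmd(reference, bam, shard, out_dir):
    """Build one Mutect2 command restricted to a single shard's intervals."""
    stem = Path(shard).stem  # "shard_a" for "shard_a.interval_list"
    return [
        "gatk", "Mutect2",
        "-R", str(reference),
        "-I", str(bam),
        "-L", str(shard),
        "-O", str(Path(out_dir) / f"{stem}.vcf.gz"),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; every path is a placeholder.
    print(" ".join(split_intervals_cmd("ref.fasta", "capture.interval_list", 4, "shards")))
    for shard in ["shards/shard_a.interval_list", "shards/shard_b.interval_list"]:
        print(" ".join(mutect2_cmd("ref.fasta", "tumor.bam", shard, "calls")))
```

In a real workflow the shard list would come from listing the SplitIntervals output directory, and the per-shard outputs would be merged afterwards.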

  • alanhoyle

    I've made a workflow that divides the regions using SplitIntervals --subdivision-mode BALANCING_WITHOUT_INTERVAL_SUBDIVISION, and I think I have it working. I'm able to reconstruct the original intervals file exactly.
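One way to check that the shards reconstruct the original intervals is to compare the interval records directly. The sketch below assumes Picard-style .interval_list input (header lines starting with "@", then tab-separated contig, start, and end columns) and assumes no interval was subdivided; the function names are illustrative and not part of GATK.

```python
def read_intervals(lines):
    """Return (contig, start, end) tuples, skipping '@' header lines and blanks."""
    intervals = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        contig, start, end = line.split("\t")[:3]
        intervals.append((contig, int(start), int(end)))
    return intervals


def shards_reconstruct_original(original_lines, shard_line_groups):
    """True if the shards, taken together, hold exactly the original intervals."""
    combined = []
    for shard_lines in shard_line_groups:
        combined.extend(read_intervals(shard_lines))
    # Sorting makes the comparison independent of shard order.
    return sorted(combined) == sorted(read_intervals(original_lines))
```

Each argument can be an open file handle or a list of lines, so the same check works on files on disk and on small in-memory examples.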

    Some small differences in the variant calls, merged stats, and F1R2 artifact tables are to be expected, right?

    In a "toy" sample I made with only reads that fall into one SplitIntervals shard, the results are byte-for-byte identical, except that MergeVcfs seems to pick one of the ##GATKCommandLine lines at random. In the simulated case with whole-exome data, however, there are some small differences in the calls, and read counts are sometimes off by one.
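A comparison like the one described above can ignore the ##GATKCommandLine header lines, so that only real differences show up. This is a minimal sketch that assumes both VCFs are available as uncompressed text lines; the function names are hypothetical.

```python
def without_command_lines(lines):
    """Drop ##GATKCommandLine header lines and trailing newlines."""
    return [
        line.rstrip("\n")
        for line in lines
        if not line.startswith("##GATKCommandLine")
    ]


def vcfs_match(lines_a, lines_b):
    """True if two VCFs are identical apart from ##GATKCommandLine headers."""
    return without_command_lines(lines_a) == without_command_lines(lines_b)
```

For compressed output, the lines could be read with gzip.open(path, "rt") from the standard library before being passed in.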

  • David Benjamin

    alanhoyle Those kinds of small differences are expected as a consequence of bounding assembly regions differently.
