Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Mutect2 - issue with long scaffolds (non-human/mouse)

  • David Benjamin

    Yuanyuan Cheng, out of curiosity, are you studying opossums?

    Try writing the output as uncompressed VCF: "-O normal1.vcf" instead of "-O normal1.vcf.gz".  The tabix (.tbi) index written for a bgzipped VCF is hard-coded to cover only 2^29 (about 537 million) bases per contig, while I believe the .idx index written for an uncompressed VCF has no such limitation.

    There exists a .csi index format that I believe the GATK can use as input, but here the problem is the index that the GATK generates on the fly, and the GATK has no capacity to emit a .csi index.

    See a related discussion here: https://github.com/broadinstitute/gatk/issues/6110.

  • Yuanyuan Cheng

    Many thanks David! That makes a lot of sense. I am rerunning it now using uncompressed vcf for output format.

    I study Tasmanian devils. Lots of marsupials seem to have super long chromosomes, which can cause unexpected trouble sometimes :)

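For anyone hitting the same error, a quick way to see whether a reference is affected is to compare its contig lengths against the 2^29 limit mentioned above. The sketch below is only an illustration, not part of the GATK: it assumes the lengths come from a samtools-style .fai index (tab-separated, contig name in column 1, length in column 2), and the function name, contig names, and lengths are all made up for the example.

```python
# Limit quoted in the thread for tabix (.tbi) indexes: 2^29 bases.
TBI_MAX_BASES = 2 ** 29


def contigs_over_tbi_limit(fai_text, limit=TBI_MAX_BASES):
    """Return (name, length) pairs for contigs longer than `limit`.

    `fai_text` is the content of a .fai index: one contig per line,
    tab-separated, with the name in column 1 and the length in column 2.
    """
    too_long = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        if length > limit:
            too_long.append((name, length))
    return too_long


# Hypothetical index contents; real values would come from your own .fai file.
example_fai = (
    "chrA\t700000000\t6\t60\t61\n"
    "chrB\t400000000\t711666680\t60\t61\n"
)

for name, length in contigs_over_tbi_limit(example_fai):
    print(f"{name}: {length} bases exceeds the .tbi limit of {TBI_MAX_BASES}")
```

If any contig is listed, writing the output as uncompressed VCF, as suggested above, avoids the .tbi index altogether.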
