
Mutect2 - issue with long scaffolds (non-human/mouse)



    David Benjamin

    Yuanyuan Cheng, out of curiosity, are you studying opossums?

    Try writing uncompressed VCF output: "-O normal1.vcf" instead of "-O normal1.vcf.gz". The tabix (.tbi) index format used for bgzipped VCFs is hard-coded to address positions only up to 2^29 (536,870,912) bases within a contig, while I believe the .idx index format used for uncompressed VCFs has no such limitation.
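
    A minimal sketch for checking in advance which contigs would hit this limit, assuming a samtools-style .fai index of the reference (tab-separated, contig name in column 1, length in column 2). The helper name contigs_too_long_for_tabix is hypothetical, not part of GATK:

```python
# Hypothetical helper: flag contigs in a samtools-style .fai index that are
# too long for a tabix (.tbi) index, which only addresses positions up to 2**29.

TABIX_MAX_POS = 2 ** 29  # 536,870,912 bases


def contigs_too_long_for_tabix(fai_lines):
    """Return (name, length) for every contig longer than the tabix limit.

    Each .fai line is tab-separated; column 1 is the contig name and
    column 2 is its length in bases.
    """
    too_long = []
    for line in fai_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        if length > TABIX_MAX_POS:
            too_long.append((name, length))
    return too_long
```

    For example, open reference.fasta.fai (a hypothetical file name) and pass the file object to the function; any contig it reports is a candidate for writing .vcf rather than .vcf.gz output.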

    There is also a .csi index format, which I believe GATK can read as input, but the problem here is the index that GATK generates on the fly, and GATK cannot write a .csi index.
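
    For context on why .csi avoids the problem, here is a small worked sketch, assuming the standard binning scheme in which an index with minimum shift min_shift and a given number of levels (depth) covers positions up to 2^(min_shift + 3 * depth). Tabix corresponds to min_shift = 14 and depth = 5, i.e. 2^29, whereas CSI can add levels. The function names are hypothetical:

```python
# Sketch: how many levels a CSI-style binning index needs to cover a contig.
# Assumes an index with minimum shift `min_shift` and `depth` levels
# addresses positions up to 2**(min_shift + 3 * depth).


def max_indexable_length(min_shift: int, depth: int) -> int:
    """Largest coordinate span covered by a binning index with these parameters."""
    return 1 << (min_shift + 3 * depth)


def csi_depth_for(max_contig_len: int, min_shift: int = 14) -> int:
    """Smallest number of levels whose span reaches max_contig_len."""
    depth = 0
    while max_indexable_length(min_shift, depth) < max_contig_len:
        depth += 1  # each extra level multiplies the covered span by 8
    return depth
```

    With the default min_shift of 14, a hypothetical 600,000,000-base chromosome needs one level more than tabix provides: csi_depth_for(600_000_000) returns 6, while max_indexable_length(14, 5) is exactly the 2^29 tabix limit.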

    See a related discussion here:

    Yuanyuan Cheng

    Many thanks, David! That makes a lot of sense. I am rerunning it now with uncompressed VCF as the output format.

    I study Tasmanian devils. Lots of marsupials seem to have super long chromosomes, which can cause unexpected trouble sometimes :)

