Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data



Question about BAM format errors and handling large contigs


2 comments

  • Chris Kachulis

    Hi Tiffany Kosch,

    The BAM format itself requires each chromosome to be shorter than 2^31 bases. The stricter limit of 2^29 bases (536,870,912, about 537 million base pairs) applies only to creating a BAI index. So you should be able to use these chromosomes as is, but you will need to use CSI indices (it looks like you may have already figured this out, based on your samtools calls).

    Many of the issues you are seeing may stem from the very old version of Picard you are using. CSI indices were not supported until htsjdk 2.19.0. I would suggest updating to a much newer version of Picard (3.0.0 if you are OK moving to Java 17, or 2.27.5 if not) and seeing whether that fixes the issues.

  • Tiffany Kosch

    Hello Chris Kachulis,

    Thanks very much for your response and for answering my question about chromosome size limits. 

    I just ran a test with a newer version of Picard and it started up fine, so I think the Picard version was what was causing the issue.
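
For anyone checking their own reference, the size limits described in the first reply can be sketched as a small Python check. This is only an illustration under stated assumptions: the function names and the example contig names and lengths are hypothetical, and the thresholds are taken directly from the reply above (shorter than 2^29 bases for a BAI index, shorter than 2^31 bases for BAM itself).

```python
# Thresholds as stated in the reply above (assumed to be strict upper bounds).
BAI_MAX_CONTIG_LENGTH = 2**29  # 536,870,912 bases: BAI index limit
BAM_MAX_CONTIG_LENGTH = 2**31  # 2,147,483,648 bases: BAM format limit


def classify_contig(length):
    """Return 'bai', 'csi', or 'too_long' for a single contig length."""
    if length < BAI_MAX_CONTIG_LENGTH:
        return "bai"       # short enough for a classic BAI index
    if length < BAM_MAX_CONTIG_LENGTH:
        return "csi"       # fits in BAM, but needs a CSI index
    return "too_long"      # exceeds what BAM can represent


def required_index(contig_lengths):
    """Return the most demanding category across all contigs.

    contig_lengths: dict mapping contig name -> length in bases.
    """
    kinds = {classify_contig(n) for n in contig_lengths.values()}
    if "too_long" in kinds:
        return "too_long"
    if "csi" in kinds:
        return "csi"
    return "bai"


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    example = {"chr1": 250_000_000, "chr2": 700_000_000}
    print(required_index(example))
```

If the result is "csi", the BAM can still be used as is, but it needs a CSI index (for example, one created with `samtools index -c`) and a Picard/htsjdk version new enough to read it.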

