Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


A USER ERROR has occurred: Input files reference and reads have incompatible contigs: Found contigs with the same name but different lengths:


  • Danielle Becker

    This is for calling variants from de novo transcriptome RNA-seq data.

  • Gökalp Çelik

    Hi Danielle Becker

    It looks like the BAM is mapped to a reference genome that is different from the one you are using to call variants. Make sure that you call variants against the same reference genome that your reads were aligned to.

    Regards. 

  • Danielle Becker

    Hi, thank you for your response. I redid this with the correct reference and got the same error. I am using a de novo transcriptome as the reference, not a genome. Can GATK work with that?

  • Gökalp Çelik

    Hi again.

    Yes, GATK can work with any reference sequence you provide, as long as each contig is shorter than 2^29-1 bp so that output files can be indexed properly.

    Can you compare the contig lengths in your BAM header with the sequence dictionary of the reference you are using to call variants?

    Regards. 

  • Can Kockan

    Quick note, I saw the following in the logs:

    picard.PicardException: /data/putnamlab/dbecks/DeNovo_transcriptome/2023_A.pul/output/stranded_output/trinity_out_dir.Trinity.dict already exists.  Delete this file and try again, or specify a different output file.

    As Gökalp said, the best way to check whether this is the cause is to compare this sequence dictionary with the BAM header. Even though you've switched to the correct reference (genome or transcriptome), the dictionary may still correspond to the old reference: CreateSequenceDictionary will not overwrite an existing dictionary, so you need to delete the old one manually first.

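The comparison suggested in this thread (contig lengths in the BAM header versus the sequence dictionary) can be scripted. The sketch below is a minimal Python example, assuming the BAM header has first been dumped to text with `samtools view -H`; the file names `reads_header.sam` and `reference.dict` are hypothetical placeholders. Both the dumped header and a Picard `.dict` file list contigs as `@SQ` lines with `SN` (name) and `LN` (length) tags, so one parser handles both.

```python
"""Compare @SQ contig lengths in a BAM header against a sequence dictionary.

Assumes the header was dumped first, e.g.:
    samtools view -H reads.bam > reads_header.sam
(reads_header.sam / reference.dict are hypothetical file names.)
"""
import sys


def read_sq_lengths(lines):
    """Return {contig name: length} from the @SQ lines of a SAM-format header."""
    lengths = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        # Tags after the record type look like "SN:name" and "LN:12345".
        tags = dict(
            field.split(":", 1)
            for field in line.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def compare_contigs(bam_header, seq_dict):
    """Return (length mismatches, names only in BAM, names only in dict)."""
    mismatched = {
        name: (bam_header[name], seq_dict[name])
        for name in bam_header.keys() & seq_dict.keys()
        if bam_header[name] != seq_dict[name]
    }
    only_bam = sorted(bam_header.keys() - seq_dict.keys())
    only_dict = sorted(seq_dict.keys() - bam_header.keys())
    return mismatched, only_bam, only_dict


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_contigs.py reads_header.sam reference.dict
    with open(sys.argv[1]) as fh:
        bam = read_sq_lengths(fh)
    with open(sys.argv[2]) as fh:
        ref = read_sq_lengths(fh)
    mismatched, only_bam, only_dict = compare_contigs(bam, ref)
    for name, (bam_len, dict_len) in sorted(mismatched.items()):
        print(f"{name}\tBAM={bam_len}\tdict={dict_len}")
    print(f"{len(mismatched)} length mismatches, "
          f"{len(only_bam)} contigs only in BAM, {len(only_dict)} only in dict")
```

Any contig printed with differing lengths is a candidate cause of the "same name but different lengths" error; contigs present on only one side suggest the reads were aligned to a different assembly than the one being used for variant calling.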

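To test the stale-dictionary explanation, the lengths recorded in an existing `.dict` can also be checked directly against the FASTA it is supposed to describe. This is another hedged sketch, not a GATK or Picard utility: it assumes contig names are the FASTA header text up to the first whitespace, and it reuses the 2^29-1 contig-length bound quoted in the thread. The file names in the usage comment are hypothetical.

```python
"""Check whether a .dict file still matches the FASTA it sits next to (sketch)."""
import sys

# Contig-length bound quoted in the thread for indexing output files.
MAX_CONTIG_LEN = 2**29 - 1


def fasta_lengths(lines):
    """Return {name: length}; name = header text after '>' up to first whitespace."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def dict_lengths(lines):
    """Return {name: length} from the @SQ lines of a .dict file."""
    lengths = {}
    for line in lines:
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:] if ":" in f)
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def check_dictionary(fasta, seq_dict):
    """Return a list of problems; an empty list means the dict looks consistent."""
    problems = []
    if fasta.keys() != seq_dict.keys():
        problems.append(
            f"contig names differ: {len(fasta.keys() - seq_dict.keys())} only in FASTA, "
            f"{len(seq_dict.keys() - fasta.keys())} only in dict"
        )
    for name in sorted(fasta.keys() & seq_dict.keys()):
        if fasta[name] != seq_dict[name]:
            problems.append(f"{name}: FASTA length {fasta[name]} != dict length {seq_dict[name]}")
    for name, length in sorted(fasta.items()):
        if length > MAX_CONTIG_LEN:
            problems.append(f"{name}: length {length} exceeds {MAX_CONTIG_LEN}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_dict.py transcriptome.fasta transcriptome.dict
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as dc:
        for problem in check_dictionary(fasta_lengths(fa), dict_lengths(dc)):
            print(problem)
```

If it reports mismatches, deleting the old `.dict` and rerunning CreateSequenceDictionary, as suggested in the thread, should give a dictionary consistent with the FASTA.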