Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

User error: input files reference and features have incompatible contigs

8 comments

  • Official comment
    Bhanu Gandham

    Hi Jyotsana Mehra

    1. The fastest way to find what you are looking for is to check our documentation first. We have put together extensive documentation to help the community, and most of your questions can be answered there. For example, the resources you are looking for are described in the resource bundle document here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

    2. Another important thing to do is to search the forum to see whether your question has already been answered.

    Please read this article we created about how to find what you are looking for on our website: https://gatk.broadinstitute.org/hc/en-us/articles/360053424591

  • David Benjamin

    mons7re, your reference is GRCh38, but your dbSNP (the "features") is for the b37 reference. Assuming that your reads are aligned to the GRCh38 reference, you can fix the problem by using a GRCh38 (often called hg38) version of dbSNP.

    The GRCh38 reference is the successor to b37. It differs from b37 mainly in completeness -- fewer gaps in repetitive regions like telomeres and centromeres -- and it also contains so-called "alt contigs", which. . . well, maybe that would be information overload for now.

    By the way, most of the GATK developers did not come to the Broad Institute with any background in biology. We all remember getting tripped up on things like this when we started. It happens to everyone.

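The mismatch described above can be checked before running GATK by comparing the contig names of the two inputs, for instance chr-prefixed names in a reference against plain names such as 1, 2, X in a b37 resource. What follows is a minimal Python sketch, not part of GATK. The function names `contig_overlap` and `naming_style` are hypothetical, and the contig lists are assumed to have been extracted already (for example, from the reference's sequence dictionary and the VCF header). Shared names alone do not prove that two files come from the same build, because the contig lengths should agree as well.

```python
def contig_overlap(reference_contigs, feature_contigs):
    """Return the contig names present in both inputs, in reference order."""
    feature_names = set(feature_contigs)
    return [name for name in reference_contigs if name in feature_names]


def naming_style(contigs):
    """Guess the naming convention from the name used for human chromosome 1.

    Assumption: a human genome, where "chr1" suggests UCSC-style names
    (hg19, hg38) and "1" suggests plain names (b37).
    """
    names = set(contigs)
    if "chr1" in names:
        return "chr-prefixed"
    if "1" in names:
        return "plain"
    return "unknown"


# Illustrative, shortened contig lists (not taken from real files).
reference = ["chr1", "chr2", "chrX", "chrY", "chrM"]
features = ["1", "2", "X", "Y", "MT"]

shared = contig_overlap(reference, features)
if not shared:
    print("No overlapping contigs:",
          naming_style(reference), "reference vs.",
          naming_style(features), "features")
```

An empty overlap combined with two different naming styles is a strong hint that the files belong to different reference builds, which is the situation this thread describes.
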
  • Jyotsana Mehra

    Where do I get the reads from?

    I have all the desired labels for the reference contigs but my read contigs list is empty.

    What am I missing?

  • Jyotsana Mehra

    Thanks. I looked into the issue, and there is something wrong with the reference genome I am using. Can you please provide a valid reference genome for the GATK pipeline?

  • Genevieve Brandt (she/her)

    Hi Jyotsana Mehra, please look at our resource bundle page, where you will find information about the resources we provide:

    https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle

  • sarita 693

    Hi

    I'm very new to bioinformatics and am trying to make a GVCF from .BAM files. For this I used the hg19.fa file from the NCBI database. When I run the command, I get this error:

    A USER ERROR has occurred: Input files reference and reads have incompatible contigs: No overlapping contigs found.
      reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
      reads contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT]

    Should I change the contig names in the .BAM file or change the reference genome file? Kindly reply.

  • David Roazen

    Hi sarita 693,

    This error indicates that your BAM file was aligned to a different reference than the one you are using. In this situation it's generally not safe to simply change the contig names -- instead, you should find out which reference your BAM file was aligned to, and use that reference instead. This information is sometimes present in the header of your BAM file, which you can view with "samtools view -H". If it's not there, you'll need to look into how your BAM files were created, and which reference was used during alignment.

    Hope this helps,

    David

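The header check suggested above can be sketched in Python. This is only an illustration under stated assumptions: the function name `parse_sam_header` is hypothetical, the example header is made up (including the file path in the `CL` field), and real headers often lack the optional `AS` (assembly) tag on `@SQ` lines or any `@PG` line at all. The input is the plain text that `samtools view -H` prints.

```python
def parse_sam_header(header_text):
    """Collect @SQ and @PG records from SAM header text as lists of tag dicts."""
    records = {"SQ": [], "PG": []}
    for line in header_text.splitlines():
        if not line.startswith("@"):
            continue
        fields = line.split("\t")
        record_type = fields[0][1:]  # "@SQ" -> "SQ"
        if record_type not in records:
            continue
        tags = {}
        for field in fields[1:]:
            # Split on the first colon only; values may contain colons.
            key, _, value = field.partition(":")
            tags[key] = value
        records[record_type].append(tags)
    return records


# Made-up header for illustration; the path in CL is hypothetical.
EXAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:1\tLN:249250621\tAS:GRCh37",
    "@SQ\tSN:2\tLN:243199373",
    "@PG\tID:bwa\tPN:bwa\tCL:bwa mem ref/human_g1k_v37.fasta reads.fq",
])

header = parse_sam_header(EXAMPLE_HEADER)
contigs = [sq["SN"] for sq in header["SQ"]]
assemblies = sorted({sq["AS"] for sq in header["SQ"] if "AS" in sq})
commands = [pg["CL"] for pg in header["PG"] if "CL" in pg]

print("contigs:", contigs)
print("assembly tags:", assemblies)
print("aligner command lines:", commands)
```

The contig names and lengths, any assembly tags, and the aligner command line together are usually enough to work out which reference the BAM was aligned to. If none of them is conclusive, the remaining option is the one given above: find out how the BAM files were created.
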