Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Errors about input files having missing or incompatible contigs

8 comments

  • Field -Ye Tian

    Dear GATK group,

    I have encountered a similar (if not the same) problem when running Mutect2.

    It says "A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.

    reference contigs = [NC_000001.11, NT_187361.1, NT_187362.1, NT_187363.1, NT_187364.1, NT_187365.1, NT_187366.1, NT_187367.1, NT_187368.1, NT_187369.1, NC_000002.12, NT_187370.1,.........(where I omitted many more items)

    features contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, chr1_KI270706v1_random, chr1_KI270707v1_random, chr1_KI270708v1_random, chr1_KI270709v1_random, chr1_KI270710v1_random, chr1_KI270711v1_random, chr1_KI270712v1_random,........"

     

    I figured out that the mismatch is between the VCF files (1000g_pon.hg38.vcf.gz and somatic-hg38_af-only-gnomad.hg38.vcf, from https://console.cloud.google.com/storage/browser/gatk-best-practices/somatic-hg38/) and the reference file (the unpatched GRCh38 assembly from NCBI, https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/).

    If I run the program without the PoN and germline resource files, it works smoothly.

    I wonder whether the inconsistency is caused by the outdated GRCh38 assembly and whether it can be solved by using the up-to-date GRCh38 patch 13.

     

    Many thanks.
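
The error in this comment shows RefSeq accessions (NC_000001.11) on the reference side and UCSC-style names (chr1) on the features side, so no name can match. One way to confirm this kind of mismatch before running Mutect2 is to compare the contig names in the VCF header with those in the reference index. The following is a minimal sketch, assuming an uncompressed VCF header and a samtools-style .fai index; the function names are hypothetical, and the offset columns in the example .fai line are illustrative placeholders.

```python
import re

# Matches VCF header lines such as: ##contig=<ID=chr1,length=248956422>
_CONTIG_ID = re.compile(r"^##contig=<.*?\bID=([^,>]+)")


def vcf_contigs(header_lines):
    """Collect contig IDs from '##contig=<ID=...>' VCF header lines."""
    names = []
    for line in header_lines:
        match = _CONTIG_ID.match(line)
        if match:
            names.append(match.group(1))
    return names


def fai_contigs(fai_lines):
    """Contig names are the first tab-separated column of a .fai index."""
    return [line.split("\t", 1)[0] for line in fai_lines if line.strip()]


def shared_contigs(reference_names, feature_names):
    """Return reference contigs that also appear in the feature file."""
    feature_set = set(feature_names)
    return [name for name in reference_names if name in feature_set]


# Names come from the error message above; the .fai offsets are placeholders.
reference = fai_contigs(["NC_000001.11\t248956422\t112\t80\t81\n"])
features = vcf_contigs(["##contig=<ID=chr1,length=248956422>\n"])
print(shared_contigs(reference, features))  # prints []
```

An empty result corresponds to the "No overlapping contigs found" case. For a block-compressed VCF, the header lines could be read with Python's `gzip.open` in text mode instead of `open`.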

     

  • Field -Ye Tian

    As a quick update, aligning to the latest GRCh38 patch 13 did not solve the problem described above.

     

  • maximus

    The solution stated above failed completely.

    I generated the BAM file by aligning to an hg38 reference sequence with bwa.

    Then I ran Mutect2 on that BAM with the same hg38 reference.

    With the same hg38 reference as the source, how could the contig names differ?

    How come I can't use the software smoothly?

  • maximus

    I have tried different reference sequences (UCSC vs. RefSeq) and germline resources from different sources, and I get either

    A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.

    Or

    A USER ERROR has occurred: An index is required but was not found for file /XXX/XXX/XXXX.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.

    How can such extensively developed and maintained software have a bug that keeps me from running even a simple Mutect2 job as a small initial test?
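
The second error above is about a missing index rather than contig names, and the message itself points to IndexFeatureFile as the fix. A quick pre-flight check is to look for an index file next to each VCF. The sketch below assumes the common naming convention of a `.tbi` file beside a block-compressed `.vcf.gz` and a `.idx` file beside an uncompressed `.vcf`; the function names are hypothetical, and other index types are not considered.

```python
from pathlib import Path


def expected_index_paths(feature_file):
    """Return the index path conventionally found next to a VCF.

    Assumption: '<file>.tbi' for block-compressed input and
    '<file>.idx' for uncompressed input.
    """
    path = Path(feature_file)
    suffix = ".tbi" if path.name.endswith(".gz") else ".idx"
    return [Path(str(path) + suffix)]


def has_index(feature_file):
    """True if an index file with the expected name exists on disk."""
    return any(p.exists() for p in expected_index_paths(feature_file))
```

If `has_index` returns False for a resource file, running IndexFeatureFile on it, as the error message suggests, would be the next thing to try.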

  • maximus

    Even when I provide only the input BAM files and the reference genome, without the germline resources, Mutect2 cannot produce a VCF file.

  • zdr j

    Got the same problem, tried different references, did not solve the issue.

    Frustrated with GATK.

    Can anyone recommend other software for calling somatic mutations (small indels and point mutations) that runs without so many bugs and errors?

    My reads are aligned, sliced sequences from ICGC (aligned to GRCh37).

    Any help will be appreciated.

  • Rea Kalampaliki

    Hello!
    Is there any way to test the compatibility of contigs between a BAM file and a reference genome FASTA file before the analysis, in case we are not sure whether the BAM and the reference genome match?
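
One possible pre-check is to compare the `@SQ` lines of the BAM header with those of the reference sequence dictionary (the `.dict` file that accompanies the FASTA), since both use the same SAM header syntax. The sketch below is a minimal illustration with hypothetical function names. It assumes the header text has already been extracted, for example with `samtools view -H`, and it checks names and lengths only, not contig order. The example values are taken from the header excerpt posted later in this thread.

```python
def parse_sq(header_lines):
    """Map contig name -> length from SAM-style @SQ header lines."""
    contigs = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        tags = {}
        for field in line.rstrip("\n").split("\t")[1:]:
            if ":" in field:
                key, value = field.split(":", 1)
                tags[key] = value
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def compare_contigs(bam_sq, ref_sq):
    """Return (contigs missing from the reference, contigs whose length differs)."""
    missing = [name for name in bam_sq if name not in ref_sq]
    length_mismatch = [
        name for name in bam_sq
        if name in ref_sq and bam_sq[name] != ref_sq[name]
    ]
    return missing, length_mismatch


bam_header = [
    "@SQ\tSN:GL000225.1\tLN:211173",
    "@SQ\tSN:GL000192.1\tLN:547496",
    "@SQ\tSN:NC_007605\tLN:171823",
    "@SQ\tSN:hs37d5\tLN:35477943",
    "@SQ\tSN:phiX174\tLN:5386",
]
ref_dict = bam_header[:3]  # a reference without hs37d5 and phiX174
print(compare_contigs(parse_sq(bam_header), parse_sq(ref_dict)))
# prints (['hs37d5', 'phiX174'], [])
```

Two empty lists would suggest that every contig in the BAM is present in the reference with the same length.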

  • Jordi D

    Dear all,

    I got some BAM files from a collaborator and I would like to run the GATK workflow starting from the BaseRecalibrator tool.

    Their reference included some control sequences, such as PhiX, that are missing from the reference I will use later. Hence, the headers differ in just two "@SQ" lines:

    BAM file header from the gathered BAM files:

    @SQ    SN:GL000225.1    LN:211173
    @SQ    SN:GL000192.1    LN:547496
    @SQ    SN:NC_007605    LN:171823
    @SQ    SN:hs37d5    LN:35477943
    @SQ    SN:phiX174    LN:5386

    What I am actually interested in:

    @SQ    SN:GL000225.1    LN:211173
    @SQ    SN:GL000192.1    LN:547496
    @SQ    SN:NC_007605    LN:171823

    To avoid mapping the FASTQ reads again, is there a way to edit the BAM header so that I get a valid BAM file without the reference sequences hs37d5 and phiX174?

    Thank you very much in advance.
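
Editing the header alone is unlikely to be enough, because alignments that still name a removed contig would no longer refer to anything in the header. A possible approach is to filter the SAM text stream so that the unwanted `@SQ` lines and the records pointing at those contigs are dropped together. The sketch below is a simplification under stated assumptions: the function name is hypothetical, it drops any record whose RNAME or RNEXT is a removed contig (so a read whose mate mapped to phiX174 is discarded rather than repaired), and it does not inspect optional tags such as SA.

```python
def drop_contigs(sam_lines, unwanted):
    """Yield SAM lines without the given contigs.

    Removes matching @SQ header lines and every alignment whose
    RNAME (column 3) or RNEXT (column 7) names an unwanted contig.
    """
    unwanted = set(unwanted)
    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                fields = line.rstrip("\n").split("\t")[1:]
                name = next((f[3:] for f in fields if f.startswith("SN:")), None)
                if name in unwanted:
                    continue  # skip the header line of a removed contig
            yield line
            continue
        columns = line.split("\t")
        rname, rnext = columns[2], columns[6]
        if rname in unwanted or rnext in unwanted:
            continue  # skip reads (or mates) on a removed contig
        yield line
```

Wired to standard input and output, a filter like this could sit between `samtools view -h` (BAM to SAM with header) and `samtools view -b` (SAM back to BAM), called with `{"hs37d5", "phiX174"}` as the contigs to drop. The resulting BAM would need to be indexed again, and it is worth validating the output before passing it to BaseRecalibrator.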
