Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


GATK: genome loc coordinates exceed the contig size while trying to run GenomicsDBImport

4 comments

  • Gökalp Çelik

    Hi Paulo Ricardo

    To import into GenomicsDB, you need to use the same genome FASTA file that was used to call variants from the reads. Any incompatibility will result in these errors.

    You can check contig sizes in the sequence dictionary file. If your BED file coordinates exceed those lengths, you will receive this error.

    Make sure that your VCF sequence dictionary is compatible with your FASTA dictionary; otherwise you will not get any results.

    Regards. 
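
The check described above can be sketched in Python. This is a minimal illustration, not part of GATK: it assumes the sequence dictionary is the usual tab-separated `.dict` file with `@SQ` lines carrying `SN:` (contig name) and `LN:` (length) fields, and that the BED file has "chr" "start" "end" in its first three columns. The function names and demo data are hypothetical.

```python
def read_contig_lengths(dict_lines):
    """Parse @SQ lines of a sequence dictionary into {contig_name: length}."""
    lengths = {}
    for line in dict_lines:
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value, e.g. SN:chr1
        fields = dict(
            f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:] if ":" in f
        )
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def find_bad_intervals(bed_lines, lengths):
    """Return (line_number, contig, start, end, reason) for incompatible BED rows."""
    problems = []
    for n, line in enumerate(bed_lines, start=1):
        # Skip blank lines and common BED header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        if chrom not in lengths:
            problems.append((n, chrom, start, end, "contig not in dictionary"))
        elif end > lengths[chrom]:
            # BED ends are exclusive, so end == contig length is still valid
            problems.append(
                (n, chrom, start, end, f"end exceeds contig length {lengths[chrom]}")
            )
    return problems


if __name__ == "__main__":
    # Hypothetical in-memory data; in practice read the lines from your
    # reference .dict file and the BED file you pass to -L.
    demo_dict = ["@HD\tVN:1.6\n", "@SQ\tSN:chr1\tLN:1000\n", "@SQ\tSN:chr2\tLN:500\n"]
    demo_bed = ["chr1\t0\t1000\n", "chr2\t400\t600\n", "1\t10\t20\n"]
    for problem in find_bad_intervals(demo_bed, read_contig_lengths(demo_dict)):
        print(problem)
```

Rows reported as "contig not in dictionary" often point to a naming mismatch (for example "1" versus "chr1") rather than a coordinate problem.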

  • Paulo Ricardo

    Hello, thank you for the fast response.

    But the interval list should be the BED file provided by the sequencing manufacturer, right? I used the same reference to create the VCF file of the normal sample, and I'm using the same one in the -R parameter for GenomicsDBImport. How can I do that with this BED file?

  • Paulo Ricardo

    My question is about the -L parameter: how do I provide the interval list? In one of the forums I saw that I could use the covered.bed file provided by the manufacturer, so I used it after putting it in the correct format of "chr" "start" "end".

  • Gökalp Çelik

    Hi Paulo Ricardo

    You don't have to use the manufacturer's BED file for GenomicsDBImport. You can use it, but it is not necessary. You can pass whole contig names as values for -L, which also lets you import in parallel into multiple GenomicsDB instances.

    If the manufacturer's BED file is not compatible with the reference genome, for example because it has intervals exceeding the size of a contig, you may get such messages. Check the manufacturer's fact sheets to confirm compatibility.

    Regards. 
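
As a rough sketch of the per-contig approach, the Python helper below only builds one GenomicsDBImport command line per contig name; it does not run anything. The helper name, the VCF file name, and the workspace prefix are hypothetical, and the options shown (-V, --genomicsdb-workspace-path, -L) should be checked against the documentation for your GATK version.

```python
import shlex


def per_contig_commands(contigs, vcfs, workspace_prefix):
    """Build one GenomicsDBImport command string per whole contig."""
    commands = []
    for contig in contigs:
        args = ["gatk", "GenomicsDBImport"]
        for vcf in vcfs:
            args += ["-V", vcf]
        # One workspace per contig, so the imports can run independently
        args += ["--genomicsdb-workspace-path", f"{workspace_prefix}_{contig}"]
        # A whole contig name is a valid interval for -L
        args += ["-L", contig]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Hypothetical inputs; take contig names from your reference .dict file.
    for cmd in per_contig_commands(["chr1", "chr2"], ["normal.vcf.gz"], "gdb"):
        print(cmd)
```

Each printed command can then be submitted as a separate job, which is one way to get the parallel import mentioned above.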

