Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Issue with combining g.vcf files

Answered
7 comments

  • Genevieve Brandt (she/her)

    Hi Shaun Clare,

    Is there a reason you are not using GenomicsDBImport? GenomicsDBImport has a lot of performance improvements that CombineGVCFs doesn't have.

    Best,

    Genevieve

  • Shaun Clare

    Our pipeline was built using CombineGVCFs a long time ago, so I am familiar with it. Isn't the intervals option required to create the GenomicsDB datastore? I just want all SNPs across the whole 'genome'.

  • Genevieve Brandt (she/her)

    Yes, intervals are required with GenomicsDB. Many people use it to analyze the whole genome. You can provide an interval list that is just a list of your chromosomes. Here is an article containing information on how to use intervals with WGS: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
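
A minimal sketch of building such a chromosome-level interval list, assuming the reference has a samtools-style `.fai` index whose first tab-separated column is the contig name. The function names and the `chromosomes.list` filename are hypothetical; the output is a plain-text file with one contig per line, which can be passed to `-L`.

```python
from pathlib import Path


def contigs_from_fai(fai_text: str) -> list[str]:
    """Return contig names from the text of a samtools-style .fai index."""
    contigs = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # The first tab-separated column of a .fai line is the contig name.
        contigs.append(line.split("\t")[0])
    return contigs


def write_interval_list(contigs: list[str], out_path: str) -> None:
    """Write one contig name per line (hypothetical helper)."""
    Path(out_path).write_text("\n".join(contigs) + "\n")


# Made-up index content for illustration only.
example_fai = "chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n"
whole_genome = contigs_from_fai(example_fai)
# write_interval_list(whole_genome, "chromosomes.list")
```

In practice the text would come from something like `Path("reference.fasta.fai").read_text()`, where the path is again a placeholder.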

  • Shaun Clare

    It seems to be working, but I'm guessing this is going to take a long time. It got past initializing the engine but gave a warning not to use more than 100 intervals; this run uses 44,000 amplicons as intervals. Do you have any advice?

  • Genevieve Brandt (she/her)

    We have a usage and performance guidelines article for GenomicsDB. Generally, when you have many intervals, we recommend using the option --merge-input-intervals which can really help to speed up the import.
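
To make the effect of merging concrete, here is a rough sketch, assuming that merging collapses all input intervals on a contig into a single interval spanning from the first start to the last end. That behavior is my reading of the option's documentation and is not stated in this thread; the function name and the coordinates are made up for illustration.

```python
def spanning_intervals(intervals):
    """Collapse (contig, start, end) tuples into one spanning interval per contig."""
    spans = {}
    for contig, start, end in intervals:
        if contig in spans:
            lo, hi = spans[contig]
            # Widen the existing span to cover this interval as well.
            spans[contig] = (min(lo, start), max(hi, end))
        else:
            spans[contig] = (start, end)
    # Dicts keep insertion order, so contigs come out in first-seen order.
    return [(contig, lo, hi) for contig, (lo, hi) in spans.items()]


# Hypothetical amplicon coordinates.
amplicons = [("chr1", 100, 250), ("chr1", 900, 1050), ("chr2", 40, 190)]
merged = spanning_intervals(amplicons)
```

Under this assumption, intervals on different contigs are never combined, so merging only reduces the interval count when several intervals share a contig.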

  • Shaun Clare

    Unfortunately, since it's 44k separate amplicons, --merge-input-intervals didn't work, since it can't detect whether they are abutting. I used --merge-contigs-into-num-partitions 50 instead and got all the way to QTL mapping with results that make sense, so I think it worked.
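
A sketch of how a command along these lines could be assembled. The thread does not show the actual invocation, so the file names, workspace path, and helper function are hypothetical; only the tool name and the flags (`-V`, `-L`, `--genomicsdb-workspace-path`, `--merge-contigs-into-num-partitions`) are GATK's own.

```python
import shlex


def genomicsdb_import_command(gvcfs, workspace, interval_file, num_partitions=None):
    """Build an argument list for gatk GenomicsDBImport (hypothetical helper)."""
    cmd = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", interval_file,
    ]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per input GVCF
    if num_partitions is not None:
        # Group contigs into this many GenomicsDB partitions.
        cmd += ["--merge-contigs-into-num-partitions", str(num_partitions)]
    return cmd


# Placeholder inputs; substitute real paths before running.
command = genomicsdb_import_command(
    gvcfs=["sample1.g.vcf.gz", "sample2.g.vcf.gz"],
    workspace="amplicon_db",
    interval_file="amplicons.list",
    num_partitions=50,
)
print(shlex.join(command))
```

Returning a list rather than a string lets the same value be passed directly to `subprocess.run(command, check=True)` without shell quoting concerns.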

  • Genevieve Brandt (she/her)

    Oh good! Glad to hear it's working so far. Please let me know if you have further questions.
