Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

DetermineGermlineContigPloidy error with exome germline data

Answered

5 comments

  • Genevieve Brandt (she/her)

    Hi shun inoue,

    This doesn't look like a problem with the sample size or memory - this tool can definitely handle this many samples. I asked one of our gCNV experts about this error message to figure out how you can solve this problem and here is what they said:

    Basically, this error indicates that the inference diverged, which could result from some idiosyncrasy of the input data or, very rarely, from bad luck with the starting seed of the inference. The first step is to make sure that the FilterIntervals tool is run prior to DetermineGermlineContigPloidy. Second, the user can try removing extra contigs from the analysis, namely chrM, chrEBV, and decoy contigs, and see if that fixes things. Finally, another thing to try is adjusting model hyperparameters such as psi_s_scale and psi_j_scale, or inference parameters such as learning_rate and log_emission_sampling_median_rel_error, along with the many others listed in the tool argument list.
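
If several of these parameters are adjusted together, it can help to enumerate the combinations up front so that each rerun is recorded. The following is only a minimal sketch: the parameter names are the ones mentioned above, the candidate values are hypothetical placeholders rather than recommendations, and the mapping from each name to the corresponding command-line argument has to be taken from the tool argument list.

```python
from itertools import product

# Hypothetical candidate values (placeholders, not recommended settings).
grid = {
    "psi_s_scale": [1e-4, 1e-3],
    "psi_j_scale": [1e-3, 1e-2],
    "learning_rate": [0.01, 0.05],
}


def parameter_combinations(grid):
    """Return one dict per combination of the candidate values in `grid`."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


combos = parameter_combinations(grid)
for combo in combos:
    print(combo)
```

Each printed dictionary corresponds to one candidate rerun of the tool; changing one parameter at a time is usually easier to interpret than a full grid.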

    Please let me know if you have any questions about this.

    Best,

    Genevieve

  • shun inoue

    Hi Genevieve,

    Thank you for your kind response.

    I confirmed that FilterIntervals was run prior to DetermineGermlineContigPloidy.

    How can I exclude the extra contigs from the analysis?

    I assume it is necessary to pass an argument to "-XL", but I cannot find chrM, chrEBV, or decoy contigs to remove in my interval list, exome.cohort.gc.filtered.interval_list.

    Or should I exclude them in a previous step, such as CollectReadCounts or FilterIntervals? If so, how can I do that?

    Best regards.

  • Genevieve Brandt (she/her)

    Hi shun inoue,

    No further action is needed to exclude those contigs if they are already absent from your interval list. I would recommend trying the hyperparameter adjustments we suggested to see if those help.
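
One way to confirm that an interval list contains none of these contigs is to tally the contig names it uses. The sketch below assumes a Picard-style interval list (header lines starting with "@", followed by tab-separated records whose first field is the contig name). The name patterns for the extra contigs (chrM, MT, chrEBV, and a "_decoy" suffix) are assumptions that depend on the reference build.

```python
import re
from collections import Counter

# Assumed naming patterns for "extra" contigs; adjust to your reference build.
EXTRA_CONTIG_PATTERN = re.compile(r"^(chrM|MT|chrEBV)$|_decoy$")


def contig_counts(lines):
    """Count intervals per contig, skipping blank lines and '@' header lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        contig = line.split("\t", 1)[0]
        counts[contig] += 1
    return counts


def extra_contigs(counts):
    """Return the sorted contig names that match the assumed extra-contig patterns."""
    return sorted(c for c in counts if EXTRA_CONTIG_PATTERN.search(c))


# Tiny made-up interval list for illustration only.
example = [
    "@HD\tVN:1.6",
    "@SQ\tSN:chr1\tLN:248956422",
    "chr1\t100\t200\t+\t.",
    "chr1\t300\t400\t+\t.",
    "chrM\t1\t500\t+\t.",
    "chrUn_JTFH01000001v1_decoy\t1\t100\t+\t.",
]
counts = contig_counts(example)
print(extra_contigs(counts))  # ['chrM', 'chrUn_JTFH01000001v1_decoy']
```

For a real file, pass an open file handle to contig_counts in place of the example list. An empty result means there is nothing to exclude.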

    Best,

    Genevieve

  • shun inoue

    Hi Genevieve,

    Thank you for your attention.

    Before adjusting the hyperparameters, I excluded several samples based on the distributions of mean coverage, total reads, etc. After that, the aforementioned error no longer appeared.

    I have now moved on to GermlineCNVCaller, and it works well.

    Although filtering by coverage and number of reads is a fundamental step of this pipeline, I had overlooked it.
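
The thread does not say which thresholds were used for this sample filtering, so the following is only an illustrative sketch of one common approach: flag samples whose total read count lies far from the cohort median, measured in units of the median absolute deviation (MAD). The sample names, read totals, and cutoff of 3 are all hypothetical.

```python
from statistics import median


def flag_outlier_samples(totals, max_abs_z=3.0):
    """Return names of samples whose total read count is a robust (median/MAD) outlier."""
    values = list(totals.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the cohort, so nothing can be flagged
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return sorted(name for name, v in totals.items() if abs(v - med) / scale > max_abs_z)


# Hypothetical per-sample total read counts.
totals = {
    "s1": 50_000_000,
    "s2": 52_000_000,
    "s3": 48_000_000,
    "s4": 51_000_000,
    "s5": 5_000_000,
}
print(flag_outlier_samples(totals))  # ['s5']
```

The same function can be applied to mean coverage or any other per-sample metric. Flagged samples are worth inspecting before they are dropped from the cohort.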

    I appreciate your cooperation and kind advice.

    Best,

    Shun

  • Genevieve Brandt (she/her)

    Great to hear that it is working now, shun inoue!

