Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATKgCNV Error: "Some contigs do not have ploidy priors"


  • Y R

    I ran it with about 200 samples and got what looks like the same error:

    18:30:48.256 INFO  GermlineCNVCaller - Shutting down engine
    [November 26, 2023 at 6:30:48 PM GMT] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 0.26 minutes.
    Runtime.totalMemory()=1224736768
    java.lang.IllegalArgumentException: At least two samples must be provided in COHORT mode.
    	at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:798)
    	at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.resolveIntervals(GermlineCNVCaller.java:419)
    	at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:329)
    	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
    	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
    	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
    	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    	at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
    Running:

     

    Is it necessary to use a targets interval list? Perhaps the interval lists (annotate, filter, etc.) are what is causing the issue?

  • Gökalp Çelik

    Here are the required arguments for the GermlineCNVCaller step:

    USAGE: GermlineCNVCaller [arguments]
    Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy
    Version:4.4.0.0
    Required Arguments:

    --contig-ploidy-calls <File>  Input contig-ploidy calls directory (output of DetermineGermlineContigPloidy).  Required. 

    --input,-I <String>           Input paths for read-count files containing integer read counts in genomic intervals for
                                  all samples.  All intervals specified via -L/-XL must be contained; if none are specified,
                                  then intervals must be identical and in the same order for all samples.  If read-count
                                  files are given by Google Cloud Storage paths, have the extension .counts.tsv or
                                  .counts.tsv.gz, and have been indexed by IndexFeatureFile, only the specified intervals
                                  will be queried and streamed; this can reduce disk usage by avoiding the complete
                                  localization of all read-count files.  This argument must be specified at least once.
                                  Required. 

    --output,-O <File>            Output directory.  This will be created if it does not exist.  Required. 

    --output-prefix <String>      Prefix for output filenames.  Required. 

    --run-mode <RunMode>          Tool run-mode.  Required. Possible values: {COHORT, CASE}

    You don't have to provide any interval lists for this tool unless you want to restrict calling to the filtered intervals with high mappability and suitable GC content. If you don't provide any intervals, the tool will still work, but it will not be able to distinguish which intervals are filtered out due to high or low counts, abnormal GC content, or low segment mappability.

    COHORT mode requires the -I parameter to be provided multiple times, once for each input count file you intend to include. At least 2 samples are required, and more than 30 are recommended. If the intended samples are not all added to the GATK command line properly, you will get the error message you are facing.

    Regards. 
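
The point about repeating -I once per sample can be sketched in Python as below. This is only an illustration, not part of GATK: the helper name `build_cohort_command` and every file and directory name are hypothetical, and it uses only the arguments shown in the usage text above (plus the optional -L that the usage text mentions). It builds the argument list and refuses to continue when fewer than two count files are supplied, which is the condition behind the "At least two samples must be provided in COHORT mode" message.

```python
from typing import List, Optional


def build_cohort_command(
    count_files: List[str],
    ploidy_calls_dir: str,
    output_dir: str,
    output_prefix: str,
    intervals: Optional[str] = None,
) -> List[str]:
    """Build a GermlineCNVCaller COHORT-mode argument list (hypothetical helper)."""
    # COHORT mode needs at least two samples; fail early with a clear message.
    if len(count_files) < 2:
        raise ValueError("COHORT mode needs at least two read-count files")

    cmd = [
        "gatk", "GermlineCNVCaller",
        "--run-mode", "COHORT",
        "--contig-ploidy-calls", ploidy_calls_dir,
    ]
    # One -I per read-count file: this is the step that is easy to get wrong.
    for path in count_files:
        cmd += ["-I", path]
    # The interval list is optional, as described in the reply above.
    if intervals is not None:
        cmd += ["-L", intervals]
    cmd += ["--output", output_dir, "--output-prefix", output_prefix]
    return cmd


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    samples = [f"counts/sample_{i}.counts.tsv" for i in range(3)]
    print(" ".join(build_cohort_command(samples, "ploidy-calls", "cnv_out", "cohort")))
```

Printing the command before running it makes it easy to confirm that every intended sample actually appears after its own -I.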
