Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

GATKgCNV Error: "Some contigs do not have ploidy priors"

0

32 comments

  • Avatar
    Gökalp Çelik

    Hi Y R

    Contig ploidy file provided for the gCNV workflow should contain all the contigs that are used to collect readcounts from the alignment files and their names should match exactly to the contig names on the reference genome. Can you check the contig names present in your interval_list file and your contig ploidy files? If they are not matching make sure that they match and try rerunning the tool. 

    I hope this helps. 

    1
    Comment actions Permalink
  • Avatar
    Y R

    I think I see part of the issue. The chromosome names are  chrM, chr1, etc in the interval_list file and in the contig ploidy priors has 1,2,3 etc. What would I put in for the chrM in the contig ploidy priors table?

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Y R

    Since mitochondrial chromosome is not similar to autosomal or gonosomal chromosomes in terms of copy number and behavior it is not recommended to have chrM to be included in gCNV analysis. 

    It may be better for you to remove chrM counts from your readcounts. You can do this by adding

    -XL chrM

    to your command line during your analysis. 

    1
    Comment actions Permalink
  • Avatar
    Y R

    To which command?

     

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    You may add it to GermlineCNVCaller step. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    I am adding it here as the following:

    # Step 7: GermlineCNVCaller

    echo " "

    echo "Starting Step 7: GermlineCNVCaller"

    echo " "

    function run_GermlineCNVCaller() {

    localannotated_intervals_list=$1

    # Loop through each shard file path in Shard_Files_To_Be_Fed_Into_Step_7_GermlineCNVCaller

    forshard_filein"${Shard_Files_To_Be_Fed_Into_Step_7_GermlineCNVCaller[@]}"; do

    forcounts_argin"${Counts_Arguments_To_Be_Fed_Into_Step_7_GermlineCNVCaller[@]}"; do

    echo"Processing: $counts_arg with annotated intervals: ${annotate_dir}/Annotate_Intervals.tsv"

    shard_name=$(basename "$shard_file")

    singularitycommand="singularity exec $singularity_image /gatk/gatk"

    ${singularitycommand} GermlineCNVCaller \

    --contig-ploidy-calls"${DetermineGermlineContigPloidy_dir}/ploidy-calls"\

    -I"$counts_arg"\

    -L"$shard_file"\

    --annotated-intervals"${annotate_dir}/Annotate_Intervals.tsv"\

    -output"${GermlineCNVCaller_dir}"\

    --output-prefixStep_7___GermlineCNVCaller___${counts_arg}_${shard_name}______\

    --interval-merging-ruleOVERLAPPING_ONLY\

    --run-modeCOHORT\

    -XLchrM\

    --verbosityDEBUG

    done

    done

    }

    Question(s):

    1) Is there any difference between `-XL` and `--exclude-intervals`?

    2) Should I add the `chr` to the contig ploidy priors? Or should it not contain `chr`?

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Names in contig ploidy priors should match what's in the actual sequence names in the reference file. If your file has chr prefix then contig ploidy priors should have chr. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    Then the contig ploidy priors table would not include the `chrM` but be the following?:

    CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3

    chr1 0.01 0.01 0.97 0.01

    chr2 0.01 0.01 0.97 0.01

    chr3 0.01 0.01 0.97 0.01

    chr4 0.01 0.01 0.97 0.01

    chr5 0.01 0.01 0.97 0.01

    chr6 0.01 0.01 0.97 0.01

    chr7 0.01 0.01 0.97 0.01

    chr8 0.01 0.01 0.97 0.01

    chr9 0.01 0.01 0.97 0.01

    chr10 0.01 0.01 0.97 0.01

    chr11 0.01 0.01 0.97 0.01

    chr12 0.01 0.01 0.97 0.01

    chr13 0.01 0.01 0.97 0.01

    chr14 0.01 0.01 0.97 0.01

    chr15 0.01 0.01 0.97 0.01

    chr16 0.01 0.01 0.97 0.01

    chr17 0.01 0.01 0.97 0.01

    chr18 0.01 0.01 0.97 0.01

    chr19 0.01 0.01 0.97 0.01

    chr20 0.01 0.01 0.97 0.01

    chr21 0.01 0.01 0.97 0.01

    chr22 0.01 0.01 0.97 0.01

    chrX 0.01 0.49 0.49 0.01

    chrY 0.50 0.50 0.00 0.00

    Question(s):

    1) This is what I believe I understand of what you mentioned is this correct?

    2) Is there any difference between `-XL` and `--exclude-intervals`?

    3) Now I get the error:

    A USER ERROR has occurred: Argument -L,-XL has a bad value: path/to/0000-scattered.interval_list], [chrM]. The intervals specified for exclusion with -XL removed all territory specified by -L.

     

    Why is this happening?

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Y R

    Looks like some of your scatter files have intervals from chrM and -XL chrM removes all those intervals. So my first assumption may not be complete solution therefore you need to remove all chrM intervals from your initial intervallist file and from your collected readcount files as well. This means you need to recollect your readcounts with an intervallist file that does not have intervals from chrM. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    I removed the `chrM` from the bed file by using the following command:

     

    awk '$1 != "chrM"' input.bed > output.bed

     

    then ran the ` BedToIntervalList` to make a fresh interval list so the targets interval list would not have `chrM`.  Then I added the new paths for the interval list and the bed list to the GATKgCNV script but I get this error 

     

    A USER ERROR has occurred: Argument -L,-XL has a bad value: [path/to/Step_6___Split_Intervals/0000-scattered.interval_list], [chrM]. The intervals specified for exclusion with -XL removed all territory specified by -L.

     

    Why is this occurring?

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Y R

    Looks like there is still chrM somewhere along the files. You may need to refresh all count files based on new interval list without chrM. Therefore you won't need to use -XL chrM as a parameter anymore. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    When I remove the `-XL chrM` from the script I get this error (after a fresh run):

    14:35:59.472 INFO  GermlineCNVCaller - Shutting down engine

    [September 14, 2023 at 2:35:59 PM GMT] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 1.03 minutes.

    Runtime.totalMemory()=1224736768

    java.lang.IllegalArgumentException: Annotated intervals do not contain all specified intervals.

    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:798)

    at org.broadinstitute.hellbender.tools.copynumber.arguments.CopyNumberArgumentValidationUtils.validateAnnotatedIntervalsSubset(CopyNumberArgumentValidationUtils.java:261)

    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.resolveIntervals(GermlineCNVCaller.java:406)

    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:329)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)

    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)

    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)

    at org.broadinstitute.hellbender.Main.main(Main.java:289)

     I am using the UCSC hg19 fasta, and the 

    Agilent_Version_6_Regions_Modified_chr_M bed file. Why is this happening?
    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Did you update the annotation file for your new fresh intervals?

    The error message is clearly indicating an incompatibility between annotated intervals and shards. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    The annotation file is created in the GATKgCNV script in the first posting. So yes it is new. I did do a clean run but I am still getting the error.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Y R

    Annotated intervals and interval list files should be 1:1 match even when it is divided into shards. In the last topic I remember mentioning this specifically since divided intervals are not the same as the original undivided ones in the annotated file. 

    For example lets say we have an interval list file and its annotation. 

    Interval_list file

    chr1 10000 20000
    chr1 20001 30000

    Annotated File

    chr1 10000 20000 0.9
    chr1 20001 30000 0.8

    If the interval list file is divided into 2 shards like 

    shard 1 

    chr1 10000 20000

    shard 2

    chr1 20001 30000

    These shards will still be compatible with the annotation file since all intervals are 1:1 present in the annotation file. 

    If those shards are set like this

    shard 1

    chr1 10000 15000
    chr1 15001 20000

    shard 2

    chr1 20001 25000
    chr1 25001 30000

    then these intervals although they cover the same region they are not compatible with the annotation file. Sharding must be done properly in order to be completely compatible with the original annotation file. Splitting intervals into smaller intervals must be done before the annotation and sharding must be done only by spliting the whole interval list file based on number of complete regions if needed. 

    Since we don't have access to your resources we cannot definitively say the reason but we can only act based on the error message thrown. 

    You may need to run your workflow step by step and at the end of each step you need to check your inputs and outputs. 

    We hope this helps. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    Hello Gökalp Çelik I tracked back to what you mentioned about removing chrM from the interval list doing the following:

    grep -Ev '^\s*@SQ\s+SN:chrM\s*' input.interval_list > output.interval_list

    But when I do this and then run the GATKgCNV I get the error:

    A USER ERROR has occurred: Argument -L,-XL has a bad value: path/to/0000-scattered.interval_list], [chrM]. The intervals specified for exclusion with -XL removed all territory specified by -L.

     

    I did a fresh re-run with making a `.idx`, `.interval_list` and then removed chrM. Why doesn't this solve the issue?

     

     

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Y R

    It is hard to tell why but looks like there are issues the way you are processing your interval_list files. Again since we don't have access to your resources we cannot definitively tell why this is happening. Preprocessing of interval_lists must be done way before you collect your readcounts. 

    I am posting a sample interval_list preprocessed for the agilent v6 kit that I regularly use for my samples. I do not modify anything out of this file and use this sole file to collect all my readcounts for this kit and perform CNV analyses. 

    Once again let me remind you the steps and you may be able to debug your code since we could not get definitive command line parameters but only script files that are hard to evaluate just by looking at it. 

    Step 1: 

    Preprocess your interval_list. Remove N only regions, add padding to extend regions and divide larger regions into smaller ones if needed. Avoid anything not autosomal or gonosomal such as chrM.

    Step 1.1:

    Create a contig ploidy file for all the contigs you are interested in. Do not include chrM in the list since it is not really the focus of GCNV workflow. 

    Step 2: 

    Annotate your preprocessed interval_list file. This is the only step where you will annotate your file and you cannot change it later on. 

    Step 3: 

    Collect Readcounts from all your files using the interval_list file created in the step 1: 

    Step 4: 

    Run FilterIntervals using all the readcount files at once and your interval_list and annotation file to generate a filtered interval_list. 

    Step 5

    Run DetermineContigPloidy using the filtered interval_list from step 4, contig ploidy file from step 1.1 and all readcounts files together in a single step. 

    Step 5.5 (optional)

    Scatter your filtered interval_list if you need to. When scattering avoid splitting single intervals into multiples. Only scatter based on number of segments in each scatter. 

    Step 6. 

    Run GermlineCNVCaller using filtered interval_list from step 4 (or if you scatter your filtered interval_list use each scattered interval list seperately from step 5.5), annotation file from step 2, config ploidy calls from step 5, and all readcount files all together in a single step. If you use scattered interval_lists from step 5.5 each scattered file must be used together with annotation file from step 2, contig ploidy calls from step 5 annotation file from step 2 and all readcount files together in scattered interval_lists number of steps seperately with their results in seperate folders. These seperate folders will be called GCNV shards. 

    Step 7. 

    Run PostProcessGermlineCNVCalls step using GCNV calls folders, GCNV model folders from step 6, contig ploidy calls from step 5 and a single sample index which is generated when your readcounts were put into calls in step 5 and 6. Make sure that your samples are all in the same order when they are consumed by step 5 and 6. Sample index is a single integer number and you may be able to see them in the logs. This step must be run for each sample seperately. If you have used scattered interval_lists from step 5.5 in step 6 then you need to add all GCNV model and call folders (shards) into the command line. 

    Here is the link to the repository of my sample interval_list files for agilent v6 kit that I use for my agilent v6 captured samples. 

    https://github.com/gokalpcelik/intervallists 

     

    I hope this helps. 

     

    0
    Comment actions Permalink
  • Avatar
    Y R

    I am not sure I full understand but I wanted to add the following. I removed the chrM from the fasta and created a clean intervals list without chrM and then this created an error when

    A USER ERROR has occurred: Illegal argument value: Positional arguments were provided ',ploidy}' but no positional argument is defined for this tool.` from DetermineGermlineContigPloidy. 

    I am sure after this I will have issues with interval list but I am not sure what to do with this error. My contig ploidy priors table is as follows:

     

    CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3

    chr1 0.01 0.01 0.97 0.01

    chr2 0.01 0.01 0.97 0.01

    chr3 0.01 0.01 0.97 0.01

    chr4 0.01 0.01 0.97 0.01

    chr5 0.01 0.01 0.97 0.01

    chr6 0.01 0.01 0.97 0.01

    chr7 0.01 0.01 0.97 0.01

    chr8 0.01 0.01 0.97 0.01

    chr9 0.01 0.01 0.97 0.01

    chr10 0.01 0.01 0.97 0.01

    chr11 0.01 0.01 0.97 0.01

    chr12 0.01 0.01 0.97 0.01

    chr13 0.01 0.01 0.97 0.01

    chr14 0.01 0.01 0.97 0.01

    chr15 0.01 0.01 0.97 0.01

    chr16 0.01 0.01 0.97 0.01

    chr17 0.01 0.01 0.97 0.01

    chr18 0.01 0.01 0.97 0.01

    chr19 0.01 0.01 0.97 0.01

    chr20 0.01 0.01 0.97 0.01

    chr21 0.01 0.01 0.97 0.01

    chr22 0.01 0.01 0.97 0.01

    chrX 0.01 0.49 0.49 0.01

    chrY 0.50 0.50 0.00 0.00
     
    Why would this occur? The relevant section of the code is placed below:
     
    # Step 5: DetermineGermlineContigPloidy:

    echo " "

    echo "Starting Step 5: DetermineGermlineContigPloidy"

    echo " "

    singularity exec $singularity_image_path /gatk/gatk DetermineGermlineContigPloidy \

    --intervals"${filtered_intervals_dir}/Filter_Intervals.interval_list"\

    --interval-merging-ruleOVERLAPPING_ONLY\

    ${counts_arguments} \

    --contig-ploidy-priors ${contig_ploidy_priors} \

    --output-prefixploidy\

    --output ${DetermineGermlineContigPloidy_dir}
     
     

    I would give the example SplitInterval files if it were not for this interesting error.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    This is a script issue not GATK actually. Your script is passing a malformed input to a parameter. 

    By the way singularity is not supported as en executor for GATK containers therefore if you are facing any issues related to singularity we cannot solve them. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    Thank you for the response. It isn't an issue with singularity so it should be fine. The only reason I use singularity is because that currently is the only way I can use GATK on the HPC I am on. The issues I have are with the inputs I use/script but not singularity.

    0
    Comment actions Permalink
  • 0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Y R

    Sorry for the misunderstanding on my end. I did not mean that your problems are related to singularity but if you also face any issues about singularity then we cannot provide support for it because we only support docker for container runtime. 

    For the reference genome part you need to check the sequence dictionaries to match in order to use a particular reference file for your workflows. Certain reference genomes could be generated for specific purposes however our main recommendations are documented in several pages 

    https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies 

    https://gatk.broadinstitute.org/hc/en-us/articles/360035890951-Human-genome-reference-builds-GRCh38-or-hg38-b37-hg19 

    Of course the choice is yours but beware that if you use resource files provided by us then keep in mind that our resource files are based on our recommended reference genomes for GRCh37 an GRCh38. You may confirm this via checking the header sections of vcf and interval_list files provided in the resource bundle. 

    0
    Comment actions Permalink
  • Avatar
    Y R

    Hello Gökalp Celik,

     

    I used the references recommended by GATK for hg19 and then I got a peculiar error dealing with the contig ploidy priors table:

     

    The following mandatory columns could not be found in "path/to/chr_contig_ploidy_priors.tsv"; cannot continue: {'CONTIG_NAME'}




    at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)

    at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:112)

    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:193)

    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:168)

    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:139)

    at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.executeDeterminePloidyAndDepthPythonScript(DetermineGermlineContigPloidy.java:427)

    at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.doWork(DetermineGermlineContigPloidy.java:324)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)

    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)

    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)

    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
     
    Why is this occurring. My contig ploidy priors table pasted below:
     
    CONTIG_NAME PLOIDY_PRIOR_0 PLOIDY_PRIOR_1 PLOIDY_PRIOR_2 PLOIDY_PRIOR_3

    chr1 0.01 0.01 0.97 0.01

    chr2 0.01 0.01 0.97 0.01

    chr3 0.01 0.01 0.97 0.01

    chr4 0.01 0.01 0.97 0.01

    chr5 0.01 0.01 0.97 0.01

    chr6 0.01 0.01 0.97 0.01

    chr7 0.01 0.01 0.97 0.01

    chr8 0.01 0.01 0.97 0.01

    chr9 0.01 0.01 0.97 0.01

    chr10 0.01 0.01 0.97 0.01

    chr11 0.01 0.01 0.97 0.01

    chr12 0.01 0.01 0.97 0.01

    chr13 0.01 0.01 0.97 0.01

    chr14 0.01 0.01 0.97 0.01

    chr15 0.01 0.01 0.97 0.01

    chr16 0.01 0.01 0.97 0.01

    chr17 0.01 0.01 0.97 0.01

    chr18 0.01 0.01 0.97 0.01

    chr19 0.01 0.01 0.97 0.01

    chr20 0.01 0.01 0.97 0.01

    chr21 0.01 0.01 0.97 0.01

    chr22 0.01 0.01 0.97 0.01

    chrX 0.01 0.49 0.49 0.01

    chrY 0.50 0.50 0.00 0.00
    Is there something wrong with the contig ploidy priors table?
    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Contig ploidy file is supposed to be a tab-separated file. Some text editors prefer adding spaces instead of tab so you need to check if your whitespaces are actually tab or space. 

     

    0
    Comment actions Permalink
  • Avatar
    Y R

    Regarding the previous suggestion:

    Thank you for your suggestion.  Your suggestion worked for the error.

     

    Basic Information And Statement Of Problem:

    I am currently test driving the script on 5 test bam files. Now I get the following error at the Germline CNV Caller step:

     

    java.lang.IllegalArgumentException: Annotated intervals do not contain all specified intervals.

    at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:798)

    at org.broadinstitute.hellbender.tools.copynumber.arguments.CopyNumberArgumentValidationUtils.validateAnnotatedIntervalsSubset(CopyNumberArgumentValidationUtils.java:261)

    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.resolveIntervals(GermlineCNVCaller.java:406)

    at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:329)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)

    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)

    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)

    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)

    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
     
    This error seems to be related to the intervals related issues I was having before so I will attach the head for 2 to 3 files created by the intervals steps in the hopes that the header will reveal something about the interval issues.
     
    Step 1: Preprocessing Interval List (Only one file generated in this step)
     
    @HD   VN:1.6

    @SQ    SN:chr1           LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr2           LN:243199373 M5:a0d9851da00400dec1098a9255ac712eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr3           LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3  UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr4           LN:191154276 M5:23dccd106897542ad87d2765d28a19a1UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr5           LN:180915260 M5:0740173db9ffd264d728f32784845cd7   UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr6           LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6bUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr7           LN:159138663 M5:618366e953d6aaad97dbe4777c29375eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chrX           LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540ddUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr8           LN:146364022 M5:96f514a9929e410c6651697bded59aec  UR:file:/path/to/GATK/reference/hg19.fa

     

    Step 2: Annotate Intervals (Only one file generated in this step):

     

    @HD   VN:1.6

    @SQ    SN:chr1           LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr2           LN:243199373 M5:a0d9851da00400dec1098a9255ac712eUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr3           LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3  UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr4           LN:191154276 M5:23dccd106897542ad87d2765d28a19a1UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr5           LN:180915260 M5:0740173db9ffd264d728f32784845cd7   UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr6           LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6bUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr7           LN:159138663 M5:618366e953d6aaad97dbe4777c29375eUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chrX           LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540ddUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr8           LN:146364022 M5:96f514a9929e410c6651697bded59aec  UR:file:/path/to/gatk/reference/hg19.fa

     

    Step 3: Collect Read Counts

    This was not included as these counts are per bam and there are as many `.tsv` outputs as there are input bams.

     

    Step 4: Filter Interval (Only one `.interval_list`)

    @HD   VN:1.6

    @SQ    SN:chr1           LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr2           LN:243199373 M5:a0d9851da00400dec1098a9255ac712eUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr3           LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3  UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr4           LN:191154276 M5:23dccd106897542ad87d2765d28a19a1UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr5           LN:180915260 M5:0740173db9ffd264d728f32784845cd7   UR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr6           LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6bUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr7           LN:159138663 M5:618366e953d6aaad97dbe4777c29375eUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chrX           LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540ddUR:file:/path/to/gatk/reference/hg19.fa

    @SQ    SN:chr8           LN:146364022 M5:96f514a9929e410c6651697bded59aec  UR:file:/path/to/gatk/reference/hg19.fa

     

    Step 5: Determine Germline Contig Policy (Many Outputs)

     

    As there are many outputs in this step I am not sure adding the header of any of them will be useful.

     

    Step 6: Scattered Interval Lists (92 total interval lists)

     

    For `0000-scattered.interval_list`:

     

    @HD   VN:1.6 SO:coordinate

    @SQ    SN:chr1           LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr2           LN:243199373 M5:a0d9851da00400dec1098a9255ac712eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr3           LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3  UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr4           LN:191154276 M5:23dccd106897542ad87d2765d28a19a1UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr5           LN:180915260 M5:0740173db9ffd264d728f32784845cd7   UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr6           LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6bUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr7           LN:159138663 M5:618366e953d6aaad97dbe4777c29375eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chrX           LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540ddUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr8           LN:146364022 M5:96f514a9929e410c6651697bded59aec  UR:file:/path/to/GATK/reference/hg19.fa

     

    For `0001-scattered.interval_list`:

     

    @HD   VN:1.6 SO:coordinate

    @SQ    SN:chr1           LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr2           LN:243199373 M5:a0d9851da00400dec1098a9255ac712eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr3           LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3  UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr4           LN:191154276 M5:23dccd106897542ad87d2765d28a19a1UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr5           LN:180915260 M5:0740173db9ffd264d728f32784845cd7   UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr6           LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6bUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr7           LN:159138663 M5:618366e953d6aaad97dbe4777c29375eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chrX           LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540ddUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr8           LN:146364022 M5:96f514a9929e410c6651697bded59aec  UR:file:/path/to/GATK/reference/hg19.fa

     

    For `0002-scattered.interval_list`:

     

    @SQ    SN:chr1           LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr2           LN:243199373 M5:a0d9851da00400dec1098a9255ac712eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr3           LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3  UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr4           LN:191154276 M5:23dccd106897542ad87d2765d28a19a1UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr5           LN:180915260 M5:0740173db9ffd264d728f32784845cd7   UR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr6           LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6bUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr7           LN:159138663 M5:618366e953d6aaad97dbe4777c29375eUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chrX           LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540ddUR:file:/path/to/GATK/reference/hg19.fa

    @SQ    SN:chr8           LN:146364022 M5:96f514a9929e410c6651697bded59aec  UR:file:/path/to/GATK/reference/hg19.fa

     

    There is no output I can see for Step 7 Germline CNV caller due to this error so none was posted in this response.

     

    Note(s):

    The paths are there in the real files but due to the HPC rules I have to follow I have removed them in this post.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Headers are just sequence dictionaries that those are not really helpful for your case. You need to check if all scattered intervals (regions to be specific) is exactly present (not as a subpart of an interval but actually as it is) in the unscattered interval list in the input file for the Scattering tool that you are using. 

    Lets say if you have an interval like this in the filtered intervallist 

    chr1 100000 120000

    but have 

    chr1 100000 110000
    chr1 110001 120000

    in your scattered file then you will get this error because none of your read collections actually have the bottom regions as a readcount value.

    0
    Comment actions Permalink
  • Avatar
    Y R

    I am not sure I fully understand but I can do a `wc -l` for each interval and check the `tail` ends here:

     

    Step 1: Preprocess Intervals

    Word Count:

    `wc -l` produces 781866

    The tail:

    chr21 48081660 48081859 + .

    chr21 48081860 48082059 + .

    chr21 48082060 48082100 + .

    chr21 48083043 48083242 + .

    chr21 48083243 48083442 + .

    chr21 48083443 48083642 + .

    chr21 48083643 48083718 + .

    chr21 48083955 48084154 + .

    chr21 48084155 48084354 + .

    chr21 48084355 48084536 + .
     
    Step 2: Annotate Intervals
     
    Word Count:
     
    `wc -l` produces 781867
     
    The tail:
     
    chr21 48081660 48081859 0.690000 0.700000

    chr21 48081860 48082059 0.800000 0.000000

    chr21 48082060 48082100 0.707317 0.000000

    chr21 48083043 48083242 0.585000 0.000000

    chr21 48083243 48083442 0.580000 0.750000

    chr21 48083443 48083642 0.515000 0.125000

    chr21 48083643 48083718 0.578947 0.000000

    chr21 48083955 48084154 0.290000 0.000000

    chr21 48084155 48084354 0.415000 0.405000

    chr21 48084355 48084536 0.648352 0.000000
     
    Step 3: Collect Read Counts
     
    This has as many counts files as there are bam input files. This does not seem relevant to the current question so the `wc -l` and the tails are not added here.
     
    Step 4: Filter Intervals
     
    Word Count:
     
    wc -l produces 43112.
     
    The tail:
     
    chr21 47423385 47423584 + .

    chr21 47423585 47423784 + .

    chr21 47531900 47532099 + .

    chr21 47532100 47532299 + .

    chr21 47532300 47532499 + .

    chr21 47545816 47546015 + .

    chr21 47549176 47549375 + .

    chr21 47552016 47552215 + .

    chr21 47552216 47552415 + .

    chr21 47556770 47556969 + .
     
    Step 5: Determine Germline Contig Ploidy
     
    This has multiple outputs and the `wc -l` and the tail does not seem to be useful here.
     
    Step 6: Scattered Interval List/Split Intervals
     
    Word Count (Modified):
     
    for file in *; do echo "$(wc -l < "$file") lines in $file"; done 


    # Gives 95 lines for each list from 0000 to 0092 so 93 lists

    # 95 lines * 93 lists = 8835 lines total
     
    Tail Ends:
     

    Tail of 0000:

     

    @SQ    SN:chr9_gl000201_random    LN:36148        M5:dfb7e7ec60ffdcb85cb359ea28454ee9    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000235    LN:34474        M5:118a25ca210cfbcdfb6c2ebb249f9680    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000239    LN:33824        M5:99795f15702caec4fa1c4e15f8a29c07     UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chr21_gl000210_random  LN:27682        M5:851106a74238044126131ce2a8e5847cUR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000231    LN:27386        M5:ba8882ce3a1efa2080e5d29b956568a4  UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000229    LN:19913        M5:d0f40ec87de311d8e715b52e4c7062e1  UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrM          LN:16571        M5:d2ed829b8a1628d16cbeee88e88e39ebUR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000226    LN:15008        M5:1c1b2cd1fccbc0a99b6a447fa24d1504    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chr18_gl000207_random  LN:4262           M5:f3814841f1939d3ca19072d9e89f3fd7    UR:file:/path/to/GATK/referencehg19.fa

    chr1     1          249250621      +          .

     

    Tail of 0001:

     

    @SQ    SN:chr9_gl000201_random    LN:36148        M5:dfb7e7ec60ffdcb85cb359ea28454ee9    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000235    LN:34474        M5:118a25ca210cfbcdfb6c2ebb249f9680    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000239    LN:33824        M5:99795f15702caec4fa1c4e15f8a29c07     UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chr21_gl000210_random  LN:27682        M5:851106a74238044126131ce2a8e5847cUR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000231    LN:27386        M5:ba8882ce3a1efa2080e5d29b956568a4  UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000229    LN:19913        M5:d0f40ec87de311d8e715b52e4c7062e1  UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrM          LN:16571        M5:d2ed829b8a1628d16cbeee88e88e39ebUR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000226    LN:15008        M5:1c1b2cd1fccbc0a99b6a447fa24d1504    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chr18_gl000207_random  LN:4262           M5:f3814841f1939d3ca19072d9e89f3fd7    UR:file:/path/to/GATK/referencehg19.fa

    chr2     1          243199373      +          .

     

    Tail of 0092:

     

    @SQ    SN:chr9_gl000201_random    LN:36148        M5:dfb7e7ec60ffdcb85cb359ea28454ee9    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000235    LN:34474        M5:118a25ca210cfbcdfb6c2ebb249f9680    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000239    LN:33824        M5:99795f15702caec4fa1c4e15f8a29c07     UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chr21_gl000210_random  LN:27682        M5:851106a74238044126131ce2a8e5847cUR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000231    LN:27386        M5:ba8882ce3a1efa2080e5d29b956568a4  UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000229    LN:19913        M5:d0f40ec87de311d8e715b52e4c7062e1  UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrM          LN:16571        M5:d2ed829b8a1628d16cbeee88e88e39ebUR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chrUn_gl000226    LN:15008        M5:1c1b2cd1fccbc0a99b6a447fa24d1504    UR:file:/path/to/GATK/referencehg19.fa

    @SQ    SN:chr18_gl000207_random  LN:4262           M5:f3814841f1939d3ca19072d9e89f3fd7    UR:file:/path/to/GATK/referencehg19.fa

    chr18_gl000207_random       1          4262    +          .

     

    Step 7: Germline CNV Caller

     

    Has no outputs due to the error recieved.

     

    Note(s):

     

    The paths have been removed due to the rules I need to follow as I am running this on an HPC. There are real paths in the original files.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Looking at your scatter files it is clear that they are not prepared properly and the error is really due to your incompatible scatter files and filtered interval list. 

    The only line that is present after the header is the whole chromosome as a single interval 

    chr1     1          249250621      +          .

     

    Once you filter your intervals due to GC content, mappability and segmental duplication using FilterIntervalList you need to use that filtered intervallist as an input to generate your scattered interval lists of all of which cannot contain any intervals that is not in the actual filtered interval list. 

    Regards. 

     

    0
    Comment actions Permalink
  • Avatar
    Y R

    Incorporation Of Suggestion:

     

    It worked. Thank you. 

     

    Current Error:

     

    5:24:14.722 INFO  GermlineCNVCaller - Shutting down engine
    [November 15, 2023 at 3:24:14 PM GMT] org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller done. Elapsed time: 0.25 minutes.
    Runtime.totalMemory()=1224736768
    java.lang.IllegalArgumentException: At least two samples must be provided in COHORT mode.
            at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:798)
            at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.resolveIntervals(GermlineCNVCaller.java:419)
            at org.broadinstitute.hellbender.tools.copynumber.GermlineCNVCaller.doWork(GermlineCNVCaller.java:329)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)

     

    Relevant Section Of Script:

     

    # Concatenate all the split interval files with -L flag:

    shard_arguments=$(printf -- "-L %s " "${SplitIntervals_dir}"/*.interval_list)

    #_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    # Create variable `sample_counter` to identify/index number of samples in the call.

    samplecount=${num_bams} # Get the count of samples - 1 bam == 1 sample

    samplecount=$((samplecount - 1)) # Decrement samplecount by 1

    echo " "

    echo "The sample count used for the final step is one less than the number of bams."

    echo "The number of bams: $num_bams"

    echo "The sample count is: $samplecount"

    echo " "

    counter=0

    #_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    # Concatenate calls shard path .tsv files to pass in as --calls-shard-path, List of paths to GermlineCNVCaller model directories.:

    calls_shard_list=$(printf -- "--calls-shard-path %s " "${GermlineCNVCaller_dir}/germline_cnv_caller_cohort-calls"/*)

    echo "${calls_shard_list}"

    #_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    # Concatenate models shard path .tsv files to pass in as --model-shard-path, List of paths to GermlineCNVCaller model directories.:

    model_shard_list=$(printf -- "--model-shard-path %s " "${GermlineCNVCaller_dir}/germline_cnv_caller_cohort-model"/*)

    echo "${model_shard_list}"

    #_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    # Concatenate all the counts files in an array

    Counts_Arguments_To_Be_Fed_Into_Step_7_GermlineCNVCaller=("${counts_dir}"/*.tsv)

    # Concatenate all the shard files in an array

    Shard_Files_To_Be_Fed_Into_Step_7_GermlineCNVCaller=("${SplitIntervals_dir}"/*.interval_list)

    #_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    # Step 7: GermlineCNVCaller

    echo " "

    echo "Starting Step 7: GermlineCNVCaller"

    echo " "

    function run_GermlineCNVCaller() {

    localannotated_intervals_list=$1

    # Loop through each shard file path in Shard_Files_To_Be_Fed_Into_Step_7_GermlineCNVCaller

    forshard_filein"${Shard_Files_To_Be_Fed_Into_Step_7_GermlineCNVCaller[@]}"; do

    forcounts_argin"${Counts_Arguments_To_Be_Fed_Into_Step_7_GermlineCNVCaller[@]}"; do

    echo"Processing: $counts_arg with annotated intervals: ${annotate_dir}/Annotate_Intervals.tsv"

    shard_name=$(basename "$shard_file")

    singularitycommand="singularity exec $singularity_image /gatk/gatk"

    ${singularitycommand} GermlineCNVCaller \

    --contig-ploidy-calls"${DetermineGermlineContigPloidy_dir}/ploidy-calls"\

    -I"$counts_arg"\

    -L"$shard_file"\

    --annotated-intervals"${annotate_dir}/Annotate_Intervals.tsv"\

    -output"${GermlineCNVCaller_dir}"\

    --output-prefixStep_7___GermlineCNVCaller___${counts_arg}_${shard_name}______\

    --interval-merging-ruleOVERLAPPING_ONLY\

    --run-modeCOHORT\

    --verbosityDEBUG

    done

    done

    }

    run_GermlineCNVCaller

    # Wait for all background processes to finish

    wait
     
     
    Why is this occurring when there were 5 bams/samples inputted in the beginning of the pipeline?
    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Y R

    Can you check your log files to see the actual number of samples fed into GermlineCNVCaller step?

    Looks like your script is not properly adding all samples at once to the corresponding step.

     Also 5 samples for a cohort is too low. We would suggest you to have at least 30 samples or more for a cohort study.

    Regards. 

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk