Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Genome STRiP cannot find --ploidyMapFile

9 comments

  • Swanthana Rekulapally

    Update: I gave it a ploidy map file, and SVPreprocess ran almost to the end, but it gave FunctionEdge errors:

    ERROR 16:20:14,814 FunctionEdge - Error:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/pine/scr/s/w/swan/svtoolkit/test/tmpdir'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.apps.ComputeDepthProfiles'  '-I' 'preprocessing/metadata/headers.bam'  '-configFile' '/nas/longleaf/home/swan/svtoolkit/conf/genstrip_parameters.txt'  '-R' '/proj/ncgenes2/src/ncgenes2-exome-pipeline/modules/apps/human-genome-for-alignment/1405.15/GRCh38_no_alt_analysis_set.refseqids.fna'  '-L' 'NT_187449.1:0-0'  '-md' 'preprocessing/metadata'  '-profileBinSize' '100000'  '-maximumReferenceGapLength' '10000'  '-O' '/pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/profiles_100Kb/profile_seq_NT_187449.1_100000.dat.gz' 

    ERROR 16:20:14,815 FunctionEdge - Contents of /pine/scr/s/w/swan/svtoolkit/test/preprocessing/logs/SVPreprocess-261.out:

    Also, here is the end of the output, where it says the script failed:

    DEBUG 16:20:52,227 RScriptExecutor - Result: 0

    INFO  16:20:52,234 QCommandLine - Done with errors

    INFO  16:20:52,243 QGraph - -------

    INFO  16:20:52,244 QGraph - Failed:   'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/pine/scr/s/w/swan/svtoolkit/test/tmpdir'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.apps.ComputeDepthProfiles'  '-I' 'preprocessing/metadata/headers.bam'  '-configFile' '/nas/longleaf/home/swan/svtoolkit/conf/genstrip_parameters.txt'  '-R' '/proj/ncgenes2/src/ncgenes2-exome-pipeline/modules/apps/human-genome-for-alignment/1405.15/GRCh38_no_alt_analysis_set.refseqids.fna'  '-L' 'NT_187449.1:0-0'  '-md' 'preprocessing/metadata'  '-profileBinSize' '100000'  '-maximumReferenceGapLength' '10000'  '-O' '/pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/profiles_100Kb/profile_seq_NT_187449.1_100000.dat.gz' 

    DEBUG 16:20:52,244 QGraph - Inputs:  List(/pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/chimerism.dat, /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/depth.dat, /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/gcprofiles.zip, /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/headers.bam, /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/headers.bam.bai, /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/rccache.bin.idx, /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/spans.dat)

    DEBUG 16:20:52,244 QGraph - Outputs: List(/pine/scr/s/w/swan/svtoolkit/test/preprocessing/logs/SVPreprocess-261.out, /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/profiles_100Kb/profile_seq_NT_187449.1_100000.dat.gz)

    DEBUG 16:20:52,244 QGraph - Done+:   List()

    DEBUG 16:20:52,244 QGraph - Done-:   List(/pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/profiles_100Kb/.profile_seq_NT_187449.1_100000.dat.gz.done)

    DEBUG 16:20:52,244 QGraph - CmdDir:  /pine/scr/s/w/swan/svtoolkit/test

    DEBUG 16:20:52,244 QGraph - Temp?:   false

    DEBUG 16:20:52,244 QGraph - Prev:    Pending (reset = true)

    INFO  16:20:52,244 QGraph - Log:     /pine/scr/s/w/swan/svtoolkit/test/preprocessing/logs/SVPreprocess-261.out

    INFO  16:20:52,245 QCommandLine - Script failed: 3 Pend, 0 Run, 1 Fail, 410 Done

    ------------------------------------------------------------------------------------------

    Done. There were no warn messages.

    ------------------------------------------------------------------------------------------

    DEBUG 16:20:52,273 IOUtils - Deleted /pine/scr/s/w/swan/svtoolkit/test/tmpdir/Q-Classes-4174555092816110705

    Any thoughts about this? Thank you.

  • Bhanu Gandham

    Tagging Bob Handsaker in this thread. 

  • Bob Handsaker

    Two comments:

    First, it looks like you are using an alt-free reference. Presumably you are doing this because you aligned your reads to this alt-free reference; if not, then you need to start over. You may run into a few problems along the way because this is not typical practice. Normally, we expect the standard (i.e., functionally equivalent) BWA alignment pipeline using the hg38 reference with alt contigs. The code in Genome STRiP is designed to handle the alt contigs correctly.

    As an aside, if you were using the standard practice and the Genome STRiP reference bundle, then the ploidy map should have been picked up automatically.
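
    A quick way to check what a reference actually contains is to summarize the sequence names in its FASTA index. The sketch below assumes a samtools-style `.fai` index (for example, one built with `samtools faidx`), whose first column is the sequence name. The function name `summarize_reference_names` is hypothetical, and counting names that end in `_alt` only detects alt contigs under UCSC-style naming such as `chr1_KI270762v1_alt`, not under RefSeq accession names.

```bash
# Hypothetical helper: summarize the sequence names in a samtools-style .fai index.
# Column 1 of a .fai file is the sequence name.
summarize_reference_names() {
  local fai="$1"
  awk '
    { total++ }
    # UCSC-style alt contig names end in "_alt", e.g. chr1_KI270762v1_alt.
    $1 ~ /_alt$/ { alt++ }
    # Names such as chrX use the "chr" prefix; X or RefSeq accessions (e.g. NT_187449.1) do not.
    $1 ~ /^chr/ { chr++ }
    END {
      printf "sequences: %d\n", total
      printf "names ending in _alt: %d\n", alt
      printf "names starting with chr: %d\n", chr
    }
  ' "$fai"
}
```

    Running it on the reference's `.fai` file prints three counts. The `chr` count also shows at a glance whether the names carry a `chr` prefix, which matters when writing a ploidy map.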

    Second, with respect to the last specific problem, I think you need to look in /pine/scr/s/w/swan/svtoolkit/test/preprocessing/logs/SVPreprocess-261.out to see what went wrong.
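
    When a Queue job fails, its per-job `.out` file in the `-jobLogDir` directory is the first place to look. A minimal sketch for scanning one of those files, assuming the `##### ERROR` prefix that GATK uses in the logs in this thread; the function name `show_job_errors` is hypothetical:

```bash
# Hypothetical helper: show GATK/Queue error lines from a job log.
# Returns 1 if error lines were found, 0 otherwise.
show_job_errors() {
  local log="$1"
  local tail_lines="${2:-20}"
  # Print matching lines with their line numbers.
  if grep -n -E '##### ERROR|Exception' "$log"; then
    return 1
  fi
  # No error lines: show the end of the log so you can see how far the job got.
  echo "no error lines found; last ${tail_lines} lines of ${log}:"
  tail -n "$tail_lines" "$log"
}
```

    If nothing matches, the tail of the log at least shows whether the tool reached "Program completed."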

  • Swanthana Rekulapally

    Hello Bob, thanks for replying. Our samples are BWA-aligned, and our reference is the alt-free GRCh38. I think updating R fixed the problem above, and it ran fine after that. It now seems to get further, into the CNV pipeline, but I got the error below:

    ERROR 12:02:49,006 FunctionEdge - Error:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/pine/scr/s/w/swan/svtoolkit/test/tmpdir'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.apps.ComputeGCProfiles'  '-R' '/proj/ncgenes2/src/ncgenes2-exome-pipeline/modules/apps/human-genome-for-alignment/1405.15/GRCh38_no_alt_analysis_set.refseqids.fna'  '-md' 'preprocessing/metadata'  '-writeReferenceProfile' 'true'  '-configFile' '/nas/longleaf/home/swan/svtoolkit/conf/genstrip_parameters.txt'  '-O' '/pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/gcprofile/reference.gcprof.zip' 

    ERROR 12:02:49,016 FunctionEdge - Contents of /pine/scr/s/w/swan/svtoolkit/test/preprocessing/logs/SVPreprocess-6.out:

    ##### ERROR --

    ##### ERROR stack trace

    ##### ERROR ------------------------------------------------------------------------------------------

    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1941-0-gb493839):

    ##### ERROR

    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.

    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.

    ##### ERROR Visit our website and forum for extensive documentation and answers to

    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk

    ##### ERROR

    ##### ERROR MESSAGE: Invalid ploidy map ploidy2.txt: Unrecognized sequence name: X

    ##### ERROR ------------------------------------------------------------------------------------------

    ERROR 13:39:46,372 FunctionEdge - Error:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/pine/scr/s/w/swan/svtoolkit/test/tmpdir'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar' org.broadinstitute.sv.main.SVCommandLine '-T' 'ComputeReadDepthCoverageWalker'  '-R' '/proj/ncgenes2/src/ncgenes2-exome-pipeline/modules/apps/human-genome-for-alignment/1405.15/GRCh38_no_alt_analysis_set.refseqids.fna'  '-I' '/proj/nvora/users/swan/60707-P-picard-sorted.bam'  '-O' '/pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/depth/60707-P-picard-sorted.depth.txt'  '-disableGATKTraversal' 'true'  '-md' 'preprocessing/metadata'  '-ploidyMapFile' 'ploidy2.txt'  '-configFile' '/nas/longleaf/home/swan/svtoolkit/conf/genstrip_parameters.txt'  '-minMapQ' '10'  '-insertSizeRadius' '10.0' 

    ERROR 13:39:46,373 FunctionEdge - Contents of /pine/scr/s/w/swan/svtoolkit/test/preprocessing/logs/SVPreprocess-11.out:

    ##### ERROR --

    ##### ERROR stack trace

    ##### ERROR ------------------------------------------------------------------------------------------

    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1941-0-gb493839):

    ##### ERROR

    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.

    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.

    ##### ERROR Visit our website and forum for extensive documentation and answers to

    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk

    ##### ERROR

    ##### ERROR MESSAGE: Invalid ploidy map ploidy2.txt: Unrecognized sequence name: X

    ##### ERROR ------------------------------------------------------------------------------------------

    I'm not sure why it's not recognizing the X chromosome. Here is my ploidy file:

    X 1 9999 M 1
    X 2781480 155701381 M 1
    Y 1 57227415 M 1
    Y 1 57227415 F 0
    MT 1 16569 M 1
    MT 1 16569 F 1
    chrX 1 9999 M 1
    chrX 2781480 155701381 M 1
    chrY 1 57227415 M 1
    chrY 1 57227415 F 0
    chrM 1 16569 M 1
    chrM 1 16569 F 1
    *  * *     M 2
    *  * *     F 2

    I also checked SVPreprocess-6.out from the error, and there is nothing wrong there. Here is the end of that log file:

    INFO  11:11:01,293 ComputeGCProfiles - Reference GC profile initialized.

    INFO  11:11:01,293 CommandLineProgram - Program completed.

    ------------------------------------------------------------------------------------------

    Done. There were no warn messages.

    Let me know what else might be going wrong here. Thanks for your time.

  • Bob Handsaker

    This error message:

    ##### ERROR MESSAGE: Invalid ploidy map ploidy2.txt: Unrecognized sequence name: X

    suggests that your reference names the chromosome "chrX", not "X". You need to remove the unrecognized sequence names from the ploidy map file.
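
    A minimal sketch of clearing out the unrecognized names, assuming the reference has a samtools-style `.fai` index and that the first whitespace-separated column of each ploidy map row is the sequence name (as in the file posted above). `filter_ploidy_map` is a hypothetical helper: it keeps rows whose name appears in the index, keeps the `*` wildcard rows, and reports every dropped name on stderr so you can confirm that the sex chromosomes still have rows under the names your reference actually uses.

```bash
# Hypothetical helper: drop ploidy map rows whose sequence name is not in the reference.
# Usage: filter_ploidy_map PLOIDY_MAP REFERENCE.fai > FILTERED_MAP
filter_ploidy_map() {
  local ploidy_map="$1"
  local fai="$2"
  awk '
    # First file (.fai): remember every sequence name from column 1.
    FILENAME == ARGV[1] { known[$1] = 1; next }
    # Ploidy map: keep blank lines, "*" wildcard rows, and rows naming a known sequence.
    NF == 0 || $1 == "*" || ($1 in known) { print; next }
    # Everything else is reported on stderr and left out of the output.
    { print "dropping unrecognized sequence: " $1 > "/dev/stderr" }
  ' "$fai" "$ploidy_map"
}
```

    For example, `filter_ploidy_map ploidy2.txt reference.fna.fai > ploidy_filtered.txt`, where the index and output paths are placeholders.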

  • Swanthana Rekulapally

    Hello Bob,

    Thanks for the suggestion. Now I have run into another error:

    ERROR 12:09:00,770 FunctionEdge - Error:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/pine/scr/s/w/swan/svtoolkit/test/tmpdir'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/nas/longleaf/home/swan/svtoolkit/lib/SVToolkit.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/nas/longleaf/home/swan/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.apps.ComputeDepthProfiles'  '-I' 'preprocessing/metadata/headers.bam'  '-configFile' '/nas/longleaf/home/swan/svtoolkit/conf/genstrip_parameters.txt'  '-R' '/proj/ncgenes2/src/ncgenes2-exome-pipeline/modules/apps/human-genome-for-alignment/1405.15/GRCh38_no_alt_analysis_set.refseqids.fna'  '-L' 'NT_187404.1:0-0'  '-md' 'preprocessing/metadata'  '-profileBinSize' '100000'  '-maximumReferenceGapLength' '10000'  '-O' '/pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/profiles_100Kb/profile_seq_NT_187404.1_100000.dat.gz' 

    ERROR 12:09:00,771 FunctionEdge - Contents of /pine/scr/s/w/swan/svtoolkit/test/preprocessing/logs/SVPreprocess-171.out:

    I checked SVPreprocess-171.out, and it doesn't point to any error:

    INFO  11:56:10,488 HelpFormatter - -------------------------------------------------------------

    INFO  11:56:10,494 HelpFormatter - Program Name: org.broadinstitute.sv.apps.ComputeDepthProfiles

    INFO  11:56:10,498 HelpFormatter - Program Args: -I preprocessing/metadata/headers.bam -configFile /nas/longleaf/home/swan/svtoolkit/conf/genstrip_parameters.txt -R /proj/ncgenes2/src/ncgenes2-exome-pipeline/modules/apps/human-genome-for-alignment/1405.15/GRCh38_no_alt_analysis_set.refseqids.fna -L NT_187404.1:0-0 -md preprocessing/metadata -profileBinSize 100000 -maximumReferenceGapLength 10000 -O /pine/scr/s/w/swan/svtoolkit/test/preprocessing/metadata/profiles_100Kb/profile_seq_NT_187404.1_100000.dat.gz

    INFO  11:56:10,502 HelpFormatter - Executing as swan@c0316.ll.unc.edu on Linux 3.10.0-1062.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_222-b10.

    INFO  11:56:10,503 HelpFormatter - Date/Time: 2020/06/16 11:56:10

    INFO  11:56:10,503 HelpFormatter - -------------------------------------------------------------

    INFO  11:56:10,503 HelpFormatter - -------------------------------------------------------------

    INFO  11:56:10,518 ComputeDepthProfiles - Opening reference sequence ...

    INFO  11:56:10,526 ComputeDepthProfiles - Opened reference sequence.

    INFO  11:56:10,528 MetaData - Opening metadata ... 

    INFO  11:56:10,528 MetaData - Adding metadata location preprocessing/metadata ...

    INFO  11:56:10,535 MetaData - Opened metadata.

    INFO  11:56:10,536 ComputeDepthProfiles - Opened metadata.

    INFO  11:56:10,536 ComputeDepthProfiles - Initializing input data set ...

    INFO  11:56:10,672 ComputeDepthProfiles - Initialized data set: 1 file, 1 read group, 1 sample.

    INFO  11:56:10,708 ReadCountCache - Initializing read count cache with 1 file.

    INFO  11:56:10,808 CommandLineProgram - Program completed.

    ------------------------------------------------------------------------------------------

    Done. There were no warn messages.

    I suspect the interval "NT_187404.1:0-0" is the problem, but that sequence is in the genome file, so I am not sure what to do about it. Also, can you please tell me how to increase the bin size?

    Thanks for your patience.

  • Bob Handsaker

    Sometimes Queue will think a job has failed when it actually ran successfully. The first thing I would try is to rerun Queue and see whether the failure is reproducible. Queue may retry this job, and it may run fine.

    If the failure is reproducible, then I would try to run the command outside of Queue to see if there is some error message that you have missed or if the command is returning a non-zero exit status.
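
    A sketch of rerunning a failed job by hand, assuming you paste the command exactly as Queue logged it after the `FunctionEdge - Error:` prefix (the single-quoted arguments are valid shell as written). Because the logged command uses relative paths such as `preprocessing/metadata`, run it from the directory Queue reports as `CmdDir`. The wrapper name `run_and_report` is hypothetical:

```bash
# Hypothetical wrapper: run a command (e.g. one copied from a Queue log) and
# report its exit status on stderr, so it is not mixed into the tool's stdout.
run_and_report() {
  "$@"
  local status=$?
  echo "exit status: ${status}" >&2
  return "$status"
}
```

    Prefix the copied command with `run_and_report`. A non-zero status points to a genuine failure, and the tool's own messages appear right above the status line; a status of 0 suggests Queue misjudged the job.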

    You should not try to change the bin size on these jobs. This is doing some high-level characterization of read depth that can be used for QC and is also used for sex determination. If you want depth profiles with different bin sizes, you can compute them after preprocessing is done.

  • Swanthana Rekulapally

    This helps. I ran it again and it got further, but it stopped again at the CNV pipeline step. Here is my script; let me know if there is anything I should add to it:

    ## Preprocessing

    mkdir -p preprocessing || exit 1
    mkdir -p preprocessing/logs || exit 1
    mkdir -p preprocessing/metadata || exit 1

    java -cp ${classpath} ${mx} org.broadinstitute.gatk.queue.QCommandLine \
            -S ${SV_DIR}/qscript/SVPreprocess.q \
            -S ${SV_DIR}/qscript/SVQScript.q \
            -cp ${classpath} \
            -jobNative '-V ${SV_DIR}' \
            -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
            -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
            -tempDir ${SV_TMPDIR} \
            -R ${reference_genome} \
            -I ${inputFile} \
            -ploidyMapFile ploidymap.txt \
            -md preprocessing/metadata \
            -useMultiStep \
            -disableGATKTraversal \
            -bamFilesAreDisjoint true \
            -computeGCProfiles true \
            -reduceInsertSizeDistributions false \
            -computeReadCounts true \
            -jobLogDir preprocessing/logs \
            -l DEBUG \
            -run \
            || exit 1

    ## CNV discovery

    mkdir -p cnv_pipeline || exit 1
    mkdir -p cnv_pipeline/logs || exit 1
    mkdir -p cnv_pipeline/metadata || exit 1

    java -cp ${classpath} ${mx} org.broadinstitute.gatk.queue.QCommandLine \
            -S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
            -S ${SV_DIR}/qscript/SVQScript.q \
            -cp ${classpath}
            -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
            -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
            -disableJobReport \
            -I ${inputFile} \
            -R ${reference_genome} \
            -runDirectory cnv_pipeline \
            -md preprocessing/metadata \
            -reduceInsertSizeDistributions true \
            -jobLogDir ${runDir}/logs \
            -ploidyMapFile ploidymap.txt \
            -tempDir ${SV_TMPDIR} \
            -maximumReferenceGapLength 1000 \
            -boundaryPrecision 100 \
            -minimumRefinedLength 500 \
            -tilingWindowSize 1000 \
            -tilingWindowOverlap 500 \
            -debug true --verbose true \
            -run || exit 1

    ## Genotyping CNVs

    mkdir -p cnv_genotype || exit 1
    mkdir -p cnv_genotype/logs || exit 1

    java -cp ${classpath} ${mx} org.broadinstitute.gatk.queue.QCommandLine \
            -S ${SV_DIR}/qscript/SVGenotyper.q \
            -S ${SV_DIR}/qscript/SVQScript.q \
            -cp ${classpath} \
            -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
            -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
            -R ${reference_genome} \
            -I ${inputFile} \
            -vcf cnv_pipeline/*.sites.vcf \
            -md preprocessing/metadata \
            -runDirectory cnv_genotype \
            -jobLogDir cnv_genotype/logs \
            -O cnv_pipeline/${project}_final.genotypes.vcf \
            -parallelRecords 100 \
            -debug true --verbose true
            -run || exit

    Here is the error I got:

    ##### ERROR --

    ##### ERROR stack trace

    ##### ERROR ------------------------------------------------------------------------------------------

    ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1941-0-gb493839):

    ##### ERROR

    ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.

    ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.

    ##### ERROR Visit our website and forum for extensive documentation and answers to

    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk

    ##### ERROR

    ##### ERROR MESSAGE: Argument with name '--referencefile' (-R) is missing.

    ##### ERROR Argument with name '--tilingwindowsize' (-tilingWindowSize) is missing.

    ##### ERROR Argument with name '--minimumrefinedlength' (-minimumRefinedLength) is missing.

    ##### ERROR Argument with name '--maximumreferencegaplength' (-maximumReferenceGapLength) is missing.

    ##### ERROR Argument with name '--gatkjar' (-gatk) is missing.

    ##### ERROR Argument with name '--tilingwindowoverlap' (-tilingWindowOverlap) is missing.

    ##### ERROR Argument with name '--boundaryprecision' (-boundaryPrecision) is missing.

    ##### ERROR ------------------------------------------------------------------------------------------

    Thank you.

  • Bob Handsaker

    Perhaps you found this already, but could there be a stray space (after a backslash) in the script?
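
    Both a stray space after a trailing backslash and a missing backslash end a shell command early, so the remaining option lines run as a separate, failing command and Queue never receives arguments such as `-R` or `-gatk`. In the script as posted, the `-cp ${classpath}` line of the CNV discovery command and the `-debug true --verbose true` line of the genotyping command appear to have no trailing backslash. A rough heuristic checker, sketched with awk; `check_continuations` is a hypothetical name, and it can report false positives on scripts that are not formatted one option per line:

```bash
# Hypothetical heuristic checker for broken line continuations in a shell script.
# Prints one line per suspicious spot; prints nothing if none are found.
check_continuations() {
  local script="$1"
  awk '
    { lines[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        line = lines[i]
        next_line = (i < NR) ? lines[i + 1] : ""
        if (line ~ /\\[ \t]+$/) {
          # "\ " escapes the space, not the newline, so the command ends here.
          printf "line %d: whitespace after trailing backslash\n", i
        } else if (i < NR && line ~ /\\$/ && next_line ~ /^[ \t]*$/) {
          # The backslash joins this line to an empty line, which then ends the command.
          printf "line %d: backslash followed by a blank line\n", i
        } else if (line !~ /\\$/ && line !~ /^[ \t]*$/ && next_line ~ /^[ \t]+-/) {
          # No continuation, yet the next line looks like another option of this command.
          printf "line %d: no trailing backslash, but next line looks like a continued option\n", i
        }
      }
    }
  ' "$script"
}
```

    Running `check_continuations` on the script file lists the line numbers to inspect; a script with no suspicious continuations prints nothing.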
