
REPOST - GenomeSTRiP error: file missing at Stage 12



  • Derek Caetano-Anolles

    Thank you for your question, ikeoluwao_o. We're working on figuring out your issue and will get back to you soon.

  • Derek Caetano-Anolles

    Thanks for waiting, ikeoluwao_o! This isn't GATK, so I'm not sure how much help I can be, but the error you are receiving makes it clear that a file is not being found: /mnt/data/ike/projects/Combined_NBU/Analysis6/Data_Files/CNV/cnv_stage10/seq_6/ (No such file or directory)

    So it may be related to an issue somewhere in your directory structure. I have asked one of our computational biologists who is knowledgeable in this area, and they may have a better solution than I have.

  • Chris Whelan

    @ikeoluwao_o I'm not sure why you'd be getting that error. Could you please list the contents of


    and the same for:


    Did all the other stages of the pipeline (prior to stage 12) complete successfully? 

  •


    Thank you for your response. Yes, all the other stages prior to stage 12 completed successfully. 

    I mistakenly overwrote the contents of the directory you requested by re-running the script. The error persists and is now occurring in seq_21 instead of seq_6. The error now looks like this: /mnt/data/ike/projects/Combined_NBU/Analysis6/Data_Files/CNV/cnv_stage10/seq_21/ (No such file or directory)

    Here are the contents of the directory: 

    -rw-rw-r-- 1 adeshina zfs1fs1  349 Jan 18 22:50 seq_21.adjacent_merged.genotypes.gts.dat
    -rw-rw-r-- 1 adeshina zfs1fs1  349 Jan 18 22:50 seq_21.adjacent_merged.genotypes.conf.dat
    -rw-rw-r-- 1 adeshina zfs1fs1   28 Jan 18 22:50
    -rw-rw-r-- 1 adeshina zfs1fs1  307 Jan 18 22:50 seq_21.adjacent_merged.genotypes.counts.dat
    -rw-rw-r-- 1 adeshina zfs1fs1  307 Jan 18 22:50 seq_21.adjacent_merged.genotypes.expected.dat
    -rw-rw-r-- 1 adeshina zfs1fs1   60 Jan 18 22:50 seq_21.adjacent_merged.genotypes.gmm.dat
    -rw-rw-r-- 1 adeshina zfs1fs1 4.0K Jan 18 22:51 seq_21.adjacent_merged.genotypes.vcf.gz
    -rw-rw-r-- 1 adeshina zfs1fs1   72 Jan 18 22:51 seq_21.adjacent_merged.genotypes.vcf.gz.tbi
    drwxrwsr-x 2 adeshina zfs1fs1 4.0K Jan 18 22:53 eval
    drwxrwsr-x 2 adeshina zfs1fs1  306 Jan 18 22:53 logs


    Here are the contents of the second directory you requested: 

    -rw-rw-r-- 1 adeshina zfs1fs1 604K Jan 18 21:29 seq_12.adjacent_merged.genotypes.conf.dat
    -rw-rw-r-- 1 adeshina zfs1fs1 529K Jan 18 21:29 seq_12.adjacent_merged.genotypes.counts.dat
    -rw-rw-r-- 1 adeshina zfs1fs1 1.6M Jan 18 21:29 seq_12.adjacent_merged.genotypes.expected.dat
    -rw-rw-r-- 1 adeshina zfs1fs1 687K Jan 18 21:29 seq_12.adjacent_merged.genotypes.gmm.dat
    -rw-rw-r-- 1 adeshina zfs1fs1 329K Jan 18 21:29 seq_12.adjacent_merged.genotypes.gts.dat
    -rw-rw-r-- 1 adeshina zfs1fs1 128K Jan 18 21:29
    -rw-rw-r-- 1 adeshina zfs1fs1 2.2M Jan 18 21:47 seq_12.adjacent_merged.genotypes.vcf.gz
    -rw-rw-r-- 1 adeshina zfs1fs1  22K Jan 18 21:47 seq_12.adjacent_merged.genotypes.vcf.gz.tbi
    drwxrwsr-x 2 adeshina zfs1fs1 4.0K Jan 18 21:48 eval
    drwxrwsr-x 2 adeshina zfs1fs1 4.0K Jan 18 21:48 logs

  • Bob Handsaker

    Hi, there,

    I'm not sure why the files are no longer there. I can suggest a couple of workarounds.

    First, and probably easiest, the code that is trying to access these files is a legacy feature that is no longer needed. In the Genome STRiP distribution, there is a file qscript/discovery/cnv/CNVDiscoveryStage12.q. Lines 76-81 should look like this:

    // Temporary:We create a paritition map to the stage10 output directory.
    // This is a temporary measure to allow plotting of genotyped sites (via PlotGenotypingResults).
    // We should adapt the plotting code to work off of the output VCF directly.
    val stage10RunDir = new File(runDirectory.getParent(), "cnv_stage10")
    val partitionMap = new File(resultsOutputDir, "")
    createPartitionMapFile(stage10RunDir, partitionMap)

    If you comment out the last three lines (the two val declarations and the createPartitionMapFile call), it should run to completion with no harm done (PlotGenotypingResults has in fact been updated to work off of the VCF directly).
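
    As a sketch of that first workaround, the excerpt quoted above would look roughly like this after the edit. This assumes lines 76-81 of your copy match the excerpt; the empty file-name argument is reproduced exactly as it appears in this thread, and the comment text is left as it appears in the excerpt.

```scala
// qscript/discovery/cnv/CNVDiscoveryStage12.q (lines 76-81, per the post above)

// Temporary:We create a paritition map to the stage10 output directory.
// This is a temporary measure to allow plotting of genotyped sites (via PlotGenotypingResults).
// We should adapt the plotting code to work off of the output VCF directly.

// Workaround: the three statements below are disabled so that stage 12
// no longer looks for the per-sequence files under cnv_stage10.
// val stage10RunDir = new File(runDirectory.getParent(), "cnv_stage10")
// val partitionMap = new File(resultsOutputDir, "")
// createPartitionMapFile(stage10RunDir, partitionMap)
```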

    The alternative workaround would be to recreate these files yourself. They are simple tab-delimited files. You could probably get by with stub files containing just a tab-delimited header like so:

    CNP     PARTITION       CHR     START   END

    because I believe they are no longer used downstream.
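
    If you go the stub-file route, a minimal Scala sketch for writing a header-only file might look like the following. The object name WriteStubPartitionMap and the writeStub helper are hypothetical and not part of Genome STRiP. The target path is deliberately left as an argument, because the exact file name the pipeline expects is not shown in this thread; take it from your own error message.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not part of the Genome STRiP distribution.
object WriteStubPartitionMap {
  // Column names taken from the header quoted in the post above.
  val header: String = Seq("CNP", "PARTITION", "CHR", "START", "END").mkString("\t")

  /** Writes a header-only, tab-delimited stub file, creating parent directories if needed. */
  def writeStub(target: Path): Path = {
    Option(target.getParent).foreach(p => Files.createDirectories(p))
    Files.write(target, (header + "\n").getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    // Pass the full path that the error message reports as missing.
    require(args.length == 1, "usage: WriteStubPartitionMap <path-to-missing-file>")
    val written = writeStub(Paths.get(args(0)))
    println(s"Wrote stub: $written")
  }
}
```

    You would run it once per missing file. Note that it overwrites whatever is at the target path, so point it only at files that are actually absent.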
