Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Genome STRiP CNVDiscoveryPipeline fails when run in parallel


  • Bob Handsaker

    You have correctly diagnosed the problem, I believe.

    If you cannot run on a set of hosts where the execution hosts are also submit hosts, then the best alternative would be to try to run the top-level Queue script with -jobRunner ParallelShell. This will cause all of the top-level Queue jobs to run on the same host, so you will need to create a large reservation for this job. You can limit the number of parallel shell jobs using -maxConcurrentRun. This will somewhat reduce overall parallelism, but depending on the size of your data set, you may be able to get it to run that way.

    As a side note, -jobRunner Shell seems to be a little flaky, so we generally recommend using -jobRunner ParallelShell with -maxConcurrentRun 1 in preference to using -jobRunner Shell.
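
The job-runner options in the reply above can be sketched as a small argument builder. This is only an illustration: the flags -jobRunner ParallelShell and -maxConcurrentRun are taken from the reply, while the function name `queue_runner_args` and its parameter are hypothetical, and the rest of the Queue command line (Java invocation, script, inputs) is deliberately left out.

```python
def queue_runner_args(max_concurrent=None):
    """Build only the job-runner portion of a Queue command line.

    The two flags come from the reply above; the function name and
    its parameter are hypothetical and exist for illustration only.
    """
    args = ["-jobRunner", "ParallelShell"]
    if max_concurrent is not None:
        if max_concurrent < 1:
            raise ValueError("max_concurrent must be at least 1")
        # Cap how many shell jobs run at once on the single host.
        args += ["-maxConcurrentRun", str(max_concurrent)]
    return args


# Suggested stand-in for -jobRunner Shell: run one job at a time.
serial_args = queue_runner_args(max_concurrent=1)
print(" ".join(serial_args))
```

Because every top-level job then runs on one host, a larger `max_concurrent` value presumably needs a correspondingly larger reservation for that host.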

     

  • lizhichao

    Thanks for your reply. I'm testing with that argument now. In addition, I want to detect and genotype deletions. Should I run the CNVDiscovery pipeline or SVDiscovery + SVGenotyper, or are both OK? And if I run CNVDiscovery, should I run SVGenotyper afterwards?

  • Bob Handsaker

    The workflow is generally SVPreprocess followed by one or both of SVDiscovery + SVGenotyper (for deletions only) or CNVDiscovery for CNVs, which also genotypes as part of the discovery pipeline. You can then run SVGenotyper in additional samples if you want (or to get uniform genotyping if you are running in batches).

    There is also an LCNV (large CNV) pipeline, which is designed to find "microarray resolution" CNVs and will also find mosaic CNVs. Its output is not in VCF format, however.
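
The ordering described in the reply above can be summarized as a short sketch. The stage names (SVPreprocess, SVDiscovery, SVGenotyper, CNVDiscovery) come from the reply; the function `plan_stages` and its parameters are hypothetical, and the code only lists stages rather than running anything. The LCNV pipeline is left out of the sketch.

```python
def plan_stages(deletions=False, cnvs=False):
    """List pipeline stages in the order described in the reply.

    Hypothetical helper for illustration; it does not invoke
    any Genome STRiP tool.
    """
    if not (deletions or cnvs):
        return []
    # Preprocessing comes first for either discovery path.
    stages = ["SVPreprocess"]
    if deletions:
        # Deletion discovery is followed by a separate genotyping step.
        stages += ["SVDiscovery", "SVGenotyper"]
    if cnvs:
        # CNVDiscovery genotypes as part of the discovery pipeline.
        stages.append("CNVDiscovery")
    return stages


print(plan_stages(deletions=True, cnvs=True))
```

A further SVGenotyper run on additional samples, or for uniform genotyping across batches, would come after these stages.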

  • lizhichao

    Thanks. So if I want to focus on deletion genotyping, should I run SVDiscovery + SVGenotyper instead of CNVDiscovery? SVDiscovery + SVGenotyper seems to be faster than CNVDiscovery.

    Are SVDiscovery and SVGenotyper not designed to run in parallel? I tested them successfully before.

     

  • Bob Handsaker

    For deletions only, you will get most of them with SVDiscovery + SVGenotyper. You will miss things that can be found only with read depth due to repetitive sequences.

    SVDiscovery / SVGenotyper are much faster. They are parallelized, but don't recursively parallelize, so the execution hosts do not need to be submit hosts. Recursive parallelization is only done in the CNVDiscovery pipeline.

  • lizhichao

    Thanks. If I want to study deletion CNVs, which pipeline should I select?

  • Bob Handsaker

    I think you would have to explain the analysis you want to do in more detail. Feel free to write to me directly if that would be easier.

  • lizhichao

    I just want to genotype homozygous and heterozygous CNV deletions in a population, to look for associations.

  • Bob Handsaker

    We wrote two methods / pipelines because they detect different but sometimes overlapping sets of variants. So for best sensitivity, you should run both.

  • lizhichao

    Thanks, I get what you mean.
