Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calling somatic CNVs over multiple intervals and joining them subsequently

4 comments

  • Gökalp Çelik

    Hi Daniel

    You can still collect read counts in a single pass but scatter the CNV calls across subsets of the original intervals. The final segmentation step then collects all scatters and generates a single call set for each sample.

    As for combining read count files per sample: if you collect them in TSV format, you can do this with a script. HDF5 files will not be as easy to combine as TSV files.

    Regards. 

  • Daniel

    Hi Gökalp,

    Thank you for your help!

    I have two follow-up questions.

    When you talk about scattering CNV calls, how would one go about this outside the Terra/Cromwell universe?

    • I did not see any thread parameters I could use, so how is it scattering?
    • Or is the scattering done via a split interval list and multiple calls? In that case I do not see how I would do the final segmentation.

    The second question is a follow-up on combining/concatenating the TSV files.

    • Can these be concatenated losslessly?
    • Do I need to take care of the headers in these files?
  • Daniel

    Or did I misunderstand, and by the final segmentation you meant the ModelSegments tool, to which you then give all the scatters for your sample in a pseudo multi-sample mode?

  • Gökalp Çelik

    Hi Daniel

    Please ignore my previous comment. I got sidetracked thinking of the scattering in the Germline CNV workflow.

    Unfortunately, we do not have any scattering option available for somatic CNV calling. Unlike the Germline CNV workflow, the somatic CNV calling workflow is not very resource intensive in terms of run time, memory, or CPU requirements, so everything should be able to complete in a single run.

    As for collecting read counts over split intervals and combining them, we don't have a ready-made tool for that, and you may need to script it yourself. The header section is required in the TSV output, so it has to stay intact and must contain the full sequence dictionary. If you still wish to collect read counts using split intervals, make sure that the splits do not overlap; otherwise your read counts become hard to combine.
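
The scripting mentioned above could look roughly like the following Python sketch. It is not a GATK tool, and it rests on assumptions about the file layout that should be checked against your own files: each per-interval TSV is assumed to start with SAM-style header lines beginning with `@` (including the `@SQ` sequence dictionary), followed by a `CONTIG`/`START`/`END`/`COUNT` column row and tab-separated data rows. The function names `parse_counts` and `merge_counts` are hypothetical. The sketch keeps one copy of the header, refuses shards whose headers differ, sorts rows in sequence dictionary order, and fails on overlapping intervals.

```python
from typing import Iterable, List, Optional, Tuple

Row = Tuple[str, int, int, str]  # contig, start, end, count (count kept as text)


def parse_counts(text: str) -> Tuple[List[str], Optional[str], List[Row]]:
    """Split one read-count TSV into ('@' header lines, column row, data rows)."""
    header: List[str] = []
    columns: Optional[str] = None
    rows: List[Row] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("@"):
            header.append(line)
        elif columns is None:
            columns = line  # assumed to be the CONTIG/START/END/COUNT row
        else:
            contig, start, end, count = line.split("\t")
            rows.append((contig, int(start), int(end), count))
    return header, columns, rows


def merge_counts(texts: Iterable[str]) -> str:
    """Merge per-interval read-count TSVs of ONE sample into a single TSV."""
    texts = list(texts)
    if not texts:
        raise ValueError("no input files given")
    header, columns, merged = parse_counts(texts[0])
    for text in texts[1:]:
        other_header, other_columns, rows = parse_counts(text)
        # Identical headers imply the same sequence dictionary and sample.
        if other_header != header or other_columns != columns:
            raise ValueError("headers differ between shards")
        merged.extend(rows)

    # Contig order is taken from the @SQ lines of the header.
    order = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    order[field[3:]] = len(order)
    for row in merged:
        if row[0] not in order:
            raise ValueError(f"contig {row[0]} missing from sequence dictionary")
    merged.sort(key=lambda r: (order[r[0]], r[1], r[2]))

    # Intervals are treated as 1-based and inclusive, so start <= previous end overlaps.
    for prev, cur in zip(merged, merged[1:]):
        if prev[0] == cur[0] and cur[1] <= prev[2]:
            raise ValueError(f"overlapping intervals: {prev} and {cur}")

    lines = header + ([columns] if columns else [])
    lines += [f"{c}\t{s}\t{e}\t{n}" for c, s, e, n in merged]
    return "\n".join(lines) + "\n"
```

To use it, read each shard of one sample into a string (for example with `pathlib.Path(p).read_text()`), pass the list to `merge_counts`, and write the returned text to a new `.tsv` file. HDF5 read-count files are not handled by this sketch.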

    I hope this helps. 
