Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Generating ACS like output from new GATK CNV pipeline (WGS)



  • Jason Cerrato

    Hi Jett,

    Thank you for your inquiry. Let me speak to my colleagues to find out if anyone knows the answer to your questions.

    Kind regards,


  •

    That workflow link is outdated. To use the latest version of the workflow, get it from the GATK git repo. To run the workflow in Terra, import it into a workspace from the Dockstore repo.

    I'm not familiar with ABSOLUTE or DeTiN, but I can list the outputs produced by the ModelSegments tool:

    File het_allelic_counts = "~{output_dir_}/~{entity_id}.hets.tsv"
    File normal_het_allelic_counts = "~{output_dir_}/~{entity_id}.hets.normal.tsv"
    File copy_ratio_only_segments = "~{output_dir_}/~{entity_id}.cr.seg"
    File copy_ratio_legacy_segments = "~{output_dir_}/~{entity_id}.cr.igv.seg"
    File allele_fraction_legacy_segments = "~{output_dir_}/~{entity_id}.af.igv.seg"
    File modeled_segments_begin = "~{output_dir_}/~{entity_id}.modelBegin.seg"
    File copy_ratio_parameters_begin = "~{output_dir_}/~{entity_id}"
    File allele_fraction_parameters_begin = "~{output_dir_}/~{entity_id}"
    File modeled_segments = "~{output_dir_}/~{entity_id}.modelFinal.seg"
    File copy_ratio_parameters = "~{output_dir_}/~{entity_id}"
    File allele_fraction_parameters = "~{output_dir_}/~{entity_id}"
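
The `~{output_dir_}` and `~{entity_id}` placeholders in the list above are filled in when the workflow runs. As a rough illustration, the sketch below expands them for the entries whose suffix is shown in full. The function name `expected_outputs` and the sample values are hypothetical, and the four parameter files are left out because their suffixes are not shown above.

```python
from pathlib import PurePosixPath

# Suffixes copied from the output list above. The four *_parameters entries
# are omitted because their file suffixes are not shown in that list.
SUFFIXES = {
    "het_allelic_counts": ".hets.tsv",
    "normal_het_allelic_counts": ".hets.normal.tsv",
    "copy_ratio_only_segments": ".cr.seg",
    "copy_ratio_legacy_segments": ".cr.igv.seg",
    "allele_fraction_legacy_segments": ".af.igv.seg",
    "modeled_segments_begin": ".modelBegin.seg",
    "modeled_segments": ".modelFinal.seg",
}


def expected_outputs(output_dir, entity_id):
    """Hypothetical helper: expand output_dir/entity_id + suffix for each output."""
    base = PurePosixPath(output_dir)
    return {name: str(base / f"{entity_id}{suffix}") for name, suffix in SUFFIXES.items()}


if __name__ == "__main__":
    # "out" and "tumor1" are made-up example values.
    for name, path in expected_outputs("out", "tumor1").items():
        print(f"{name}\t{path}")
```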


  • Jett Crowdis

    Thanks for the quick reply - I'll use the updated workflow.

    I'm familiar with the outputs, but my question was more about how the new outputs relate to the old paradigm used in the GATK CNV toolkit with AllelicCNV. According to the GitHub link I posted above, ModelSegments is intended to replace AllelicCNV, but it doesn't generate the same outputs. In particular, it doesn't generate segmentation data that gives the copy ratio for each allele of a segment separately. This is the file that is required by ABSOLUTE/DeTiN and was previously generated by AllelicCNV (the *-sim-final.acs.seg file; I've posted an example below).

    Chromosome Start.bp End.bp n_probes length n_hets f tau sigma.tau mu.minor sigma.minor mu.major sigma.major SegLabelCNLOH
    1 12176 14057554 1558 14045378 103 0.324218442 1.438292376 0.008519298 0.466320913 0.01236671 0.971971463 0.01236671 2
    1 14059326 145367768 7572 131308442 529 0.499978943 1.939948064 0.005208241 0.969933184 0.003385049 0.970014881 0.003385049 2
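
To make the per-allele columns in this example concrete, here is a minimal Python sketch that parses the two rows and checks three relationships that appear to hold in them: `length = End.bp - Start.bp`, `tau = mu.minor + mu.major`, and `f = mu.minor / tau`. These relationships are inferred from the two example rows only, not taken from AllelicCNV documentation, and the helper names `read_acs` and `check_segment` are hypothetical.

```python
import math

# Header and rows copied from the example above. The forum rendering separates
# columns with spaces; real files may be tab-delimited, so we split on any whitespace.
ACS_EXAMPLE = """\
Chromosome Start.bp End.bp n_probes length n_hets f tau sigma.tau mu.minor sigma.minor mu.major sigma.major SegLabelCNLOH
1 12176 14057554 1558 14045378 103 0.324218442 1.438292376 0.008519298 0.466320913 0.01236671 0.971971463 0.01236671 2
1 14059326 145367768 7572 131308442 529 0.499978943 1.939948064 0.005208241 0.969933184 0.003385049 0.970014881 0.003385049 2
"""


def read_acs(text):
    """Parse whitespace-delimited ACS-style text into a list of dicts keyed by column name."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def check_segment(row, tol=1e-4):
    """Check the relationships inferred from the example rows (an assumption, not a spec)."""
    tau = float(row["tau"])
    mu_minor = float(row["mu.minor"])
    mu_major = float(row["mu.major"])
    return {
        "length": int(row["End.bp"]) - int(row["Start.bp"]) == int(row["length"]),
        "tau": math.isclose(mu_minor + mu_major, tau, abs_tol=tol),
        "f": math.isclose(mu_minor / tau, float(row["f"]), abs_tol=tol),
    }


if __name__ == "__main__":
    for row in read_acs(ACS_EXAMPLE):
        print(row["Chromosome"], row["Start.bp"], check_segment(row))
```

If these relationships hold in general, `tau` would be the total copy ratio of the segment and `mu.minor`/`mu.major` its split between the two alleles, which is the per-allele information referred to above.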

    There are no outputs of the new workflow that look like this, so my plan was to simply run AllelicCNV using inputs from the new workflow. AllelicCNV requires three inputs - here's where I think they now come from using the new workflow:

    Name of file        Old GATK CNV (task)                      New GATK CNV (task)
    tumorHets           *.tumor.hets.tsv (GetHetCoverage)        *.hets.tsv (ModelSegments)
    segments            *.seg (PerformSegmentation)              *.modelFinal.seg (ModelSegments)
    tangentNormalized   *.tn.tsv (NormalizeSomaticReadCounts)    ?????

    I'm mostly unsure what the input for tangentNormalized should be. I assume it comes from DenoiseReadCounts in the new workflow, but I wanted to make sure.

  • Samuel Lee

    Hi Jett Crowdis,

    Thanks for your question. I've tried to answer it here:
