Generating ACS-like output from the new GATK CNV pipeline (WGS)
Hi GATK team,
Our lab has been exploring CNV calling methods for WGS. We've found that GATK4's somatic CNV calling pipeline (following the tutorial outlined here) does a great job with segmentation after some adjustments. This is the pipeline we are currently using: https://portal.firecloud.org/?return=terra#methods/gatk/CNV_Somatic_Pair_Workflow/7. We're using the latest version of GATK (4.1.7.0).
I have a few questions related to the outputs of this workflow. In particular, we'd like to get outputs that can be ingested by ABSOLUTE and by DeTiN. These were previously generated by AllelicCNV or AllelicCapseg.
Previously, GATK's AllelicCNV tool generated an ACS-like file (*-sim-final.acs.seg), which contained allelic copy ratio data and could be used by ABSOLUTE and DeTiN. Judging by this GitHub discussion, ModelSegments is supposed to replace AllelicCNV, but it does not produce a file like *-sim-final.acs.seg - only a file that contains total copy ratio and minor allele fractions per segment.
What is the best way to generate an allelic segmentation file that can be used for ABSOLUTE and DeTiN? I had planned to simply run the older version of AllelicCNV (or AllelicCapseg), but A) I don't want to use old tools if something new is available, and B) I'm not sure where to get the inputs for these tools using the updated workflow. In particular, AllelicCNV and AllelicCapseg require the tangent-normalized read counts (previously output by NormalizeSomaticReadCounts as the .tn.tsv file). Is the corresponding output in the new workflow the .denoisedCR.tsv file generated by DenoiseReadCounts? The other two inputs (the tumor het results and the seg file) presumably come from ModelSegments (.hets.tsv and .modelFinal.seg).
Thanks for any help!
Jett
-
Hi Jett,
Thank you for your inquiry. Let me speak to my colleagues to find out if anyone knows the answer to your questions.
Kind regards,
Jason
-
That workflow link is old. If you would like to use the latest version of the workflow, you can obtain it from the GATK Git repo; if you would like to run it in Terra, you can import the workflow into a workspace from the Dockstore repo.
I'm not familiar with ABSOLUTE or DeTiN, but I can list the outputs produced by the ModelSegments task:
File het_allelic_counts = "~{output_dir_}/~{entity_id}.hets.tsv"
File normal_het_allelic_counts = "~{output_dir_}/~{entity_id}.hets.normal.tsv"
File copy_ratio_only_segments = "~{output_dir_}/~{entity_id}.cr.seg"
File copy_ratio_legacy_segments = "~{output_dir_}/~{entity_id}.cr.igv.seg"
File allele_fraction_legacy_segments = "~{output_dir_}/~{entity_id}.af.igv.seg"
File modeled_segments_begin = "~{output_dir_}/~{entity_id}.modelBegin.seg"
File copy_ratio_parameters_begin = "~{output_dir_}/~{entity_id}.modelBegin.cr.param"
File allele_fraction_parameters_begin = "~{output_dir_}/~{entity_id}.modelBegin.af.param"
File modeled_segments = "~{output_dir_}/~{entity_id}.modelFinal.seg"
File copy_ratio_parameters = "~{output_dir_}/~{entity_id}.modelFinal.cr.param"
File allele_fraction_parameters = "~{output_dir_}/~{entity_id}.modelFinal.af.param"
-
Thanks for the quick reply - I'll use the updated workflow.
I'm familiar with the outputs, but my question was more about how the new outputs relate to the old paradigm used in the GATK CNV toolkit with AllelicCNV. According to the GitHub link I posted above, ModelSegments is intended to replace AllelicCNV, but it doesn't generate the same outputs. In particular, it doesn't generate segmentation data that gives the copy ratio for each allele of a segment separately. This is the file that is required by ABSOLUTE/DeTiN and was previously generated by AllelicCNV (the *-sim-final.acs.seg file; I've posted an example below).
Chromosome  Start.bp  End.bp     n_probes  length     n_hets  f            tau          sigma.tau    mu.minor     sigma.minor  mu.major     sigma.major  SegLabelCNLOH
1           12176     14057554   1558      14045378   103     0.324218442  1.438292376  0.008519298  0.466320913  0.01236671   0.971971463  0.01236671   2
1           14059326  145367768  7572      131308442  529     0.499978943  1.939948064  0.005208241  0.969933184  0.003385049  0.970014881  0.003385049  2

There are no outputs of the new workflow that look like this, so my plan was to simply run AllelicCNV using inputs from the new workflow. AllelicCNV requires three inputs - here's where I think they now come from using the new workflow:
Name of file       Old GATK CNV (task)                    New GATK CNV (task)
tumorHets          *.tumor.hets.tsv (GetHetCoverage)      *.hets.tsv (ModelSegments)
segments           *.seg (PerformSegmentation)            *.modelFinal.seg (ModelSegments)
tangentNormalized  *.tn.tsv (NormalizeSomaticReadCounts)  ?????

I'm mostly unsure what the input for tangentNormalized should be. I assume it comes from DenoiseReadCounts in the new workflow, but I wanted to make sure.
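For reference, the two example ACS rows above are numerically consistent with mu.minor = f * tau and mu.major = (1 - f) * tau. The sketch below checks that on those rows and then shows how the same split could be applied to a per-segment log2 copy ratio and minor allele fraction of the kind ModelSegments reports. The function names are hypothetical, and the scaling tau = 2 * 2^(log2 copy ratio) is an assumption (the example rows only suggest that tau is close to 2 for a balanced segment). This is not a description of what AllelicCNV does internally, and it does not reproduce the sigma columns or SegLabelCNLOH.

```python
import math

# (f, tau, mu.minor, mu.major) copied from the two example ACS rows above.
ACS_ROWS = [
    (0.324218442, 1.438292376, 0.466320913, 0.971971463),
    (0.499978943, 1.939948064, 0.969933184, 0.970014881),
]


def allelic_copy_ratios(f, tau):
    """Split total copy ratio tau into (minor, major) parts using minor allele fraction f."""
    return f * tau, (1.0 - f) * tau


def acs_like_from_segment(log2_copy_ratio, minor_allele_fraction):
    """Hypothetical helper: build the f/tau/mu fields of one ACS-like row.

    Assumption: tau is scaled so that a copy-neutral segment has tau = 2,
    i.e. tau = 2 * 2**log2_copy_ratio.
    """
    tau = 2.0 * 2.0 ** log2_copy_ratio
    mu_minor, mu_major = allelic_copy_ratios(minor_allele_fraction, tau)
    return {
        "f": minor_allele_fraction,
        "tau": tau,
        "mu.minor": mu_minor,
        "mu.major": mu_major,
    }


# Check the relationship against the example rows (values are rounded in the file).
for f, tau, mu_minor, mu_major in ACS_ROWS:
    got_minor, got_major = allelic_copy_ratios(f, tau)
    assert math.isclose(got_minor, mu_minor, abs_tol=1e-6)
    assert math.isclose(got_major, mu_major, abs_tol=1e-6)

# A balanced, copy-neutral segment under the stated assumption.
neutral = acs_like_from_segment(log2_copy_ratio=0.0, minor_allele_fraction=0.5)
```

Whether ABSOLUTE and DeTiN accept a file built this way, and how the sigma columns should be filled in, would still need to be confirmed.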
-
Hi Jett Crowdis,
Thanks for your question. I've tried to answer it here: https://github.com/broadinstitute/gatk/issues/6685