Somatic CNV Workflow Tutorials
REQUIRED for all errors and issues:
a) GATK version used: docker image: tutorial_11682_11683:gatk4.0.1.1
b) Exact command used:
gatk --java-options "-Xmx3g" CollectAllelicCounts \
    -I sample.bam \
    -R reference.fa \
    -L sites.interval_list \
    -O sample.allelicCounts.tsv
I was able to reproduce the results by following tutorials 11682 and 11683. However, with my own data, I am unsure where and how to obtain the required `sites.interval_list` input. The region I am interested in is the whole genome, so what should I use for this interval list?
If possible, could you also point me to resources for interpreting the copy ratio plots produced by the somatic CNV workflow? The germline CNV workflow produced VCF files, which makes me wonder whether the somatic CNV workflow also produces variant-specific results.
Thank you very much!
Sincerely,
Yuwei
-
Hi Yuwei Bao,
The sites file can be specific to your data (e.g. from HaplotypeCaller results) or based on common variation like the 1000 Genomes SNP file from the VQSR resource bundle: https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle
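In case it helps to see what such a sites file looks like, here is a minimal sketch of turning a SNP VCF into an interval list with plain shell and awk. It is an illustration under assumptions, not the official route: the function name `vcf_snps_to_interval_list` and the file names `reference.dict` and `common_snps.vcf` are placeholders, the VCF is assumed to be uncompressed, and `reference.dict` is assumed to be the sequence dictionary for the same reference you pass to `-R`. Picard also ships a VcfToIntervalList tool for this kind of conversion, which is probably the safer choice in practice.

```shell
#!/usr/bin/env bash
# Sketch only: build a Picard-style interval_list of SNP sites from a VCF.
# Assumptions: the VCF is uncompressed text, and the .dict file is the
# sequence dictionary of the same reference used for the BAM.

vcf_snps_to_interval_list() {
    local dict="$1" vcf="$2"

    # An interval_list starts with a SAM-style header (@HD, @SQ lines),
    # which is exactly what the sequence dictionary contains.
    grep '^@' "$dict"

    # Body rows are: contig, start, end, strand, name (tab-separated, 1-based).
    # Keep only simple biallelic SNPs: single-base REF and single-base ALT.
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ { next }
        length($4) == 1 && length($5) == 1 && $5 != "." && $5 != "*" {
            name = ($3 == ".") ? ($1 ":" $2) : $3
            print $1, $2, $2, "+", name
        }' "$vcf"
}

# Example usage (placeholder file names):
#   vcf_snps_to_interval_list reference.dict common_snps.vcf > sites.interval_list
```

The resulting `sites.interval_list` can then be passed to `-L` in the CollectAllelicCounts command from the question. Restricting to biallelic SNPs is a choice made for this sketch, since allelic counts are collected at single-base sites.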
As for the plots, they will be more informative once you reach the PlotModeledSegments step of the workflow (https://gatk.broadinstitute.org/hc/en-us/articles/360035890011#8). By then, much of the noise will have been smoothed out and the event boundaries will be clearer. The remaining calls will also be more confident, while segments whose copy ratios lack enough evidence to be called as amplifications will be called copy neutral.
The germline workflow does output VCF files, but the somatic workflow does not. Traditionally, the two fields have preferred different file types.