What is the difference between "CNVSomaticPairWorkflow.common_sites" and "CNVSomaticPairWorkflow.intervals"?
Hi,
I'm trying to run the GATK somatic CNV workflow on human exome data. I created the PON and the interval files from the normal BAMs and the reference FASTA file used in this analysis. When I checked cnv_somatic_pair_workflow.input.json, I found that two interval_list files must be set: "CNVSomaticPairWorkflow.intervals" and "CNVSomaticPairWorkflow.common_sites". What is the difference between these two interval_list files?
I found a description of the two files (https://app.terra.bio/#workspaces/help-gatk/Somatic-CNVs-GATK4):
CNVSomaticPairWorkflow.common_sites: Picard- or GATK-style interval list of common sites to use for collecting allelic counts.
CNVSomaticPairWorkflow.intervals: Picard or GATK-style interval list. For WGS, this should typically only include the autosomal chromosomes.
Based on the above, does CNVSomaticPairWorkflow.intervals mean an interval file that contains only autosomes? If so, should I pass an autosome-only interval file as "CNVSomaticPairWorkflow.intervals"?
-
Hi Hiro Ama
I am going to move your post into our Community Discussions -> Documentation Questions topic, as the Somatic topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Genevieve
-
These two files have different purposes and are used in different steps.
CNVSomaticPairWorkflow.intervals is the interval list that restricts where the analysis is performed, so for your exome data it would be your target regions. It is used in the CollectFragmentCounts step. CNVSomaticPairWorkflow.common_sites specifies the sites at which allelic counts are collected in the CollectAllelicCounts step.
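To make the distinction concrete, here is a minimal sketch of how the two entries might be filled in for an exome run. The file names are hypothetical placeholders, not files shipped with GATK, and the sketch shows only these two keys; a real cnv_somatic_pair_workflow.input.json has further entries that are omitted here.

```python
import json

# Hypothetical file names for illustration only; replace with your own files.
inputs = {
    # Where the analysis is restricted: for exome data, the capture target regions.
    "CNVSomaticPairWorkflow.intervals": "exome_targets.interval_list",
    # Sites at which allelic counts are collected (a list of common sites).
    "CNVSomaticPairWorkflow.common_sites": "common_sites.interval_list",
}


def render_inputs(entries):
    """Serialize the workflow inputs as pretty-printed JSON text."""
    return json.dumps(entries, indent=2, sort_keys=True)


print(render_inputs(inputs))
```

The printed JSON fragment can be merged into your existing inputs file; both keys are needed because they feed different steps of the workflow.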
You can read more about these steps in our GATK documentation in these tutorials: