How to choose interval_list?
Hi, I found three main interval_list files at https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0;tab=objects?prefix=&forceOnObjectsSortingFiltering=false
which are:
(1) wgs_calling_regions.hg38.interval_list
(2) wgs_coverage_regions.hg38.interval_list
(3) wgs_evaluation_regions.hg38.interval_list
I think (1) wgs_calling_regions.hg38.interval_list should be used with Mutect2 and
HaplotypeCaller, because they are variant-calling steps. QUESTION 1: Is that right?
QUESTION 2: When I run GenomicsDBImport, which one should I choose?
a) GATK version used: gatk-4.2.6.1
b) Exact command used:
bsub -M 64G -q ser -n 16 -R "span[ptile=16]" -e GenomicsDBImport.err -o GenomicsDBImport.log gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport \
-V /scratch/2022-09-05/med-reny/germlinemutation/dna_aj_001/dna_aj_001.g.vcf.gz \
-V /scratch/2022-09-05/med-reny/germlinemutation/dna_aj_004/dna_aj_004.g.vcf.gz \
-V /scratch/2022-09-05/med-reny/germlinemutation/dna_aj_005/dna_aj_005.g.vcf.gz \
-V /scratch/2022-09-05/med-reny/germlinemutation/dna_aj_007/dna_aj_007.g.vcf.gz \
-V /scratch/2022-09-05/med-reny/germlinemutation/dna_aj_008/dna_aj_008.g.vcf.gz \
--genomicsdb-workspace-path my_database \
--tmp-dir /scratch/2022-09-05/med-reny/germlinemutation/joint \
-L /work/med-reny/ref/gatk/genomics-hg38/wgs_calling_regions.hg38.interval_list
-
Thank you for your post, Yi Ren! I want to let you know we have received your question and will be moving it to the Community Discussions -> General Discussion topic, as the Germline topic is for reporting bugs and issues with GATK.
We'll get back to you if we have any updates or follow up questions. Please see our Support Policy for more details about how we prioritize responding to questions.
-
Hi Genevieve Brandt (she/her),
Do you have an answer for this please?
I can't find any documentation explaining the difference between wgs_coverage_regions.hg38.interval_list and wgs_evaluation_regions.hg38.interval_list.
Do you recommend one for GenomicsDBImport?
-
Hi Sheryl,
The WGS evaluation regions are typically used internally to assess coverage and a variety of other sample quality metrics and are chosen to minimize variability between samples. For example, I believe that chrX may be entirely excluded from the evaluation regions so that we don't see systematic biases in mean coverage between males and females (i.e. chrX ploidy 1 versus ploidy 2). The exome version of the evaluation regions probably excludes target padding. You should use the wgs_calling_regions.hg38.interval_list for single-sample and joint calling.
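For single-sample calling, the recommendation above could look roughly like the sketch below. The function name call_gvcf and its argument order are made up for illustration, and the GVCF mode (-ERC GVCF) is an assumption based on the g.vcf.gz inputs in the original GenomicsDBImport command; only the -L value reflects the advice in this thread.

```shell
# Hypothetical helper: run HaplotypeCaller in GVCF mode, restricted to the
# calling regions. Arguments: reference FASTA, input BAM, interval list,
# output GVCF. All paths are placeholders supplied by the caller.
call_gvcf() {
  ref="$1"
  bam="$2"
  intervals="$3"   # e.g. wgs_calling_regions.hg38.interval_list
  out="$4"

  gatk HaplotypeCaller \
    -R "$ref" \
    -I "$bam" \
    -O "$out" \
    -ERC GVCF \
    -L "$intervals"
}

# Example invocation (paths are placeholders):
# call_gvcf Homo_sapiens_assembly38.fasta sample.bam \
#   wgs_calling_regions.hg38.interval_list sample.g.vcf.gz
```

The same -L file is then passed to GenomicsDBImport for joint calling, as in the command in the original post.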
-
Thanks Laura Gauthier,
Could you just confirm for me which I should be using to get the median of the coverage over the autosome for the mitochondrial workflow?
I assume it's just a case of using the correct interval list with the CollectWgsMetrics tool?
-
Sorry - I meant using either:
wgs_coverage_regions.hg38.interval_list
or
wgs_evaluation_regions.hg38.interval_list
-
I don't think the choice between those two interval lists will make a substantial difference for the mitochondrial workflow, but we currently use the wgs_coverage_regions.hg38.interval_list with CollectWgsMetrics to get the autosomal median coverage.
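As a rough sketch of that step (not the exact commands used in the Broad workflow), CollectWgsMetrics can be restricted to an interval list with its INTERVALS argument, and the median can then be read from the MEDIAN_COVERAGE column of the metrics file. The function names and file paths below are hypothetical, and the awk parsing assumes the usual Picard metrics layout: comment lines starting with #, then a tab-separated header row followed by one data row.

```shell
# Hypothetical helper: collect WGS metrics over the given interval list.
# Arguments: input BAM, reference FASTA, interval list, output metrics file.
collect_wgs_metrics() {
  gatk CollectWgsMetrics \
    -I "$1" \
    -R "$2" \
    --INTERVALS "$3" \
    -O "$4"
}

# Hypothetical helper: print the MEDIAN_COVERAGE value from a metrics file.
median_coverage() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }   # skip comment lines and blank lines
    !col {                      # first remaining line is the header row
      for (i = 1; i <= NF; i++) if ($i == "MEDIAN_COVERAGE") col = i
      next
    }
    { print $col; exit }        # next line is the metrics row
  ' "$1"
}

# Example invocation (paths are placeholders):
# collect_wgs_metrics sample.bam Homo_sapiens_assembly38.fasta \
#   wgs_coverage_regions.hg38.interval_list sample.wgs_metrics.txt
# median_coverage sample.wgs_metrics.txt
```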