Multiple mapped reads in germline CNV caller (GermlineCNVCaller)
I have been running GermlineCNVCaller on targeted sequencing data (mainly exons), following the How to (https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants), but for several of the exons of interest I get a COUNT of 0 from the CollectReadCounts step.
When looking at the reads for these regions, they have MAPQ 0, likely due to reads mapping to multiple locations (I'm using bwa mem for mapping).
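For anyone wanting to check the same thing on their own data, here is a minimal sketch that tallies MAPQ values from SAM-format text, for example the output of `samtools view` restricted to one exon. The function names are hypothetical, and the default threshold of 30 is an assumption about the mapping-quality filter in CollectReadCounts; check the value for your GATK version.

```python
from collections import Counter


def mapq_histogram(sam_lines):
    """Tally MAPQ (5th SAM column) over alignment records, skipping header lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        fields = line.rstrip("\n").split("\t")
        counts[int(fields[4])] += 1
    return counts


def fraction_below(counts, min_mapq=30):
    """Fraction of reads whose MAPQ is below min_mapq (assumed filter threshold)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for q, n in counts.items() if q < min_mapq) / total
```

Feeding it the lines from `samtools view` for a zero-count exon should show whether essentially all reads there sit at MAPQ 0, which is what I see for the affected regions.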
One way to circumvent exclusion of multiple mapped reads is to set --minimum-mapping-quality 0 in the CollectReadCounts step, but that may of course introduce a bias in the COUNT.
Is there a general recommendation on how to handle these regions/exons where reads map to multiple locations, e.g. due to highly similar paralogous genes? For the genes in the current analysis, roughly half of the exons have a COUNT of 0 from CollectReadCounts.
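To quantify how many intervals are affected, a short sketch like the following can list the zero-count intervals in a CollectReadCounts TSV. It assumes the file has `@`-prefixed header lines followed by a tab-separated table with columns CONTIG, START, END and COUNT; the function name is hypothetical.

```python
import csv


def zero_count_intervals(tsv_lines):
    """Return (zero_intervals, total_intervals) from CollectReadCounts TSV lines.

    Assumes '@' header lines, then a tab-separated table with
    CONTIG, START, END and COUNT columns.
    """
    # Drop blank lines and the '@' header block; the next line is the column header.
    body = (line for line in tsv_lines if line.strip() and not line.startswith("@"))
    reader = csv.DictReader(body, delimiter="\t")
    zero, total = [], 0
    for row in reader:
        total += 1
        if int(row["COUNT"]) == 0:
            zero.append((row["CONTIG"], int(row["START"]), int(row["END"])))
    return zero, total
```

Passing an open file handle for the counts TSV works, since the function only needs an iterable of lines. Running it once on counts produced with the default mapping-quality filter and once with --minimum-mapping-quality 0 shows which exons are recovered only by admitting MAPQ 0 reads.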
I can see that the issue of multi-mapping reads is discussed extensively for RNA-seq, but I haven't found any discussion related to CNV analysis.
I'm using GATK version 4.1.4.1.
-
Hi chrl, here are some links that may provide you some background information and workarounds:
- https://gatk.broadinstitute.org/hc/en-us/articles/360039568932--How-to-Map-and-clean-up-short-read-sequence-data-efficiently
- https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we cannot guarantee a solution. For context, check out our support policy.
We invite other community members who know how to get around this issue to post their own experience.
Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.