I have been running GermlineCNVCaller on targeted sequencing data (mainly exons), following the how-to guide (https://gatk.broadinstitute.org/hc/en-us/articles/360035531152--How-to-Call-common-and-rare-germline-copy-number-variants), but for several of the exons of interest CollectReadCounts reports a COUNT of 0.
When I look at the reads in these regions, they have MAPQ 0, most likely because they map equally well to multiple locations (I'm using bwa mem for mapping).
One way to avoid excluding these multi-mapped reads is to set --minimum-mapping-quality 0 in the CollectReadCounts step, but that may of course bias the COUNT, since reads that could equally belong to another locus would then be counted.
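To make the effect of the threshold concrete, here is a minimal Python sketch (not GATK code) that counts read starts per interval from SAM-formatted text at two MAPQ cutoffs. The toy reads, the interval coordinates, and the cutoff of 30 are assumptions for illustration only; the default cutoff in a given GATK version should be checked in its documentation.

```python
# Toy SAM records (tab-separated, 11 mandatory fields); all values are made up.
TOY_SAM = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t320\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r2\t0\tchr1\t120\t0\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r3\t0\tchr1\t130\t0\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r4\t0\tchr1\t350\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r5\t0\tchr1\t360\t0\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
]

# Hypothetical exon intervals as (contig, start, end), 1-based inclusive.
INTERVALS = [("chr1", 100, 200), ("chr1", 300, 400)]


def count_reads(sam_lines, intervals, min_mapq):
    """Count reads whose start position lies in each interval and whose
    MAPQ is at least min_mapq."""
    counts = {iv: 0 for iv in intervals}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        contig, pos, mapq = fields[2], int(fields[3]), int(fields[4])
        if mapq < min_mapq:  # read is dropped by the MAPQ filter
            continue
        for iv in intervals:
            iv_contig, start, end = iv
            if iv_contig == contig and start <= pos <= end:
                counts[iv] += 1
    return counts


strict = count_reads(TOY_SAM, INTERVALS, min_mapq=30)  # assumed cutoff
relaxed = count_reads(TOY_SAM, INTERVALS, min_mapq=0)  # keep MAPQ 0 reads
for iv in INTERVALS:
    print(iv, "MAPQ>=30:", strict[iv], "MAPQ>=0:", relaxed[iv])
```

In this toy data the first interval contains only MAPQ 0 reads, so it goes from a count of 0 to a non-zero count once the filter is relaxed, which is the pattern I see for the affected exons.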
Is there a general recommendation on how to handle regions/exons where reads map to multiple locations, e.g. due to highly similar paralogous genes? For the genes in the current analysis, roughly half of the exons have a COUNT of 0 from CollectReadCounts.
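For reference, this is roughly how the fraction of zero-count exons can be tallied. It is a small sketch that assumes the counts were written as TSV (SAM-style header lines starting with "@", then a CONTIG/START/END/COUNT column header, then one row per interval); the example table below is invented and the function name is hypothetical.

```python
# Invented example of a counts table in the assumed TSV layout.
TOY_COUNTS_TSV = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "CONTIG\tSTART\tEND\tCOUNT\n"
    "chr1\t100\t200\t0\n"
    "chr1\t300\t400\t57\n"
    "chr1\t500\t600\t0\n"
    "chr1\t700\t800\t41\n"
)


def zero_count_intervals(tsv_text):
    """Return (list of intervals with COUNT == 0, total number of intervals)."""
    zero, total = [], 0
    header_seen = False
    for line in tsv_text.splitlines():
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        if not header_seen:  # first non-@ line is the column header
            header_seen = True
            continue
        contig, start, end, count = line.split("\t")[:4]
        total += 1
        if int(count) == 0:
            zero.append((contig, int(start), int(end)))
    return zero, total


zero, total = zero_count_intervals(TOY_COUNTS_TSV)
fraction = len(zero) / total if total else 0.0
print(f"{len(zero)} of {total} intervals have COUNT 0 ({fraction:.0%})")
```

The list of zero-count intervals can then be compared against the exons known to have close paralogs.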
I can see that multi-mapping is discussed extensively for RNA-seq, but I haven't found any discussion of it in the context of CNV analysis.
I'm using GATK version 188.8.131.52.