Mutect2 bamout reports fewer reference reads than original bam, causing false positive call.
If you are seeing an error, please provide(REQUIRED) :
a) GATK version used: gatk-4.1.1.0
b) Exact command used:
date;/GATK/jre1.8.0_151/bin/java -jar /GATK/gatk-4.1.1.0/gatk-package-4.1.1.0-local.jar Mutect2 -R /reference/hg19.fasta -I Sample.bam -tumor Sample -O sample.vcf.gz --max-population-af 1 --max-reads-per-alignment-start 0 -L hotspot.bed --recover-all-dangling-branches -bamout Sample-bamout.bam
If not an error, choose a category for your question(REQUIRED):
c) Why do I see that there were fewer reference reads in bamout than input bam after performing Mutect2 variant calling?
Look forward to your reply soon. Thank you!
Respectfully yours,
Ching-Yuan Wang
-
Hi CYWang,
Seeing fewer reads in your bamout file is somewhat expected because some reads will inevitably be thrown out through the filtering that Mutect2 performs. Can you take a look at the following algorithm and troubleshooting documents and let me know if you still have questions about the output?
https://gatk.broadinstitute.org/hc/en-us/articles/360050722212-FAQ-for-Mutect2
One other consideration is that you are using a very outdated version of GATK and there have been many updates to Mutect2 since this version. I would suggest that you update GATK to the newest version and see if you get the results you are expecting.
Kind regards,
Pamela
-
Dear Pamela,
Thank you for your reply!
I can understand that some reads will be thrown out. However, I can't puzzle out why 37% of reference reads (reference allele=G) were thrown out, but only 4.7% of alternative reads (alternative allele=A) were thrown out.
On the other hand, we used the GATK-4.2.2.0 and the Bamout reported that 19% of reference reads (reference allele=G) were thrown out, but only 2.7% of alternative reads (alternative allele=A) were thrown out.
Therefore, my issue is if there are some reasons will cause reads to be unequally thrown out.
Look forward to your reply. Thank you!
Respectfully yours,
CYWang -
Hi CYWang,
I spoke with one of the developers for Mutect2 on the GATK team and he looked at your results and gave the following response:
My best guess is that some reference reads, after having their ends clipped to fit the active region, no longer have a unique mapping within the padded assembly region. For example, let’s say that starting from one end of the assembly window a reference read is ACGGGAA and that this same sequence appears 200 bases downstream. Then when we realign the read could be assigned to a location that does not exist in the bamout since it’s in the assembly padding. Now suppose this read actually has some leading sequence before the active region and so it’s really [TTCCATAG]ACGGGAA, and that this sequence is unique. Then if we didn’t clip the reads so dramatically the problem would vanish.
So it appears that the issue with the filtering of the reference reads may be due to clipping of the reads during Metect's processing. Could you test this by enlarging the active region with the arguments:
-padding-around-snps 50 -min-assembly-region-size 100
and let me know if the number of thrown-out reads changes?Kind regards,
Pamela
-
Dear Pamela,
Thank you for your reply!
We had tested by the command (-padding-around-snps 50 -min-assembly-region-size 100 or --padding-around-snps 50 --min-assembly-region-size 100), and both Bamout reported that 37.7% of reference reads (reference allele, G=16012) were thrown out, but only 5.7% of alternative reads (alternative allele, A=498) were thrown out.
Look forward to your reply. Thank you!
Respectfully yours,
CYWang -
Hi CYWang,
Thank you for providing this update and for trying my suggestion. I've shared your results with the Mutect2 developers and they are currently looking into other possible explanations for the results you're seeing. I will keep you updated with any findings or suggestions.
Kind regards,
Pamela
-
Dear Pamela,
Thank you for your help and we are looking forward your finding and suggestions!
Respectfully yours,
CYWang -
Hi CYWang,
Your issue was brought up with some other GATK team members today and I have a few suggestions/findings:
It seems that your reads are likely being dropped as part of the final step in likelihood calculation. These reads could be actual artifacts which is why they are being thrown out. This article may help explain this. It is often difficult to troubleshoot why certain reads are dropped, so we are going to be working on a dropped read tracker for Mutect2 in the future.
You could try to use the --enable-dynamic-read-disqualification option in the read filtering after the likelihoods calculation. This can induce some false positives but may help keep some of your reads that are being thrown out.
Could you read through this Mutect2 information and possibly try this option and let me know if it helps? Additionally, could you provide any information on why you think your results are particularly unexpected? There could be something else going on at this site accounting for the thrown-out reads.
Kind regards,
Pamela
Please sign in to leave a comment.
7 comments