Variant called in reads from ArtificialHaplotypeRG
Hi all,
We just found a scenario with Mutect2 (4.1.6.0) in our panel-based sequencing data that is similar to a case reported for HaplotypeCaller (https://gatk.broadinstitute.org/hc/en-us/community/posts/360058309891-how-to-get-rid-of-haplotypecaller-variants-caused-by-weird-artificial-haplotypes).
All the reads supporting the FLT3 insertion (18 bases: CGAATTTCGACGATCGTT) are from the read group ArtificialHaplotypeRG. The following are the details of one read as viewed in IGV:
Read name = HC40126
Sample = HC
Read group = ArtificialHaplotypeRG
Read length = 298bp
----------------------
Mapping = Primary @ MAPQ 60
Reference span = chr13:28,608,338-28,608,617 (-) = 280bp
Cigar = 205M18I75M
Clipping = None
----------------------
HC = 1165716607
CR = chr13:28608418-28608542
Hidden tags: RGLocation = chr13:28,608,552
Base = A @ QV 33
Any suggestions for getting rid of this artifact in Mutect2?
Thanks a lot for the help!
Best,
Ying
-
Hi yingchen69,
Could you give more information about your workflow and the filtering steps after Mutect2? Do you see this after FilterMutectCalls?
Here is a list of details that will help us as we look into this issue.
Best,
Genevieve
-
Hello,
What would specifically help with this issue is seeing the reads that support these insertions in both the bamout and the input BAM for this region. You can find details about the bamout in this troubleshooting document: "When HaplotypeCaller and Mutect2 do not call an expected variant".
Genevieve
-
Hi Genevieve Brandt (she/her),
I am also observing reads in the Read group 'ArtificialHaplotypeRG' in the BAM that is generated when running Mutect2 with --bamout. I am using GATK 4.2.0.0. Following Mutect2, I processed the data through LearnReadOrientationModel, GetPileupSummaries, CalculateContamination and FilterMutectCalls.
After excluding variants that did not pass filtering (i.e. do not have the PASS flag assigned by FilterMutectCalls) and applying some additional filters following variant annotation, variants identified in reads in the ArtificialHaplotypeRG group remain. The variants are present in either the ArtificialHaplotypeRG reads alone or both the ArtificialHaplotypeRG reads and real reads. The data is from targeted sequencing with a capture panel.
Please could you advise on how to approach this situation? Is it possible to filter out the ArtificialHaplotypeRG reads in their entirety or alternatively, prevent the variants within these reads being accounted for following FilterMutectCalls? Or should another approach be taken?
Thank you for your time and help. I look forward to your reply.
-
Hi ISmolicz,
Thanks for writing in about this issue so that we can clear up any confusion or fix any problems.
The first thing to clarify is that the bamout is a debugging tool and is not meant for any further analysis. The ArtificialHaplotypeRG reads are created in the bamout so that you can view the haplotypes that Mutect2 considered at each location. For each chosen haplotype you should have a new read in the read group ArtificialHaplotypeRG in addition to the other reads that support this haplotype. The bamout will also include ArtificialHaplotypeRG reads for all the other haplotypes considered.
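To check whether an insertion in the bamout is backed by real reads or only by artificial haplotype reads, something like the following may help. It is only a minimal sketch, not part of GATK: it assumes you have exported the region of interest from the bamout as SAM text (for example with `samtools view`), and the function names `read_group`, `has_insertion` and `insertion_support` are made up for illustration.

```python
import re
from collections import Counter

ARTIFICIAL_RG = "ArtificialHaplotypeRG"  # read group name seen in the bamout
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_group(fields):
    """Return the RG:Z tag value of a SAM record, or None if absent."""
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("RG:Z:"):
            return tag[5:]
    return None


def has_insertion(cigar, length):
    """True if the CIGAR contains an insertion of exactly `length` bases."""
    return any(op == "I" and int(n) == length
               for n, op in CIGAR_OP.findall(cigar))


def insertion_support(sam_lines, length):
    """Count reads carrying the insertion, split into artificial vs. real."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and headers
            continue
        fields = line.rstrip("\n").split("\t")
        if has_insertion(fields[5], length):  # column 6 is the CIGAR string
            is_artificial = read_group(fields) == ARTIFICIAL_RG
            counts["artificial" if is_artificial else "real"] += 1
    return counts
```

For the 18-base insertion discussed in this thread you would call `insertion_support(lines, 18)`. If the count for "real" is zero in the bamout, it is worth looking at the same region in the input BAM before drawing any conclusions.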
Is your question about why the haplotypes you are seeing were chosen? Or is it that there are variants in your final VCF that have no real reads supporting them, only reads marked with ArtificialHaplotypeRG? If it is the latter, could you provide more information and examples? How many of these variants are you seeing, and what does the bamout look like for one of these examples?
Best,
Genevieve