MUTECT2 bamout reports far more reads than original bam - reliable?
Dear all,
I recently encountered that relatively often, the bamout generated by mutect2 reports far more reads for a given position than the original bam files. One example is given below (Example 1):
original bam tumor: total count 28
original bam normal: total count 87
bamout: total count 226
I am aware that I am not the first to ask this question, I also read the mutect2 paper written by the developers, but from the paper it becomes not clear to me why there would be so many more reads in bamout than in the original bams. I would be very happy if someone could explain and also tell me if this is expected behaviour and if the result can be trusted? If additional reads were previously aligned somewhere else, might this mean they are not that trustworthy to really align to the position given in bamout? Is the extent to which there are more reads in bamout than in original bams correlated to something I should keep in mind when evaluating variants?
I give an additional screenshot of example 2, where the number of reads in bamout is also higher, and in this example, the whole region looks quite busy. Both variants passed the FilterMutectCalls. Do you think these variants are rather artifacts or true positives?
Many thanks in advance and best wishes,
Jana
Example 1 (top track: bamout; middle track: normal sample; bottom track: tumor sample)
Example 2: (top track: bamout; middle track: tumor sample; bottom track: normal sample)
a) GATK version used:
GATK-4.3.0.0
b) Exact command used:
$gatk Mutect2 \\
-R $ReferenceGenome/$GenomeSequence \\
-I $rnaroot/$subdirectoryname/${samples[$i]}/${tumorsamples[$i]} \\
-I $rnaroot/$subdirectoryname/${samples[$i]}/${controlsamples[$i]} \\
-normal $normalsamplename \\
--panel-of-normals $pon \\
--germline-resource $germlineresource \\
--bamout ${projectfolder}/results/somatic_mutations_${samples[$i]}.bamout.bam\\
-O ${projectfolder}/results/somatic_mutations_${samples[$i]}.vcf \\
c) Entire program log:
Is very long, if certain part is of interest, I am happy to post it.
-
Jana Marie Schwarz The extra reads might be the artifical reads representing the locally-assembled haplotypes. These will generally all have a uniform length that spans the entire assembly window, without ending in the middle like an actual read. Also, they are marked with a fake read group tag -- I think it's "HP". If you sort by read group in IGV this should stand out.
Please sign in to leave a comment.
1 comment