Hi, I used Mutect2 (GATK-22.214.171.124) to identify somatic variants in two distinct samples against a common control. However, the read statistics for the control sample differ between the two VCF output files, which is problematic for our subsequent analysis (clonal evolution). Below is an example (before filtering):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT G4_C G4_P1
chr1 229654549 . T G . . DP=66;ECNT=11;MBQ=20,20;MFRL=174,103;MMQ=60,60;MPOS=13;NALOD=1.40;NLOD=7.22;POPAF=6.00;TLOD=6.06 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|0:28,0:0.038:28:13,0:14,0:0|1:229654515_C_A:229654515:16,12,0,0 0|1:36,2:0.065:38:13,0:21,2:0|1:229654515_C_A:229654515:19,17,1,1
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT G4_C G4_L1
chr1 229654549 . T G . . DP=122;ECNT=2;MBQ=20,20;MFRL=182,151;MMQ=60,60;MPOS=4;NALOD=1.36;NLOD=6.61;POPAF=6.00;TLOD=6.85 GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 0|0:33,0:0.042:33:19,0:12,0:0|1:229654549_T_G:229654549:16,17,0,0 0|1:80,3:0.051:83:54,1:25,2:0|1:229654549_T_G:229654549:35,45,1,2
The commands that I used for both sample pairs are:
gatk Mutect2 -R Homo_sapiens_assembly38.fasta --germline-resource af-only-gnomad.hg38.vcf.gz -I G4_L1.dedup.recal.bam -tumor G4_L1 -I G4_C.dedup.recal.bam -normal G4_C -L G4_INTERVALS/0000-scattered.interval_list -O G4_L1_Mutect2nf_0000.vcf --f1r2-tar-gz G4_L1_Mutect2_f1r2_0000.tar.gz
gatk Mutect2 -R Homo_sapiens_assembly38.fasta --germline-resource af-only-gnomad.hg38.vcf.gz -I G4_P1.dedup.recal.bam -tumor G4_P1 -I G4_C.dedup.recal.bam -normal G4_C -L G4_INTERVALS/0000-scattered.interval_list -O G4_P1_Mutect2nf_0000.vcf --f1r2-tar-gz G4_P1_Mutect2_f1r2_0000.tar.gz
This is scattered across 20 cores using different regions passed with -L. The regions are identical for both sample pairs. This is just a test run on 3 samples to evaluate material quality, which is why I don't use a PoN. I didn't get any errors, and the results look reasonable except for the control sample statistics, which differ at 233 out of 260 common sites.
Does the read filtering in the control depend on the tumor sample? I understand that the active regions differ between the two pairs, which can affect realignment, but why would that lead to such significant differences?
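For reference, a comparison of the control columns between two call sets can be sketched roughly as below. This is only a minimal illustration, not the exact script used to obtain the counts above: it assumes tab-delimited VCF lines, compares only the AD and DP fields of the control sample, and the helper names (`normal_stats`, `compare_normals`, `record`) are hypothetical. The two demo records reuse the control and tumor values from the example site above, trimmed to GT:AD:AF:DP with INFO replaced by ".".

```python
def normal_stats(vcf_lines, normal="G4_C", keys=("AD", "DP")):
    """Map (chrom, pos, ref, alt) -> selected FORMAT values of the normal sample."""
    stats = {}
    col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(normal)  # column holding the normal sample
            continue
        fmt = fields[8].split(":")
        values = dict(zip(fmt, fields[col].split(":")))
        site = (fields[0], int(fields[1]), fields[3], fields[4])
        stats[site] = tuple(values.get(k) for k in keys)
    return stats


def compare_normals(lines_a, lines_b, normal="G4_C"):
    """Return (common sites, sites where the normal's values differ)."""
    a = normal_stats(lines_a, normal)
    b = normal_stats(lines_b, normal)
    common = sorted(set(a) & set(b))
    differing = [site for site in common if a[site] != b[site]]
    return common, differing


# Demo input: trimmed versions of the two records shown in the post.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t{}\t{}"


def record(normal, tumor):
    # Hypothetical helper that builds one tab-delimited record for the demo.
    return "\t".join(["chr1", "229654549", ".", "T", "G", ".", ".", ".",
                      "GT:AD:AF:DP", normal, tumor])


vcf_p1 = [HEADER.format("G4_C", "G4_P1"),
          record("0|0:28,0:0.038:28", "0|1:36,2:0.065:38")]
vcf_l1 = [HEADER.format("G4_C", "G4_L1"),
          record("0|0:33,0:0.042:33", "0|1:80,3:0.051:83")]

common, differing = compare_normals(vcf_p1, vcf_l1)
print(len(common), len(differing))
```

For real files, the same functions could be fed `open(path)` handles for the two uncompressed VCFs instead of the in-memory lists.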