Why are some variants not phased in Mutect2
Some variants were not phased in the Mutect2 output, even though they were close together in the genome. Two example variants are shown below. How does Mutect2 decide when to phase variants?
```
chr1 11854457 . G A . germline;normal_artifact;panel_of_normals CONTQ=93;DP=2089;ECNT=2;GERMQ=1;MBQ=0,36;MFRL=0,154;MMQ=60,60;MPOS=27;NALOD=-2334;NLOD=-2201;PON;POPAF=0.035;ROQ=93;SEQQ=93;STRANDQ=93;TLOD=5720.89;PValue=1.0;oddsRatio=nan;PON_RESULT=higher_cutoff;unrescue_type=pvalue GT:AD:AF:DP:F1R2:F2R1:SB:RF 0/1:0,1450:1.0:1450:0,664:0,761:0,0,924,526:0,0,924,526 0/0:0,599:1.0:599:0,320:0,267:0,0,369,230:0,0,369,230
chr1 11854476 . T G . germline;normal_artifact;panel_of_normals CONTQ=93;DP=2385;ECNT=2;GERMQ=1;MBQ=33,20;MFRL=153,148;MMQ=60,60;MPOS=30;NALOD=-790.4;NLOD=-790.4;PON;POPAF=0.57;ROQ=93;SEQQ=93;STRANDQ=93;TLOD=2090.52;PValue=0.6098308640942567;oddsRatio=1.0521314146076568;PON_RESULT=lower_cutoff;unrescue_type=blackSite GT:AD:AF:DP:F1R2:F2R1:SB:RF 0/1:930,762:0.45:1692:415,371:513,388:554,376,450,312:554,376,450,312 0/0:348,300:0.463:648:180,147:166,151:199,149,160,140:199,149,160,140
```
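A quick way to confirm that neither record was phased is to inspect the FORMAT keys and GT separators. The sketch below is a hypothetical helper, not a GATK tool. It assumes that phased Mutect2 records carry the `PGT`/`PID` FORMAT keys (later releases may also add `PS`) and write GT with `|` instead of `/`; check your own VCF header before relying on that. The two records are copied verbatim from above.

```python
# Hypothetical helper: report whether each Mutect2 record looks phased.
# Assumption: phased records carry PGT/PID (and possibly PS) FORMAT keys
# and use "|" rather than "/" in GT. Verify against your VCF header.
PHASING_KEYS = {"PGT", "PID", "PS"}

# The two records from the question. VCF columns are normally tab-separated;
# str.split() below also accepts the space-separated text pasted here.
RECORDS = [
    "chr1 11854457 . G A . germline;normal_artifact;panel_of_normals "
    "CONTQ=93;DP=2089;ECNT=2;GERMQ=1;MBQ=0,36;MFRL=0,154;MMQ=60,60;MPOS=27;"
    "NALOD=-2334;NLOD=-2201;PON;POPAF=0.035;ROQ=93;SEQQ=93;STRANDQ=93;"
    "TLOD=5720.89;PValue=1.0;oddsRatio=nan;PON_RESULT=higher_cutoff;unrescue_type=pvalue "
    "GT:AD:AF:DP:F1R2:F2R1:SB:RF "
    "0/1:0,1450:1.0:1450:0,664:0,761:0,0,924,526:0,0,924,526 "
    "0/0:0,599:1.0:599:0,320:0,267:0,0,369,230:0,0,369,230",
    "chr1 11854476 . T G . germline;normal_artifact;panel_of_normals "
    "CONTQ=93;DP=2385;ECNT=2;GERMQ=1;MBQ=33,20;MFRL=153,148;MMQ=60,60;MPOS=30;"
    "NALOD=-790.4;NLOD=-790.4;PON;POPAF=0.57;ROQ=93;SEQQ=93;STRANDQ=93;"
    "TLOD=2090.52;PValue=0.6098308640942567;oddsRatio=1.0521314146076568;"
    "PON_RESULT=lower_cutoff;unrescue_type=blackSite "
    "GT:AD:AF:DP:F1R2:F2R1:SB:RF "
    "0/1:930,762:0.45:1692:415,371:513,388:554,376,450,312:554,376,450,312 "
    "0/0:348,300:0.463:648:180,147:166,151:199,149,160,140:199,149,160,140",
]


def phasing_status(line: str) -> dict:
    """Summarise position, phasing-related FORMAT keys, GT separators and AFs."""
    cols = line.split()
    fmt_keys = cols[8].split(":")  # column 9 of a VCF record is FORMAT
    samples = [dict(zip(fmt_keys, s.split(":"))) for s in cols[9:]]
    return {
        "pos": int(cols[1]),
        "phasing_keys": sorted(PHASING_KEYS & set(fmt_keys)),
        "phased_gt": any("|" in s["GT"] for s in samples),
        "afs": [s["AF"] for s in samples],  # one AF per sample column
    }


statuses = [phasing_status(r) for r in RECORDS]
for st in statuses:
    print(st)
print("distance (bp):", statuses[1]["pos"] - statuses[0]["pos"])  # 19
```

Both records lack any phasing key and use `/` in GT, even though they are only 19 bp apart. The AF values (1.0 at the first site versus 0.45 at the second in the 0/1 sample) are relevant to the reply below.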
---------------------------------------
REQUIRED for all errors and issues:
a) GATK version used:
4.1.4.1
b) Exact command used:
```
Mutect2 \
  --f1r2-tar-gz /mnt/titan01/Orca/SNV_pair_calling/zhang.xiancang/P15_MRD_pair_calling/cupcake_pair_mrd_0714/P298868/gatk/vcf/Y-221123-692975-FFPE-487572-MRD9234_E221125-0042-DNA_F221125-0320_L221125-00011-B1_W-20221127-C7_panel15_pro_P221127-1037_Sample20221127-B-T7.f1r2.ext50.0002.tar.gz \
  --tumor-sample Y-221123-692975-FFPE-487572-MRD9234_E221125-0042-DNA_F221125-0320_L221125-00011-B1_W-20221127-C7_panel15_pro_P221127-1037_Sample20221127-B-T7 \
  --normal-sample Y-221123-692975-BC-719555-MRD6400_E221202-0039-DNA_F221202-0610_L221202-00021-A1_W-20221204-C3_panel15_pro_P221204-0522_Sample20221204-A-A00168 \
  --panel-of-normals /mnt/titan01/Orca/SNV_pair_calling/database/snv_T7_new/PON/panel15_pro.PON.min2.vcf \
  --genotype-pon-sites true --genotype-germline-sites true \
  --germline-resource /home/orca/cupcake/databases/gatk_bundle/2.8/hg19/af-only-gnomad.hg19.vcf.gz \
  --output /mnt/titan01/Orca/SNV_pair_calling/zhang.xiancang/P15_MRD_pair_calling/cupcake_pair_mrd_0714/P298868/gatk/vcf/Y-221123-692975-FFPE-487572-MRD9234_E221125-0042-DNA_F221125-0320_L221125-00011-B1_W-20221127-C7_panel15_pro_P221127-1037_Sample20221127-B-T7_unfiltered.ext50.0002.vcf.gz \
  --intervals /mnt/titan01/Orca/SNV_pair_calling/zhang.xiancang/P15_MRD_pair_calling/cupcake_pair_mrd_0714/P298868/gatk/bed/0002-scattered.interval_list \
  --interval-padding 50 \
  --input /mnt/titan01/Orca/SNV_pair_calling/zhang.xiancang/P15_MRD_pair_calling/bam_data/Y-221123-692975-FFPE-487572-MRD9234_E221125-0042-DNA_F221125-0320_L221125-00011-B1_W-20221127-C7_panel15_pro_P221127-1037_Sample20221127-B-T7.sorted.rmdup.realign.bam \
  --input /mnt/titan01/Orca/SNV_pair_calling/zhang.xiancang/P15_MRD_pair_calling/bam_data/Y-221123-692975-BC-719555-MRD6400_E221202-0039-DNA_F221202-0610_L221202-00021-A1_W-20221204-C3_panel15_pro_P221204-0522_Sample20221204-A-A00168.sorted.rmdup.realign.bam \
  --reference /home/orca/cupcake/databases/gatk_bundle/2.8/hg19/ucsc.hg19.noconfig.fasta \
  --tmp-dir /mnt/titan01/Orca/SNV_pair_calling/zhang.xiancang/P15_MRD_pair_calling/cupcake_pair_mrd_0714/P298868/gatk/vcf/temp \
  --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false \
  --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --max-population-af 0.01 \
  --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false \
  --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 \
  --minimum-allele-fraction 0.0 --independent-mates false --disable-adaptive-pruning false --dont-trim-active-regions false --max-extension 25 \
  --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false \
  --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false \
  --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 \
  --max-unpruned-variants 100 --linked-de-bruijn-graph false --debug-assembly false --debug-graph-transformations false \
  --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 \
  --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE \
  --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false \
  --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --emit-ref-confidence NONE \
  --max-mnp-distance 1 --force-call-filtered-alleles false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 \
  --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false \
  --interval-set-rule UNION --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 \
  --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false \
  --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 \
  --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false \
  --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false \
  --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false
```
c) Entire program log:
-
Hi xin cui,
GATK doesn't do any sort of Bayesian phasing, so it's actually quite strict. In order to phase two variants, they need to appear together on either 100% or 0% of the reads that overlap both positions. If there's an error that creates a mismatch at either position, the variants will fail to be phased. A long time ago, someone did a sensitivity analysis and found that (compared with GATK3 ReadBackedPhasing) we had only 90% sensitivity for variants at adjacent positions. As I said, very strict. The updated assembly in GATK 4.2 or so makes some improvements, but for deep sequencing there will still be errors that prevent phasing. If you're really concerned about phasing, you will probably want to run an additional tool that uses the read-level information. You can find the official Docker image for the old GATK3 (which should have the ReadBackedPhasing tool) here: https://hub.docker.com/r/broadinstitute/gatk3/tags
-Laura
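The 100%/0% rule from the reply can be sketched as follows. This is a hypothetical illustration of the rule as stated, not GATK's implementation. It assumes each read spanning both sites is reduced to a pair of booleans (alt present at site 1, alt present at site 2) and interprets the rule for a diploid het pair: cis means every read shows both alts or neither, trans means every read shows exactly one. `strict_phase`, `tolerant_phase`, and the 5% threshold are invented for this example, and the tolerant version is a plain discordance cutoff, not the algorithm used by GATK3 ReadBackedPhasing.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One read spanning both sites: (carries alt at site 1, carries alt at site 2).
Read = Tuple[bool, bool]


def strict_phase(reads: Iterable[Read]) -> Optional[str]:
    """Phase only when the evidence is unanimous (the 100%/0% rule).

    Assumed interpretation for a diploid het pair:
      cis   -> every read shows both alts or neither alt
      trans -> every read shows exactly one of the two alts
    Any other pattern (e.g. a single sequencing error) leaves the pair unphased.
    """
    counts = Counter(reads)
    both, neither = counts[(True, True)], counts[(False, False)]
    only1, only2 = counts[(True, False)], counts[(False, True)]
    if both > 0 and only1 == 0 and only2 == 0:
        return "cis"
    if only1 > 0 and only2 > 0 and both == 0 and neither == 0:
        return "trans"
    return None


def tolerant_phase(reads: Iterable[Read], max_discordant: float = 0.05) -> Optional[str]:
    """Hypothetical error-tolerant variant: accept a phase if at most
    `max_discordant` of the spanning reads contradict it."""
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return None
    cis_discordant = counts[(True, False)] + counts[(False, True)]
    trans_discordant = counts[(True, True)] + counts[(False, False)]
    if counts[(True, True)] > 0 and cis_discordant / total <= max_discordant:
        return "cis"
    if (counts[(True, False)] > 0 and counts[(False, True)] > 0
            and trans_discordant / total <= max_discordant):
        return "trans"
    return None


# 50 reads with both alts, 49 with neither, and one read with an error at site 2.
reads = [(True, True)] * 50 + [(False, False)] * 49 + [(True, False)]
print(strict_phase(reads))    # None
print(tolerant_phase(reads))  # cis

# Synthetic reads loosely mirroring the question: alt 1 on every read, alt 2 on ~45%.
question_like = [(True, True)] * 45 + [(True, False)] * 55
print(strict_phase(question_like), tolerant_phase(question_like))  # None None
```

With one discordant read out of 100, the strict rule gives up while the tolerant cutoff still reports cis. This mirrors the point about deep sequencing: the more reads span both sites, the more likely at least one of them carries an error. In the question-like case, where the first alt is on every read, neither rule can assign a cis or trans relationship.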