(Multi-Sample Mode) Somatic Variants PASSed with Zero Normal Coverage
Hello, I'm attempting to use Mutect2 (M2) and FilterMutectCalls in multi-sample mode for somatic variant discovery, with multiple tumor samples derived from the same individual, a single matched normal, a germline resource, and a panel of normals (PON). I've noticed that M2 emits variants with an AD of 0,0 (zero depth) in the normal, and FilterMutectCalls PASSes these variants (see below):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BOB23655BC-TPS-A.bqsr FC15883011-TPS-A.bqsr FC15883012-TPS-A.bqsr FC15883014-TPS-A.bqsr FC15884192-TPS-A.bqsr FC16207777-TPS-A.bqsr FC16207778-TPS-A.bqsr FC16207779-TPS-A.bqsr
1 12921332 . T C . PASS AS_FilterStatus=SITE;AS_SB_TABLE=1432,1531|165,94;DP=3258;ECNT=1;GERMQ=93;MBQ=49,51;MFRL=152,140;MMQ=60,60;MPOS=33;NALOD=0.00;NLOD=0.00;POPAF=0.461;TLOD=810.55 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:0,0:0.500:0:0,0:0,0:0,0,0,0 0/1:620,72:0.122:692:620,72:0,0:296,324,48,24 0/1:725,59:0.084:784:725,59:0,0:358,367,39,20 0/1:0,0:0.500:0:0,0:0,0:0,0,0,0 0/1:894,80:0.093:974:894,80:0,0:432,462,50,30 0/1:724,48:0.065:772:724,48:0,0:346,378,28,20 0/1:0,0:0.500:0:0,0:0,0:0,0,0,0 0/1:0,0:0.500:0:0,0:0,0:0,0,0,0
Is this the intended behavior of the somatic pipeline? If so, is there a way to filter these calls out within the GATK pipeline, so that I only end up with variants that have been checked against real depth in the normal? Thanks.
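One way to hard-filter these sites outside GATK is a small post-processing pass over the filtered VCF. Below is a minimal Python sketch, assuming an uncompressed, tab-delimited VCF (decompress `merged-somatic-filt.vcf.gz` first) and that the caller supplies the normal sample names; the function name `drop_zero_normal_depth` is hypothetical. It drops any record whose FORMAT/DP, summed over the normal samples, is zero, so with more than one normal a site survives as long as at least one normal has coverage.

```python
from typing import Iterable, Iterator, List, Optional, Sequence


def drop_zero_normal_depth(lines: Iterable[str], normals: Sequence[str]) -> Iterator[str]:
    """Yield VCF lines, skipping records whose DP summed over `normals` is 0.

    Header lines pass through unchanged. Records without DP in FORMAT are kept,
    and a normal whose DP is missing ('.') contributes 0 to the sum.
    """
    if not normals:
        raise ValueError("at least one normal sample name is required")
    normal_cols: Optional[List[int]] = None
    for line in lines:
        if line.startswith("##"):
            yield line  # meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Find each normal's column by sample name (list.index raises if absent).
            normal_cols = [fields.index(name) for name in normals]
            yield line
            continue
        if normal_cols is None:
            raise ValueError("data record seen before the #CHROM header line")
        fmt_keys = fields[8].split(":")
        if "DP" not in fmt_keys:
            yield line  # no per-sample depth to check; keep the record
            continue
        dp_idx = fmt_keys.index("DP")
        total_dp = 0
        for col in normal_cols:
            values = fields[col].split(":")
            if dp_idx < len(values) and values[dp_idx].isdigit():
                total_dp += int(values[dp_idx])
        if total_dp > 0:
            yield line  # at least one normal read covers the site
```

For example, `sys.stdout.writelines(drop_zero_normal_depth(sys.stdin, ["BOB23655BC-TPS-A.bqsr"]))` would read a VCF on standard input and write only the retained records. Dropping records is the bluntest option; writing a custom FILTER value instead would keep the sites visible for review.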
Edit: I recently ran these samples again in multi-sample mode with an additional normal BAM. This time, evidence of the supposed variant is found in the second normal, but the variant is still PASSed.
VCF from the second run with both normals (BOB23655BC-TPS-A.bqsr, BOB23655BC-WES-A):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BOB23655BC-TPS-A.bqsr BOB23655BC-WES-A FC15883011-TPS-A.bqsr FC15883012-TPS-A.bqsr FC15883014-TPS-A.bqsr FC15884192-TPS-A.bqsr FC16207777-TPS-A.bqsr FC16207778-TPS-A.bqsr FC16207779-TPS-A.bqsr
1 12921332 . T C . PASS AS_FilterStatus=SITE;AS_SB_TABLE=1611,1640|211,128;DP=3637;ECNT=1;GERMQ=93;MBQ=49,50;MFRL=152,141;MMQ=60,60;MPOS=30;NALOD=-4.363e+01;NLOD=-2.648e+01;POPAF=0.461;TLOD=1013.17 GT:AD:AF:DP:F1R2:F2R1:SB 0/0:0,0:0.500:0:0,0:0,0:0,0,0,0 0/0:136,24:0.152:160:81,10:52,14:84,52,15,9 0/1:651,92:0.130:743:651,92:0,0:313,338,58,34 0/1:759,68:0.088:827:759,68:0,0:374,385,44,24 0/1:0,0:0.500:0:0,0:0,0:0,0,0,0 0/1:934,96:0.102:1030:934,96:0,0:456,478,58,38 0/1:771,59:0.075:830:771,59:0,0:384,387,36,23 0/1:0,0:0.500:0:0,0:0,0:0,0,0,0 0/1:0,0:0.500:0:0,0:0,0:0,0,0,0
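To see how the second normal's evidence compares with the tumors, the raw alt-read fraction alt/(ref+alt) can be computed from each sample's AD field. This is a minimal Python sketch using a subset of the sample columns copied from the record above; `alt_fractions` is a hypothetical helper, and it only looks at the first ALT allele. The raw ratio need not equal Mutect2's AF field: the zero-depth samples still report AF=0.500 even though they have no reads.

```python
from typing import Dict, Optional


def alt_fractions(fmt: str, samples: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Map each sample to alt/(ref+alt) from its AD field, or None with no reads."""
    ad_idx = fmt.split(":").index("AD")
    result: Dict[str, Optional[float]] = {}
    for name, value in samples.items():
        # AD is "ref,alt[,alt2...]"; keep REF and the first ALT only.
        ref, alt = (int(x) for x in value.split(":")[ad_idx].split(",")[:2])
        depth = ref + alt
        result[name] = alt / depth if depth else None
    return result


# FORMAT and a subset of sample columns from the second-run record above.
FMT = "GT:AD:AF:DP:F1R2:F2R1:SB"
SAMPLES = {
    "BOB23655BC-TPS-A.bqsr": "0/0:0,0:0.500:0:0,0:0,0:0,0,0,0",        # normal 1
    "BOB23655BC-WES-A": "0/0:136,24:0.152:160:81,10:52,14:84,52,15,9",  # normal 2
    "FC15883011-TPS-A.bqsr": "0/1:651,92:0.130:743:651,92:0,0:313,338,58,34",
    "FC15884192-TPS-A.bqsr": "0/1:934,96:0.102:1030:934,96:0,0:456,478,58,38",
    "FC16207777-TPS-A.bqsr": "0/1:771,59:0.075:830:771,59:0,0:384,387,36,23",
    "FC15883014-TPS-A.bqsr": "0/1:0,0:0.500:0:0,0:0,0:0,0,0,0",
}

for name, frac in alt_fractions(FMT, SAMPLES).items():
    shown = "no reads" if frac is None else f"{frac:.3f}"
    print(f"{name}: {shown}")
```

This prints 0.150 for BOB23655BC-WES-A, higher than any of the tumor samples shown (0.071 to 0.124), and "no reads" for the two zero-depth samples.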
Using GATK version 4.1.7.0. FilterMutectCalls command line, as recorded in the VCF header:
##GATKCommandLine=<ID=FilterMutectCalls,CommandLine="FilterMutectCalls \
--output merged-somatic-filt.vcf.gz \
--stats /cromwell_root/fc-secure-edf7cb5f-4cbe-4a4c-ab3e-f7cd8a9ff731/1d3d3809-a6dd-4ec1-895a-4e70320ca152/multisample_variant_calling/c1781be3-1470-4e9d-8430-58afde229ff0/call-merge_mutect2results/mergedM2_call.stats \
--filtering-stats merged-somatic-filtering.stats \
--variant /cromwell_root/fc-secure-edf7cb5f-4cbe-4a4c-ab3e-f7cd8a9ff731/1d3d3809-a6dd-4ec1-895a-4e70320ca152/multisample_variant_calling/c1781be3-1470-4e9d-8430-58afde229ff0/call-merge_mutect2results/merged.somatic.vcf.gz \
--reference /cromwell_root/gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta \
--threshold-strategy OPTIMAL_F_SCORE \
--f-score-beta 1.0 \
--false-discovery-rate 0.05 \
--initial-threshold 0.1 \
--mitochondria-mode false \
--max-events-in-region 2 \
--max-alt-allele-count 1 \
--unique-alt-read-count 0 \
--min-median-mapping-quality 30 \
--min-median-base-quality 20 \
--max-median-fragment-length-difference 10000 \
--min-median-read-position 1 \
--max-n-ratio Infinity \
--min-reads-per-strand 0 \
--min-allele-fraction 0.0 \
--contamination-estimate 0.0 \
--log-snv-prior -13.815510557964275 \
--log-indel-prior -16.11809565095832 --log-artifact-prior -2.302585092994046 \
--normal-p-value-threshold 0.001 --min-slippage-length 8 --pcr-slippage-rate 0.1 \
--distance-on-haplotype 100 --long-indel-length 5 \
--interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 \
--interval-merging-rule ALL --read-validation-stringency SILENT \
--seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false \
--create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false \
--lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 \
--cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false \
--verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays \
--disable-tool-default-read-filters false",Version="4.1.7.0",Date="June 2, 2020 at 9:51:13 PM UTC">
-
Zachary Weber, this is a good question. Mutect2 and FilterMutectCalls don't do anything special when there are no normal reads. It's all the same math, just with any sums over normal reads coming out to zero. This means that there is less certainty in both directions (we can't rule out that it's germline, but we also can't rule out that it isn't), and FilterMutectCalls has to do its best under non-ideal circumstances. In your case, the allele fractions are high enough not to look like artifacts but low enough not to look like germline hets. This is, qualitatively, why the variants passed.
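The point about sums over normal reads coming out to zero can be illustrated with a toy model. The Python sketch below is hypothetical and is not Mutect2's actual likelihood model: `toy_normal_lod` scores the normal as hom-ref versus germline het by summing a per-read log10 likelihood ratio, assuming each read supports ALT with probability `error_rate` under hom-ref and 0.5 under a het. With no normal reads the sum is empty and the score is exactly 0, i.e. no evidence either way, which mirrors the NLOD=0.00 and NALOD=0.00 in the first record.

```python
import math
from typing import Sequence


def toy_normal_lod(reads_support_alt: Sequence[bool], error_rate: float = 0.001) -> float:
    """Toy log10 odds that the normal is hom-ref rather than a germline het.

    Assumption: each read independently supports ALT with probability
    `error_rate` under hom-ref and 0.5 under a het. Not Mutect2's model.
    """
    lod = 0.0  # an empty normal contributes no terms, so this stays 0
    for supports_alt in reads_support_alt:
        p_hom_ref = error_rate if supports_alt else 1.0 - error_rate
        p_het = 0.5
        lod += math.log10(p_hom_ref / p_het)
    return lod


print(toy_normal_lod([]))                            # no normal reads
print(toy_normal_lod([False] * 136 + [True] * 24))   # reads like the WES normal
print(toy_normal_lod([False] * 100))                 # REF-only normal
```

Under this toy model, a normal with 136 REF and 24 ALT reads scores strongly negative and a REF-only normal scores positive; only the empty normal sits at exactly zero.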
-
Thanks for the response, David. This is a peculiar case; ideally, we would have good coverage in the normal at all sites of interest. I think the best course of action is to annotate or hard-filter these variants in my pipeline.