Mutect2 unexpected false negative due to changes in smith-waterman parameters?
AnsweredIf you are seeing an error, please provide(REQUIRED) :
a) GATK version used: 4.2.4.0
b) Exact command used: "Mutect2 --f1r2-tar-gz AMOncoHS-A.f971840a-b541-4c77-baf0-1b990bbbace6.f1r2.tar.gz --tumor-sample AMOncoHS-A --panel-of-normals /home/dnanexus/inputs/input9106865033672720328/PoN.NextSeq.BLOOD_FFPE.SolidTumor.b37.dxCompiler_2.5.0.M2_4.2.2.0.BETA_FRACTION.vcf.gz --genotype-pon-sites true --genotype-germline-sites true --germline-resource /home/dnanexus/inputs/input9106865033672720328/af-only-gnomad.raw.sites.vcf.gz --bam-output AMOncoHS-A.f971840a-b541-4c77-baf0-1b990bbbace6.bamout.bam --dont-use-soft-clipped-bases true --output AMOncoHS-A.f971840a-b541-4c77-baf0-1b990bbbace6.M2_output.vcf --intervals /home/dnanexus/inputs/input9106865033672720328/0027-scattered.interval_list --input /home/dnanexus/inputs/input9106865033672720328/AMOncoHS-A@07192021JH_ST.b37.map.dedup.sample.bam --reference /home/dnanexus/inputs/input9106865033672720328/human_g1k_v37_decoy_GOAL+viral.fasta --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --training-data-mode false --training-data-mode-ref-downsample 2147483647 --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --independent-mates false --disable-adaptive-pruning false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end-to-recover -1 --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --base-quality-score-threshold 18 --dragstr-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false --pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqualification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --min-base-quality-score 10 --smith-waterman JAVA --emit-ref-confidence NONE --max-mnp-distance 1 --force-call-filtered-alleles false --soft-clip-low-quality-ends false --allele-informative-reads-overlap-margin 2 --smith-waterman-dangling-end-match-value 25 --smith-waterman-dangling-end-mismatch-penalty -50 --smith-waterman-dangling-end-gap-open-penalty -110 --smith-waterman-dangling-end-gap-extend-penalty -6 --smith-waterman-haplotype-to-reference-match-value 200 --smith-waterman-haplotype-to-reference-mismatch-penalty -150 --smith-waterman-haplotype-to-reference-gap-open-penalty -260 --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 --smith-waterman-read-to-haplotype-match-value 10 --smith-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.2.4.0",Date="January 12, 2022 5:08:07 PM GMT"
c) Entire error log:
If not an error, choose a category for your question(REQUIRED):
a)How do I (rescue a false negative variant call)?
b) What does (......) mean?
c) Why do I see (......)?
d) Where do I find (......)?
e) Will (......) be in future releases?
The variant 20:57484420C>T (GNAS:p.Arg201Cys) is a known truth variant in the Acrometrix oncology standard with an expected variant allele frequency of 5-15%.
This variant was correctly detected at 5.9% VAF with an essentially best-practices workflow using GATK 4.2.1.0. Upon running the same input data with GATK 4.2.4.0 the variant was not detected although it is in the bam file with a VAF of 6%.
GATK 4.2.1.0 produced an artificial haplotype of 20:57484400-57484440. GATK 4.2.4.0 did not produce an artificial haplotype.
Two samples run in parallel that had smaller DNA loading (50ng or 25 ng versus 100 ng) produced artificial haplotypes and correctly detected the variant with both GATK 4.2.1.0 and GATK 4.2.4.0.
Current versions of Varscan2 and freebayes with default parameters detected the variant with the 100 ng load.
GATK 4.2.1.0 and GATK 4.2.4.0 were run with the same explicit parameters. However, GATK 4.2.4.0 ran with additional default smith-waterman parameters not apparent in GATK 4.2.1.0. Apparently, these parameters were first exposed in GATK 4.2.3.0.
--smith-waterman-dangling-end-gap-extend-penalty -6
--smith-waterman-dangling-end-gap-open-penalty -110
--smith-waterman-dangling-end-match-value 25
--smith-waterman-dangling-end-mismatch-penalty -50
--smith-waterman-haplotype-to-reference-gap-extend-penalty -11
--smith-waterman-haplotype-to-reference-gap-open-penalty -260
--smith-waterman-haplotype-to-reference-match-value 200
--smith-waterman-haplotype-to-reference-mismatch-penalty -150
--smith-waterman-read-to-haplotype-gap-extend-penalty -5
--smith-waterman-read-to-haplotype-gap-open-penalty -30
--smith-waterman-read-to-haplotype-match-value 10
--smith-waterman-read-to-haplotype-mismatch-penalty -15
There were several other similar false positives with smaller VAFs, but I'm asking about the most egregious case to simplify the issue.
Questions:
- Were implicit, unexposed smith-waterman parameter values in GATK 4.2.1.0 the same as the explicit, default values used by GATK 4.2.4.0?
- If not, what were the values used by GATK 4.2.1.0?
- Is it probable that a difference is the cause of a haplotype not being created by GATK 4.2.4.0?
- What adjustments would be reasonable to make to rescue this variant in GATK 4.2.4.0 and later versions?
IGV image of missing haplotype and raw bam, and expanded region showing a nearby haplotype. I can send bams if that will help.
-
The default values for these for the smith waterman parameters were not changed, only the ability to change them was added. Could get a subset bam that duplicates this scenario and then isolate which specific GATK version you see this variant not appear? It would be helpful for narrowing in on the GATK changes.
-
After further investigation, it appears that the problem may have been introduced between GATH 4.2.1.0 and 4.2.2.0, contrary to our original suspicion that it was introduced in 4.1.3.0. After gathering some more evidence I can upload subset bams.
-
Hi myourshaw,
Thank you for your patience while we look into this difference you are seeing.
- Can you verify that you are running the two GATK versions on the exact same input BAMs?
- Please take a look at this troubleshooting document, When HaplotypeCaller and Mutect2 do not call an expected variant. Could you go through the troubleshooting steps to see what could be causing the discrepancy. I'm also interested in seeing the dot plots right near this site.
If you are still seeing the difference, I will let you know how to upload the files of interest.
Thank you,
Genevieve
Please sign in to leave a comment.
3 comments