Mutect2 forced calling via --alleles does not always generate an output
I am attempting to force Mutect2 to call a particular variant of interest, given in variant.vcf.gz, using the following command:
gatk Mutect2 \
-R "hg38.fa" \
--alleles "variant.vcf.gz" \
-L "variant.vcf.gz" \
-I "input.bam" \
-O "output.vcf"
If I use a VCF generated by Mutect2 as variant.vcf.gz (i.e. a VCF that contains "genuine" output), this works fine. Here is such an input VCF file that works well:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.15.1+htslib-1.18
##bcftoolsCommand=mpileup -Ou -f /home/mcarta/databases/hg38.fa --annotate FORMAT/AD -d 1000000 /add/2024-03-03_count_path_variants/CJD30_E200K_reads_of_interest.bam
##reference=file:///home/mcarta/databases/hg38.fa
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr4,length=190214555>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (high-quality bases)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.15.1+htslib-1.18
##bcftools_callCommand=call -mv -Oz -o variants.vcf.gz; Date=Sun Mar 3 15:02:24 2024
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 312092_13-CJD30_G01_S16_R1_001.fastq.gz
chr20 4699818 . G A 222.391 . DP=1064;VDB=0.670108;SGB=-0.693147;RPBZ=-0.800274;MQBZ=-1.6155;MQSBZ=-4.79102;BQBZ=9.03611;SCBZ=0.192787;FS=0;MQ0F=0;AC=1;AN=2;DP4=214,222,177,203;MQ=59 GT:PL:AD 0/1:255,0,190:436,380
Here is the output VCF (this output is what I want to receive):
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=FAD,Number=R,Type=Integer,Description="Count of fragments supporting each allele.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --alleles /add/variant_VCF/E200K.vcf.gz --output /add/results/2024-03-04_force_mutect/2024-03-10_E200K_VCFtest//E200K_test_outputVCF.vcf --intervals /add/variant_VCF/E200K.vcf.gz --input /add/seq_data/2021-10-19_first_CJD_seq/20211027.0-o26424_1_1-CJD1_fr.picard.markedDup.recal.bam --reference /home/mcarta/databases/hg38.fa --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --flow-likelihood-parallel-threads 0 --flow-likelihood-optimized-comp false --trim-to-haplotype true --exact-matching false --flow-use-t0-tag false --flow-probability-threshold 0.003 --flow-remove-non-single-base-pair-indels false --flow-remove-one-zero-probs false --flow-quantization-bins 121 --flow-fill-empty-bins-value 0.001 --flow-symmetric-indel-probs false --flow-report-insertion-or-deletion false --flow-disallow-probs-larger-than-call false --flow-lump-probs false --flow-retain-max-n-probs-base-format false --flow-probability-scaling-factor 10 --flow-order-cycle-length 4 --keep-boundary-flows false --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --mutect3-training-mode false --mutect3-ref-downsample 10 --mutect3-alt-downsample 20 --mutect3-non-artifact-ratio 20 --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --base-qual-correction-factor 5 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --independent-mates false --flow-mode NONE --disable-adaptive-pruning false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end-to-recover -1 --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --dragstr-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false --pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqualification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --flow-hmm-engine-min-indel-adjust 6 --flow-hmm-engine-flat-insertion-penatly 45 --flow-hmm-engine-flat-deletion-penatly 45 --pileup-detection false --use-pdhmm false --use-pdhmm-overlap-optimization false --make-determined-haps-from-pd-code false --print-pileupcalling-status false --fallback-gga-if-pdhmm-fails true --pileup-detection-enable-indel-pileup-calling false --pileup-detection-active-region-phred-threshold 0.0 --num-artificial-haplotypes-to-add-per-allele 5 --artifical-haplotype-filtering-kmer-size 10 --pileup-detection-snp-alt-threshold 0.1 --pileup-detection-indel-alt-threshold 0.1 --pileup-detection-absolute-alt-depth 0.0 --pileup-detection-snp-adjacent-to-assembled-indel-range 5 --pileup-detection-snp-basequality-filter 12 --pileup-detection-bad-read-tolerance 0.0 --pileup-detection-proper-pair-read-badness true --pileup-detection-edit-distance-read-badness-threshold 0.08 --pileup-detection-chimeric-read-badness true --pileup-detection-template-mean-badness-threshold 0.0 --pileup-detection-template-std-badness-threshold 0.0 --pileup-detection-filter-assembly-alt-bad-read-tolerance 0.0 --pileup-detection-edit-distance-read-badness-for-assembly-filtering-threshold 0.12 --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --override-fragment-softclip-check false --min-base-quality-score 10 --smith-waterman FASTEST_AVAILABLE --emit-ref-confidence NONE --max-mnp-distance 1 --force-call-filtered-alleles false --reference-model-deletion-quality 30 --soft-clip-low-quality-ends false --allele-informative-reads-overlap-margin 2 --smith-waterman-dangling-end-match-value 25 --smith-waterman-dangling-end-mismatch-penalty -50 --smith-waterman-dangling-end-gap-open-penalty -110 --smith-waterman-dangling-end-gap-extend-penalty -6 --smith-waterman-haplotype-to-reference-match-value 200 --smith-waterman-haplotype-to-reference-mismatch-penalty -150 --smith-waterman-haplotype-to-reference-gap-open-penalty -260 --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 --smith-waterman-read-to-haplotype-match-value 10 --smith-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --flow-assembly-collapse-hmer-size 0 --flow-assembly-collapse-partial-mode false --flow-filter-alleles false --flow-filter-alleles-qual-threshold 30.0 --flow-filter-alleles-sor-threshold 3.0 --flow-filter-lone-alleles false --flow-filter-alleles-debug-graphs false --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.5.0.0",Date="March 10, 2024 at 3:14:27 PM UTC">
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
##INFO=<ID=AS_UNIQ_ALT_READ_COUNT,Number=A,Type=Integer,Description="Number of reads with unique start and mate end positions for each alt at a variant site">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality by allele">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length by allele">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality by allele">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NCount,Number=1,Type=Integer,Description="Count of N bases in the pileup">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=OCM,Number=1,Type=Integer,Description="Number of alt reads whose original alignment doesn't match the current contig.">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=R,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##MutectVersion=2.2
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr4,length=190214555>
##filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.
##source=Mutect2
##tumor_sample=20211027.0-o26424_1_1-CJD1_fr
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr20 4699818 . G A . . AS_SB_TABLE=1669,1637|0,0;DP=3485;ECNT=1;MBQ=18,0;MFRL=255,0;MMQ=60,60;MPOS=50;POPAF=7.30;TLOD=-3.510e+00 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:3306,0:3.584e-04:3306:498,0:477,0:2847,0:1669,1637,0,0
However, I haven't got VCF files for all my variants of interest. So I created various "manual" ones, such as the following (here, I use the same variant as above, chr20:4699818, to have a direct comparison):
##fileformat=VCFv4.3
##fileDate=20240310
##source=Ensembl
##reference=GRCh38
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample.fastq.gz
chr20 4699818 . G A . . . .
If I use such a VCF as input in the -L and --alleles options to force call, I get an "empty" VCF as an output:
##fileformat=VCFv4.2
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=FAD,Number=R,Type=Integer,Description="Count of fragments supporting each allele.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --alleles /add/variant_VCF/E200K_manual_test.vcf.gz --output /add/results/2024-03-04_force_mutect/2024-03-10_E200K_VCFtest//E200K_test_manualVCF2.vcf --intervals /add/variant_VCF/E200K_manual_test.vcf.gz --input /add/seq_data/2021-10-19_first_CJD_seq/20211027.0-o26424_1_1-CJD1_fr.picard.markedDup.recal.bam --reference /home/mcarta/databases/hg38.fa --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --flow-likelihood-parallel-threads 0 --flow-likelihood-optimized-comp false --trim-to-haplotype true --exact-matching false --flow-use-t0-tag false --flow-probability-threshold 0.003 --flow-remove-non-single-base-pair-indels false --flow-remove-one-zero-probs false --flow-quantization-bins 121 --flow-fill-empty-bins-value 0.001 --flow-symmetric-indel-probs false --flow-report-insertion-or-deletion false --flow-disallow-probs-larger-than-call false --flow-lump-probs false --flow-retain-max-n-probs-base-format false --flow-probability-scaling-factor 10 --flow-order-cycle-length 4 --keep-boundary-flows false --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --mutect3-training-mode false --mutect3-ref-downsample 10 --mutect3-alt-downsample 20 --mutect3-non-artifact-ratio 20 --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --base-qual-correction-factor 5 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --independent-mates false --flow-mode NONE --disable-adaptive-pruning false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end-to-recover -1 --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --dragstr-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false --pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqualification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --flow-hmm-engine-min-indel-adjust 6 --flow-hmm-engine-flat-insertion-penatly 45 --flow-hmm-engine-flat-deletion-penatly 45 --pileup-detection false --use-pdhmm false --use-pdhmm-overlap-optimization false --make-determined-haps-from-pd-code false --print-pileupcalling-status false --fallback-gga-if-pdhmm-fails true --pileup-detection-enable-indel-pileup-calling false --pileup-detection-active-region-phred-threshold 0.0 --num-artificial-haplotypes-to-add-per-allele 5 --artifical-haplotype-filtering-kmer-size 10 --pileup-detection-snp-alt-threshold 0.1 --pileup-detection-indel-alt-threshold 0.1 --pileup-detection-absolute-alt-depth 0.0 --pileup-detection-snp-adjacent-to-assembled-indel-range 5 --pileup-detection-snp-basequality-filter 12 --pileup-detection-bad-read-tolerance 0.0 --pileup-detection-proper-pair-read-badness true --pileup-detection-edit-distance-read-badness-threshold 0.08 --pileup-detection-chimeric-read-badness true --pileup-detection-template-mean-badness-threshold 0.0 --pileup-detection-template-std-badness-threshold 0.0 --pileup-detection-filter-assembly-alt-bad-read-tolerance 0.0 --pileup-detection-edit-distance-read-badness-for-assembly-filtering-threshold 0.12 --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --override-fragment-softclip-check false --min-base-quality-score 10 --smith-waterman FASTEST_AVAILABLE --emit-ref-confidence NONE --max-mnp-distance 1 --force-call-filtered-alleles false --reference-model-deletion-quality 30 --soft-clip-low-quality-ends false --allele-informative-reads-overlap-margin 2 --smith-waterman-dangling-end-match-value 25 --smith-waterman-dangling-end-mismatch-penalty -50 --smith-waterman-dangling-end-gap-open-penalty -110 --smith-waterman-dangling-end-gap-extend-penalty -6 --smith-waterman-haplotype-to-reference-match-value 200 --smith-waterman-haplotype-to-reference-mismatch-penalty -150 --smith-waterman-haplotype-to-reference-gap-open-penalty -260 --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 --smith-waterman-read-to-haplotype-match-value 10 --smith-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --flow-assembly-collapse-hmer-size 0 --flow-assembly-collapse-partial-mode false --flow-filter-alleles false --flow-filter-alleles-qual-threshold 30.0 --flow-filter-alleles-sor-threshold 3.0 --flow-filter-lone-alleles false --flow-filter-alleles-debug-graphs false --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.5.0.0",Date="March 10, 2024 at 3:10:52 PM UTC">
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
##INFO=<ID=AS_UNIQ_ALT_READ_COUNT,Number=A,Type=Integer,Description="Number of reads with unique start and mate end positions for each alt at a variant site">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality by allele">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length by allele">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality by allele">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NCount,Number=1,Type=Integer,Description="Count of N bases in the pileup">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=OCM,Number=1,Type=Integer,Description="Number of alt reads whose original alignment doesn't match the current contig.">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=R,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##MutectVersion=2.2
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr4,length=190214555>
##filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.
##source=Mutect2
##tumor_sample=20211027.0-o26424_1_1-CJD1_fr
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
Now, I'm wondering whether the manual VCF I created contains some kind of formatting error, or whether this might be a bug with the force calling. Essentially, I'd like to know which features the input VCF must contain so that the force calling works. Thank you!
Here is the entire log for a Mutect2 operation that generates an "empty" output VCF:
user@ubuntu22:/add/variant_VCF$ bash "/add/code/mutect_force_variant/E200K_test.sh"
Using GATK jar /home/user/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/mcarta/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar Mutect2 -R /home/mcarta/databases/hg38.fa --alleles /add/variant_VCF/E200K_manual_test2.vcf.gz -L /add/variant_VCF/E200K_manual_test2.vcf.gz -I /add/seq_data/2021-10-19_first_CJD_seq/20211027.0-o26424_1_1-CJD1_fr.picard.markedDup.recal.bam -O /add/results/2024-03-04_force_mutect/2024-03-10_E200K_VCFtest//E200K_test_outputVCF.vcf
15:22:00.164 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/mcarta/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
15:22:00.342 INFO Mutect2 - ------------------------------------------------------------
15:22:00.345 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.5.0.0
15:22:00.345 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
15:22:00.345 INFO Mutect2 - Executing as mcarta@ubuntu22 on Linux v5.15.0-97-generic amd64
15:22:00.346 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v17.0.10+7-Ubuntu-122.04.1
15:22:00.346 INFO Mutect2 - Start Date/Time: March 10, 2024 at 3:22:00 PM UTC
15:22:00.346 INFO Mutect2 - ------------------------------------------------------------
15:22:00.346 INFO Mutect2 - ------------------------------------------------------------
15:22:00.348 INFO Mutect2 - HTSJDK Version: 4.1.0
15:22:00.348 INFO Mutect2 - Picard Version: 3.1.1
15:22:00.348 INFO Mutect2 - Built for Spark Version: 3.5.0
15:22:00.349 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:22:00.350 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:22:00.350 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:22:00.352 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:22:00.352 INFO Mutect2 - Deflater: IntelDeflater
15:22:00.352 INFO Mutect2 - Inflater: IntelInflater
15:22:00.353 INFO Mutect2 - GCS max retries/reopens: 20
15:22:00.353 INFO Mutect2 - Requester pays: disabled
15:22:00.353 INFO Mutect2 - Initializing engine
15:22:00.530 INFO FeatureManager - Using codec VCFCodec to read file file:///add/variant_VCF/E200K_manual_test2.vcf.gz
15:22:00.555 INFO FeatureManager - Using codec VCFCodec to read file file:///add/variant_VCF/E200K_manual_test2.vcf.gz
15:22:00.571 INFO IntervalArgumentCollection - Processing 1 bp from intervals
15:22:00.580 INFO Mutect2 - Done initializing engine
15:22:00.603 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/mcarta/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
15:22:00.605 INFO NativeLibraryLoader - Loading libgkl_smithwaterman.so from jar:file:/home/mcarta/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_smithwaterman.so
15:22:00.606 INFO SmithWatermanAligner - Using AVX accelerated SmithWaterman implementation
15:22:00.619 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/mcarta/gatk-4.5.0.0/gatk-package-4.5.0.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
15:22:00.633 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
15:22:00.634 INFO IntelPairHmm - Available threads: 2
15:22:00.634 INFO IntelPairHmm - Requested threads: 4
15:22:00.635 WARN IntelPairHmm - Using 2 available threads, but 4 were requested
15:22:00.635 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
15:22:00.680 INFO ProgressMeter - Starting traversal
15:22:00.681 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
15:22:02.654 INFO Mutect2 - 253 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappingQualityNotZeroReadFilter
0 read(s) filtered by: MappedReadFilter
0 read(s) filtered by: NotSecondaryAlignmentReadFilter
44578 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonChimericOriginalAlignmentReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: ReadLengthReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
44831 total reads filtered out of 57160 reads processed
15:22:02.655 INFO ProgressMeter - unmapped 0.0 1 30.4
15:22:02.655 INFO ProgressMeter - Traversal complete. Processed 1 total regions in 0.0 minutes.
15:22:02.661 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
15:22:02.661 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
15:22:02.661 INFO SmithWatermanAligner - Total compute time in native Smith-Waterman : 0.00 sec
15:22:02.662 INFO Mutect2 - Shutting down engine
[March 10, 2024 at 3:22:02 PM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=218103808
Tool returned:
SUCCESS
-
Hi M Carta
The site that you are trying to genotype looks like it has a very high allelic fraction therefore it is a potential germline site with a very high GERMQ value. Germline sites by default are filtered and not emitted by Mutect2 so you may need to add the following parameter to your commandline to force call those alleles albeit they will be filtered by FilterMutectCalls command due to high probability of being germline.
--genotype-germline-sites <Boolean> Call all apparent germline site even though they will ultimately be filtered. Default
value: false. Possible values: {true, false}I hope this helps.
-
Thank you Gökalp Çelik. I tried adding the option suggested by you, but I'm afraid that the output VCF is still empty:
gatk Mutect2 \
-R "hg38.fa" \
--alleles "variant.vcf.gz" \
--genotype-germline-sites TRUE \
-L "variant.vcf.gz" \
-I "input.bam" \
-O "output.vcf"
As a non-expert, to me the issue seems to primarily depend on "variant.vcf.gz", given that using a VCF with a certain format (the first one in my post above) allows an output to be generated, even if the variant I'm looking for isn't present at all.In fact, I just tried editing the functional "variant.vcf.gz" and entered random POS values; an output was produced.
In case you're wondering why I'm doing this: I want to see whether there are any reads supporting certain somatic mutations of interest. -
Hi again.
For the sake of reproducing this issue I created the below vcf file for targets.
##fileformat=VCFv4.3
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr17 7654101 . A G . . . . .
chr17 7654112 . A G . . . . .The site at the top has a variant in my bam file whereas the site at the bottom is HOMREF. When Mutect2 is run with the following command line
gatk Mutect2 -R /mnt/E/Genomes/hg38/hg38v3.fasta -I sorted.bam -O gtalleles.vcf --alleles targets2.vcf.gz -L targets2.vcf.gz
I get the below result
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr17 7654101 . A G . . AS_SB_TABLE=0,0|71,79;DP=155;ECNT=1;MBQ=0,37;MFRL=0,301;MMQ=60,60;MPOS=37;POPAF=7.30;TLOD=526.99 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:0,150:0.992:150:0,64:0,56:0,122:0,0,71,79But when I run the Mutect2 with the additional
-ERC GVCF
I get the following output.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr17 7654101 . A G,<NON_REF> . . AS_SB_TABLE=0,0|71,79|0,0;DP=155;ECNT=1;MBQ=0,37,0;MFRL=0,301,0;MMQ=60,60,60;MPOS=37,50;POPAF=7.30,7.30;TLOD=526.15,-1.794e+00 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1/2:0,150,0:0.984,7.893e-03:150:0,64,0:0,56,0:0,122,0:0,0,71,79
chr17 7654112 . A <NON_REF> . . END=7654112 GT:DP:MIN_DP:TLOD 0/0:161:161:-2.210e+00Looks like Mutect2 is not emitting sites when there is no evidence. Is this what you have been observing?
-
M Carta I'm not sure this is the root of the issue but the lack of sequence dictionary lines eg
##contig=<ID=chr2,length=242193529>
in the header of your "manual" VCFs worries me a bit. How about you try again after pasting in the sequence dictionary lines to the header (this may require re-indexing as well with IndexFeatureFile) and let us know how it goes?
Please sign in to leave a comment.
4 comments