Mutect2 Calls for non-tumor pooled samples - no variants pass filters
I am using Mutect2, as suggested, to call variants in non-tumor samples, each consisting of multiple individuals that were pooled together and sequenced as one, which affects the ploidy value. However, FilterMutectCalls fails consistently, and skipping it to hard filter with VariantFiltration instead produces VCFs in which no variants pass the filters (namely the suggested QUAL > 30). I am unsure which step is actually at fault, since the other steps all run "successfully" and this use case is fairly uncommon. From other help forum posts, it seems that FilterMutectCalls may be unnecessary here and that the alpha metric is meant for comparative tumor data, so I tried skipping it and going directly to VariantFiltration, but nothing passes.
a) GATK version used: 4.4.0
b) Exact command used:
for Mutect2:
/global/scratch/users/laurenhamm/thesis/software/gatk/gatk --java-options "-Xmx64G" Mutect2 \
-R /global/scratch/users/laurenhamm/thesis/ref-genomes/v3_mt-cs/Mgut_v3_mt-cs.fa \
-I /global/scratch/users/laurenhamm/thesis/aim2/paleomix/bamOuts/HOL_2015_pool.GEAsamples.bam \
-O /global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/indv_vcfs/Mutect2/HOL_2015_pool.M2.g.vcf.gz \
-ploidy 46 \
--min-base-quality-score 25 \
for FilterMutectCalls:
/global/scratch/users/laurenhamm/thesis/software/gatk/gatk --java-options "-Xmx4g" FilterMutectCalls \
-V $f \
-R /global/scratch/users/laurenhamm/thesis/ref-genomes/v3_mt-cs/Mgut_v3_mt-cs.fa \
--max-alt-allele-count 1 \
--min-median-base-quality 20 \
--min-median-mapping-quality 30 \
--create-output-variant-index TRUE \
-O FilterAnno/"$popName".FilterAnno.vcf.gz
for VariantFiltration/SelectVariants:
/global/scratch/users/laurenhamm/thesis/software/gatk/gatk --java-options "-Xmx4g" VariantFiltration \
-V $f \
-filter "QD < 2.0" --filter-name "QD2" \
-filter "QUAL < 30.0" --filter-name "QUAL30" \
-filter "SOR > 3.0" --filter-name "SOR3" \
-filter "FS > 60.0" --filter-name "FS60" \
-filter "MQ < 40" --filter-name "MQ40" \
-filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
-filter "ReadPosRankSum < 8.0" --filter-name "ReadPosRankSum8" \
-O FilterAnno/"$popName".M2.FilterAnno.vcf.gz
/global/scratch/users/laurenhamm/thesis/software/gatk/gatk --java-options "-Xmx4g" SelectVariants \
-V FilterAnno/"$popName".M2.FilterAnno.vcf.gz \
-select-type SNP \
--exclude-filtered \
-O FilterAnno/"$popName".M2.FilterAnnp.pruned.vcf.gz
c) Entire program log:
for FilterMutectCalls:
Using GATK jar /global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package- FilterMutectCalls -V BEL_2014_pool.M2.g.vcf.gz -R /global/scratch/users/laurenhamm/thesis/ref-genomes/v3_mt-cs/Mgut_v3_mt-cs.fa --max-alt-allele-count 1 --min-median-base-quality 20 --min-median-mapping-quality 30 --create-output-variant-index TRUE -O FilterAnno/BEL_2014_pool.FilterAnno.vcf.gz
18:26:03.573 INFO NativeLibraryLoader - Loading from jar:file:/global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package-!/com/intel/gkl/native/
18:26:03.815 INFO FilterMutectCalls - ------------------------------------------------------------
18:26:03.820 INFO FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.4.0.0-43-gd79823f-SNAPSHOT
18:26:03.820 INFO FilterMutectCalls - For support and documentation go to
18:26:03.821 INFO FilterMutectCalls - Executing as laurenhamm@n0001.savio2 on Linux v4.18.0-553.5.1.el8_10.x86_64 amd64
18:26:03.821 INFO FilterMutectCalls - Java runtime: Java HotSpot(TM) 64-Bit Server VM v22.0.1+8-16
18:26:03.821 INFO FilterMutectCalls - Start Date/Time: December 3, 2024, 6:26:03 PM PST
18:26:03.821 INFO FilterMutectCalls - ------------------------------------------------------------
18:26:03.821 INFO FilterMutectCalls - ------------------------------------------------------------
18:26:03.822 INFO FilterMutectCalls - HTSJDK Version: 3.0.5
18:26:03.822 INFO FilterMutectCalls - Picard Version: 3.0.0
18:26:03.822 INFO FilterMutectCalls - Built for Spark Version: 3.3.1
18:26:03.822 INFO FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:26:03.822 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:26:03.823 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:26:03.823 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:26:03.823 INFO FilterMutectCalls - Deflater: IntelDeflater
18:26:03.823 INFO FilterMutectCalls - Inflater: IntelInflater
18:26:03.823 INFO FilterMutectCalls - GCS max retries/reopens: 20
18:26:03.823 INFO FilterMutectCalls - Requester pays: disabled
18:26:03.824 INFO FilterMutectCalls - Initializing engine
18:26:04.085 INFO FeatureManager - Using codec VCFCodec to read file file:///global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/BEL_2014_pool.M2.g.vcf.gz
18:26:04.287 INFO FilterMutectCalls - Done initializing engine
18:26:04.435 INFO ProgressMeter - Starting traversal
18:26:04.436 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
18:26:04.436 INFO FilterMutectCalls - Starting pass 0 through the variants
18:47:35.884 INFO FilterMutectCalls - Finished pass 0 through the variants
18:47:36.850 INFO FilterMutectCalls - Shutting down engine
[December 3, 2024, 6:47:36 PM PST] done. Elapsed time: 21.56 minutes.
java.lang.IllegalArgumentException: alpha must be greater than 0 but got NaN
at org.broadinstitute.hellbender.utils.Utils.validateArg(
at org.broadinstitute.hellbender.utils.param.ParamUtils.isPositive(
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(
at org.broadinstitute.hellbender.engine.GATKTool.doWork(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
at org.broadinstitute.hellbender.Main.runCommandLineProgram(
at org.broadinstitute.hellbender.Main.mainEntry(
at org.broadinstitute.hellbender.Main.main(
for VariantFiltration/SelectVariants:
Using GATK jar /global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package- VariantFiltration -V BEL_2014_pool.M2.g.vcf.gz -filter QD < 2.0 --filter-name QD2 -filter QUAL < 30.0 --filter-name QUAL30 -filter SOR > 3.0 --filter-name SOR3 -filter FS > 60.0 --filter-name FS60 -filter MQ < 40 --filter-name MQ40 -filter MQRankSum < -12.5 --filter-name MQRankSum-12.5 -filter ReadPosRankSum < 8.0 --filter-name ReadPosRankSum8 -O FilterAnno/BEL_2014_pool.M2.FilterAnno.vcf.gz
12:53:19.002 INFO NativeLibraryLoader - Loading from jar:file:/global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package-!/com/intel/gkl/native/
12:53:19.039 INFO VariantFiltration - ------------------------------------------------------------
12:53:19.044 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.4.0.0-43-gd79823f-SNAPSHOT
12:53:19.044 INFO VariantFiltration - For support and documentation go to
12:53:19.044 INFO VariantFiltration - Executing as laurenhamm@n0220.savio2 on Linux v4.18.0-553.5.1.el8_10.x86_64 amd64
12:53:19.044 INFO VariantFiltration - Java runtime: Java HotSpot(TM) 64-Bit Server VM v22.0.1+8-16
12:53:19.045 INFO VariantFiltration - Start Date/Time: December 4, 2024, 12:53:18 PM PST
12:53:19.045 INFO VariantFiltration - ------------------------------------------------------------
12:53:19.045 INFO VariantFiltration - ------------------------------------------------------------
12:53:19.045 INFO VariantFiltration - HTSJDK Version: 3.0.5
12:53:19.046 INFO VariantFiltration - Picard Version: 3.0.0
12:53:19.046 INFO VariantFiltration - Built for Spark Version: 3.3.1
12:53:19.046 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:53:19.046 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:53:19.046 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:53:19.046 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:53:19.046 INFO VariantFiltration - Deflater: IntelDeflater
12:53:19.047 INFO VariantFiltration - Inflater: IntelInflater
12:53:19.047 INFO VariantFiltration - GCS max retries/reopens: 20
12:53:19.047 INFO VariantFiltration - Requester pays: disabled
12:53:19.047 INFO VariantFiltration - Initializing engine
12:53:19.189 INFO FeatureManager - Using codec VCFCodec to read file file:///global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/BEL_2014_pool.M2.g.vcf.gz
12:53:19.272 INFO VariantFiltration - Done initializing engine
12:53:19.323 INFO ProgressMeter - Starting traversal
12:53:19.323 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:53:19.334 WARN JexlEngine - ![0,2]: 'MQ < 40;' undefined variable MQ
12:53:19.335 WARN JexlEngine - ![0,9]: 'MQRankSum < -12.5;' undefined variable MQRankSum
12:53:19.335 WARN JexlEngine - ![0,2]: 'FS > 60.0;' undefined variable FS
12:53:19.335 WARN JexlEngine - ![0,14]: 'ReadPosRankSum < 8.0;' undefined variable ReadPosRankSum
12:53:19.335 WARN JexlEngine - ![0,3]: 'SOR > 3.0;' undefined variable SOR
12:53:19.335 WARN JexlEngine - ![0,2]: 'QD < 2.0;' undefined variable QD
##the warnings above repeat too much to fully include
03:28:34.264 INFO VariantFiltration - Shutting down engine
[December 6, 2024, 3:28:34 AM PST] done. Elapsed time: 75.59 minutes.
Using GATK jar /global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -jar /global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package- SelectVariants -V FilterAnno/YVO_2015_pool.M2.FilterAnno.vcf.gz -select-type SNP --exclude-filtered -O FilterAnno/YVO_2015_pool.M2.FilterAnnp.pruned.vcf.gz
03:28:38.827 INFO NativeLibraryLoader - Loading from jar:file:/global/scratch/users/laurenhamm/thesis/software/gatk/build/libs/gatk-package-!/com/intel/gkl/native/
03:28:38.866 INFO SelectVariants - ------------------------------------------------------------
03:28:38.871 INFO SelectVariants - The Genome Analysis Toolkit (GATK) v4.4.0.0-43-gd79823f-SNAPSHOT
03:28:38.871 INFO SelectVariants - For support and documentation go to
03:28:38.871 INFO SelectVariants - Executing as laurenhamm@n0220.savio2 on Linux v4.18.0-553.5.1.el8_10.x86_64 amd64
03:28:38.871 INFO SelectVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v22.0.1+8-16
03:28:38.871 INFO SelectVariants - Start Date/Time: December 6, 2024, 3:28:38 AM PST
03:28:38.871 INFO SelectVariants - ------------------------------------------------------------
03:28:38.872 INFO SelectVariants - ------------------------------------------------------------
03:28:38.872 INFO SelectVariants - HTSJDK Version: 3.0.5
03:28:38.872 INFO SelectVariants - Picard Version: 3.0.0
03:28:38.872 INFO SelectVariants - Built for Spark Version: 3.3.1
03:28:38.873 INFO SelectVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
03:28:38.873 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
03:28:38.873 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
03:28:38.873 INFO SelectVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
03:28:38.873 INFO SelectVariants - Deflater: IntelDeflater
03:28:38.873 INFO SelectVariants - Inflater: IntelInflater
03:28:38.874 INFO SelectVariants - GCS max retries/reopens: 20
03:28:38.874 INFO SelectVariants - Requester pays: disabled
03:28:38.874 INFO SelectVariants - Initializing engine
03:28:39.009 INFO FeatureManager - Using codec VCFCodec to read file file:///global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/FilterAnno/YVO_2015_pool.M2.FilterAnno.vcf.gz
03:28:39.069 INFO SelectVariants - Done initializing engine
03:28:39.101 INFO ProgressMeter - Starting traversal
03:28:39.102 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
03:30:01.157 INFO SelectVariants - Shutting down engine
[December 6, 2024, 3:30:01 AM PST] done. Elapsed time: 1.37 minutes.
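The JEXL warnings in the VariantFiltration log above ("undefined variable MQ", "... QD", and so on) mean that those annotations are absent from the Mutect2 records: the Mutect2 header posted later in this thread defines INFO keys such as TLOD, MBQ, MMQ and POPAF, but not QD, FS, SOR, MQ, MQRankSum or ReadPosRankSum. Mutect2 records also carry no QUAL value (note the "Q." in the GenotypeGVCFs exception later in the thread), so a QUAL-based hard filter is not meaningful for these calls. Separately, GATK's germline hard-filtering guidance uses ReadPosRankSum < -8.0, not < 8.0. The sketch below is a minimal check, assuming Bash and GNU gzip; info_keys and missing_filter_vars are hypothetical helper names, not GATK tools. It lists the INFO IDs a VCF header declares and reports which filter variables are missing, before a long VariantFiltration run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking a VCF header before writing JEXL hard filters.
# Assumes GNU gzip: "gzip -cdf" decompresses bgzipped VCFs and passes plain
# text through unchanged.

# Print the ID of every ##INFO header line, one per line.
info_keys() {
  local vcf="$1"
  # awk stops at the #CHROM line so the variant records are not scanned; the
  # "|| true" hides the SIGPIPE gzip may then receive under "set -o pipefail".
  { gzip -cdf -- "$vcf" || true; } \
    | awk '/^#CHROM/ { exit }
           /^##INFO=<ID=/ { sub(/^##INFO=<ID=/, ""); sub(/,.*/, ""); print }'
}

# Print each given variable name that is not declared as an INFO key.
# QUAL is a fixed VCF column, not an INFO key, so do not pass it here.
missing_filter_vars() {
  local vcf="$1"
  shift
  local defined var
  defined="$(info_keys "$vcf")"
  for var in "$@"; do
    # -q quiet, -x whole-line match, -F fixed string
    if ! grep -qxF -- "$var" <<< "$defined"; then
      printf '%s\n' "$var"
    fi
  done
}

# Example with the annotations used in the VariantFiltration command above:
# missing_filter_vars BEL_2014_pool.M2.g.vcf.gz QD SOR FS MQ MQRankSum ReadPosRankSum
```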
Hi Ren Hamm
Since you are using Mutect2, which calls variants by allele fraction, there is no need to adjust the ploidy for your pooled samples. You may want to enable Mutect2's high-sensitivity settings to call very low-fraction variants. These settings are bundled under --mitochondria-mode, so you can enable that mode both to call and to filter variants in your case.
Hard filtering Mutect2 calls is not recommended, because Mutect2 works differently from germline callers.
If you try mitochondria mode, you may get better results for your variants.
I hope this helps.
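A minimal sketch of the suggestion above, with Mutect2 and FilterMutectCalls both run in --mitochondria-mode. Assumptions not taken from the thread: the gatk wrapper is on PATH (or GATK points to it), the function name and output file names are hypothetical, and the -Xmx values are illustrative. -ploidy is left out per the advice above, and --emit-ref-confidence is not set, so Mutect2 writes a regular VCF rather than a GVCF.

```shell
#!/usr/bin/env bash
# Sketch: call and filter one pooled BAM with Mutect2 in mitochondria mode.
# Hypothetical helper; set GATK to the full path of the gatk wrapper if needed.
GATK="${GATK:-gatk}"

call_and_filter_pool() {
  local ref="$1" bam="$2" prefix="$3"
  local raw="${prefix}.M2.vcf.gz"

  # No -ploidy: Mutect2 reports allele fractions rather than ploidy-based genotypes.
  # No --emit-ref-confidence: plain VCF output instead of a beta somatic GVCF.
  "$GATK" --java-options "-Xmx16g" Mutect2 \
    -R "$ref" \
    -I "$bam" \
    --mitochondria-mode \
    -O "$raw"

  # Mutect2 writes "${raw}.stats" next to its output; FilterMutectCalls reads it.
  # No extra --min-median-* quality overrides are added here.
  "$GATK" --java-options "-Xmx4g" FilterMutectCalls \
    -R "$ref" \
    -V "$raw" \
    --stats "${raw}.stats" \
    --mitochondria-mode \
    -O "${prefix}.M2.filtered.vcf.gz"
}
```

For example, call_and_filter_pool Mgut_v3_mt-cs.fa BEL_2014_pool.GEAsamples.bam BEL_2014_pool runs both steps; prefixing the call with GATK=echo prints the two commands instead of running them.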
I was able to successfully run everything in --mitochondria-mode, but FilterMutectCalls is still failing with the same error:
java.lang.IllegalArgumentException: alpha must be greater than 0 but got NaN
How should I approach solving this?
Hi again.
Can you try the FilterMutectCalls step without the additional mapping-quality and base-quality filters?
I ran everything again after removing all the additional filter parameters, and I am still receiving the same error.
Hi Ren Hamm
Can you tell us how these samples were processed before Mutect2? Are they of DNA or RNA origin? Which aligner did you use to map the reads?
The answer may lie somewhere along those lines.
Of course! These are DNA data: 150 bp paired-end Illumina WGS reads, aligned with BWA-MEM.
Hi again.
Did you happen to build GATK from source? You may have picked up a source revision that is not fully tested. Can you try downloading the latest build and see whether the problem still exists?
Re-downloading the newest version seems to have fixed the original issue, and I can now successfully make Mutect2 mitochondria-mode VCFs that can then be filtered. However, I need allele frequency data for all the biallelic sites, but tools like vcftools treat these files as polyploid, and winnowing them down to only biallelic sites results in every variant being a "<NON_REF>" variant. How do I obtain accurate allele frequency estimates for the biallelic sites in this data?
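On the build question: the first FilterMutectCalls log reports v4.4.0.0-43-gd79823f-SNAPSHOT, which has the shape of a git-describe version (43 commits past the 4.4.0.0 tag, at commit d79823f) plus a -SNAPSHOT suffix, suggesting a build from a source checkout; after re-downloading, the GenotypeGVCFs log reports plain v4.6.1.0. A minimal sketch, assuming Bash; is_dev_build is a hypothetical helper that flags such version strings when given the output of gatk --version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if a GATK version string looks
# like a development build, i.e. contains "-SNAPSHOT" or a git-describe suffix
# such as "-43-gd79823f" after the release number; fail otherwise.
is_dev_build() {
  local version="$1"
  local re='v[0-9.]+-[0-9]+-g[0-9a-f]+'
  [[ "$version" == *-SNAPSHOT* ]] || [[ "$version" =~ $re ]]
}

# Example:
# if is_dev_build "$(gatk --version 2>&1)"; then echo "development build"; fi
```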
Hi Ren Hamm
It looks like you are using Mutect2's GVCF mode, which is still in beta and probably not useful for your purpose anyway. The GVCF output must be genotyped before you can pass it to any variant filtration workflow.
For that you can use the GenotypeGVCFs tool with the beta option --input-is-somatic true, which makes the tool treat this file as Mutect2 output. Then you can use the genotyped file and check the INFO/AF or FORMAT/AF fields to gather allele fractions for your variants of interest. For simplicity, you may also need to use bcftools norm to split multiallelic records into biallelic ones.
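A sketch of the extraction step described above, assuming bcftools is installed and the input is a genotyped or non-GVCF, single-sample Mutect2 VCF (raw GVCF records still carry <NON_REF>). bcftools norm -m -any splits multiallelic records, bcftools view keeps biallelic SNPs, and bcftools query prints FORMAT/AF and FORMAT/AD per site. Mutect2's FORMAT/AF is a model-based estimate and need not equal the raw read ratio: in the GenotypeGVCFs exception later in this thread, a record with AD 19,2,0 has AF 0.135,0.041, whereas 2/21 ≈ 0.095. If a raw read-count fraction is wanted instead, ad_alt_fraction computes alt/(ref+alt) from AD. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; set BCFTOOLS to the full path of bcftools if needed.
BCFTOOLS="${BCFTOOLS:-bcftools}"

# Split multiallelic records, keep biallelic SNPs, and print one row per site:
# CHROM POS REF ALT FORMAT/AF FORMAT/AD (single-sample VCF assumed).
biallelic_snp_af() {
  local vcf="$1"
  "$BCFTOOLS" norm -m -any -Ou "$vcf" \
    | "$BCFTOOLS" view -m2 -M2 -v snps -Ou - \
    | "$BCFTOOLS" query -f '%CHROM\t%POS\t%REF\t%ALT[\t%AF\t%AD]\n' -
}

# Raw read-count alternate fraction, alt / (ref + alt), from the AD column (6th)
# of the rows above. Rows without exactly two AD values, or with zero depth,
# are skipped.
ad_alt_fraction() {
  awk -F'\t' '{
    n = split($6, ad, ",")
    if (n != 2) next
    depth = ad[1] + ad[2]
    if (depth == 0) next
    printf "%s\t%s\t%s\t%s\t%.4f\n", $1, $2, $3, $4, ad[2] / depth
  }'
}
```

For example, biallelic_snp_af BEL_2014_pool.M2.filtered.vcf.gz | ad_alt_fraction > BEL_2014_pool.ad_af.tsv; adding -f PASS to the bcftools view step keeps only calls that passed FilterMutectCalls.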
Thank you! I'm getting a strange error now, though.
I am running this command:
/global/scratch/users/laurenhamm/thesis/aim2/vcfs/gatk_reinstall/gatk- --java-options "-Xmx64G" GenotypeGVCFs \
-R /global/scratch/users/laurenhamm/thesis/ref-genomes/v3_mt-cs/Mgut_v3_mt-cs.fa \
-V /global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/mitoMode_reinstall/gvcfs/BEL_2014_pool_reinstall.M2.g.vcf.gz \
-O /global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/mitoMode_reinstall/vcfs/BEL_2014_pool_reinstall.vcf.gz \
--input-is-somatic true
And this is the error file I receive:
Using GATK jar /global/scratch/users/laurenhamm/thesis/aim2/vcfs/gatk_reinstall/gatk-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx64G -jar /global/scratch/users/laurenhamm/thesis/aim2/vcfs/gatk_reinstall/gatk- GenotypeGVCFs -R /global/scratch/users/laurenhamm/thesis/ref-genomes/v3_mt-cs/Mgut_v3_mt-cs.fa -V /global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/mitoMode_reinstall/gvcfs/BEL_2014_pool_reinstall.M2.g.vcf.gz -O /global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/mitoMode_reinstall/vcfs/BEL_2014_pool_reinstall.vcf.gz --input-is-somatic true
10:56:30.831 INFO NativeLibraryLoader - Loading from jar:file:/global/scratch/users/laurenhamm/thesis/aim2/vcfs/gatk_reinstall/gatk-!/com/intel/gkl/native/
SLF4J(W): Class path contains multiple SLF4J providers.
SLF4J(W): Found provider [org.apache.logging.slf4j.SLF4JServiceProvider@336f49a1]
SLF4J(W): Found provider [ch.qos.logback.classic.spi.LogbackServiceProvider@2c8b8de0]
SLF4J(W): See for an explanation.
SLF4J(I): Actual provider is of type [org.apache.logging.slf4j.SLF4JServiceProvider@336f49a1]
10:56:30.932 INFO GenotypeGVCFs - ------------------------------------------------------------
10:56:30.935 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.6.1.0
10:56:30.935 INFO GenotypeGVCFs - For support and documentation go to
10:56:30.935 INFO GenotypeGVCFs - Executing as laurenhamm@n0001.savio2 on Linux v4.18.0-553.5.1.el8_10.x86_64 amd64
10:56:30.935 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v22.0.1+8-16
10:56:30.935 INFO GenotypeGVCFs - Start Date/Time: February 7, 2025, 10:56:30 AM PST
10:56:30.935 INFO GenotypeGVCFs - ------------------------------------------------------------
10:56:30.936 INFO GenotypeGVCFs - ------------------------------------------------------------
10:56:30.936 INFO GenotypeGVCFs - HTSJDK Version: 4.1.3
10:56:30.936 INFO GenotypeGVCFs - Picard Version: 3.3.0
10:56:30.936 INFO GenotypeGVCFs - Built for Spark Version: 3.5.0
10:56:30.939 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:56:30.939 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:56:30.939 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:56:30.939 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:56:30.939 INFO GenotypeGVCFs - Deflater: IntelDeflater
10:56:30.940 INFO GenotypeGVCFs - Inflater: IntelInflater
10:56:30.940 INFO GenotypeGVCFs - GCS max retries/reopens: 20
10:56:30.940 INFO GenotypeGVCFs - Requester pays: disabled
10:56:30.940 INFO GenotypeGVCFs - Initializing engine
10:56:31.115 INFO FeatureManager - Using codec VCFCodec to read file file:///global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/mitoMode_reinstall/gvcfs/BEL_2014_pool_reinstall.M2.g.vcf.gz
10:56:31.404 INFO GenotypeGVCFs - Done initializing engine
10:56:31.404 WARN GenotypeGVCFs - Note that the Mutect2 reference confidence mode is in BETA -- the likelihoods model and output format are subject to change in subsequent versions.
10:56:31.494 INFO ProgressMeter - Starting traversal
10:56:31.495 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
10:56:31.664 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location Chr_01:16712 the annotation AS_SB_TABLE=[10, 9|2, 0|0, 0] was not a numerical value and was ignored
10:56:31.664 WARN ReferenceConfidenceVariantContextMerger - Reducible annotation 'AS_SB_TABLE' detected, add -G StandardAnnotation -G AS_StandardAnnotation to the command to annotate in the final VC with this annotation.
10:56:31.680 INFO GenotypeGVCFs - Shutting down engine
[February 7, 2025, 10:56:31 AM PST] done. Elapsed time: 0.02 minutes.
org.broadinstitute.hellbender.exceptions.GATKException: Exception thrown at Chr_01:16712 [VC Unknown @ Chr_01:16712-16751 Q. of type=MIXED alleles=[AATAAAATTTAAATATGTTAATGAAATCATGTATTGACCC*, <NON_REF>, A] attr={AS_SB_TABLE=[10, 9|2, 0|0, 0], DP=23, ECNT=10, ECNTH=3, MBQ=[40, 40, 0], MFRL=[258, 366, 0], MMQ=[60, 60, 60], MPOS=[33, 50], OCM=0, POPAF=[2.40, 2.40], TLOD=[5.70, -1.070e+00]} GT=[[BEL_2014_pool AATAAAATTTAAATATGTTAATGAAATCATGTATTGACCC*|A|<NON_REF> DP 21 AD 19,2,0 {AF=0.135,0.041, F1R2=7,2,0, F2R1=9,0,0, FAD=19,3,0, PGT=0|1, PID=16712_AATAAAATTTAAATATGTTAATGAAATCATGTATTGACCC_A, PS=16712, SB=10,9,2,0}]] filters=
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$0(
at java.base/$ForEachOp$OfRef.accept(
at java.base/$3$1.accept(
at java.base/$2$1.accept(
at java.base/$3$1.accept(
at java.base/java.util.Iterator.forEachRemaining(
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(
at java.base/
at java.base/
at java.base/$ForEachOp.evaluateSequential(
at java.base/$ForEachOp$OfRef.evaluateSequential(
at java.base/
at java.base/
at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(
at org.broadinstitute.hellbender.engine.GATKTool.doWork(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(
at org.broadinstitute.hellbender.Main.runCommandLineProgram(
at org.broadinstitute.hellbender.Main.mainEntry(
at org.broadinstitute.hellbender.Main.main(
Caused by: java.lang.IllegalStateException: Key MBQ found in VariantContext field FORMAT at Chr_01:16712 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.
at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(
at htsjdk.variant.vcf.VCFEncoder.write(
at htsjdk.variant.variantcontext.writer.VCFWriter.add(
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$0(
... 20 more
Can you post the header section of your input GVCF?
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=FAD,Number=R,Type=Integer,Description="Count of fragments supporting each allele.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##FORMAT=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --mitochondria-mode true --emit-ref-confidence GVCF --output /global/scratch/users/laurenhamm/thesis/aim2/vcfs/GATK/pooled_vcfs/Mutect2/mitoMode_reinstall/BEL_2014_pool_reinstall.M2.g.vcf.gz --input /global/scratch/users/laurenhamm/thesis/aim2/paleomix/bamOuts/BEL_2014_pool.GEAsamples.bam --reference /global/scratch/users/laurenhamm/thesis/ref-genomes/v3_mt-cs/Mgut_v3_mt-cs.fa --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --flow-likelihood-parallel-threads 0 --flow-likelihood-optimized-comp false --trim-to-haplotype true --exact-matching false --flow-use-t0-tag false --flow-remove-non-single-base-pair-indels false --flow-remove-one-zero-probs false --flow-quantization-bins 121 --flow-fill-empty-bins-value 0.001 --flow-symmetric-indel-probs false --flow-report-insertion-or-deletion false --flow-disallow-probs-larger-than-call false --flow-lump-probs false --flow-retain-max-n-probs-base-format false --flow-probability-scaling-factor 10 --flow-order-cycle-length 4 --keep-boundary-flows false --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mutect3-training-mode false --mutect3-ref-downsample 10 --mutect3-alt-downsample 20 --mutect3-non-artifact-ratio 1 --mutect3-dataset-mode ILLUMINA --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --base-qual-correction-factor 5 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 0.0 --independent-mates false --flow-mode NONE --disable-adaptive-pruning false --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 
--min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --pruning-seeding-lod-threshold 9.210340371976184 --max-unpruned-variants 100 --linked-de-bruijn-graph false --disable-artificial-haplotype-recovery false --enable-legacy-graph-cycle-detection false --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --num-matching-bases-in-dangling-end-to-recover -1 --error-correction-log-odds -Infinity --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --dragstr-het-hom-ratio 2 --dont-use-dragstr-pair-hmm-scores false --pair-hmm-gap-continuation-penalty 10 --expected-mismatch-rate-for-read-disqualification 0.02 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --disable-symmetric-hmm-normalizing false --disable-cap-base-qualities-to-map-quality false --enable-dynamic-read-disqualification-for-genotyping false --dynamic-read-disqualification-threshold 1.0 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --flow-hmm-engine-min-indel-adjust 6 --flow-hmm-engine-flat-insertion-penatly 45 --flow-hmm-engine-flat-deletion-penatly 45 --pileup-detection false --use-pdhmm false --use-pdhmm-overlap-optimization false --make-determined-haps-from-pd-code false --print-pileupcalling-status false --fallback-gga-if-pdhmm-fails true --pileup-detection-enable-indel-pileup-calling false --pileup-detection-active-region-phred-threshold 0.0 --num-artificial-haplotypes-to-add-per-allele 5 --artifical-haplotype-filtering-kmer-size 10 --pileup-detection-snp-alt-threshold 0.1 --pileup-detection-indel-alt-threshold 0.1 --pileup-detection-absolute-alt-depth 0.0 
--pileup-detection-snp-adjacent-to-assembled-indel-range 5 --pileup-detection-snp-basequality-filter 12 --pileup-detection-bad-read-tolerance 0.0 --pileup-detection-proper-pair-read-badness true --pileup-detection-edit-distance-read-badness-threshold 0.08 --pileup-detection-chimeric-read-badness true --pileup-detection-template-mean-badness-threshold 0.0 --pileup-detection-template-std-badness-threshold 0.0 --pileup-detection-filter-assembly-alt-bad-read-tolerance 0.0 --pileup-detection-edit-distance-read-badness-for-assembly-filtering-threshold 0.12 --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --override-fragment-softclip-check false --min-base-quality-score 10 --smith-waterman FASTEST_AVAILABLE --max-mnp-distance 1 --force-call-filtered-alleles false --reference-model-deletion-quality 30 --soft-clip-low-quality-ends false --allele-informative-reads-overlap-margin 2 --smith-waterman-dangling-end-match-value 25 --smith-waterman-dangling-end-mismatch-penalty -50 --smith-waterman-dangling-end-gap-open-penalty -110 --smith-waterman-dangling-end-gap-extend-penalty -6 --smith-waterman-haplotype-to-reference-match-value 200 --smith-waterman-haplotype-to-reference-mismatch-penalty -150 --smith-waterman-haplotype-to-reference-gap-open-penalty -260 --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 --smith-waterman-read-to-haplotype-match-value 10 --smith-waterman-read-to-haplotype-mismatch-penalty -15 --smith-waterman-read-to-haplotype-gap-open-penalty -30 --smith-waterman-read-to-haplotype-gap-extend-penalty -5 --flow-assembly-collapse-hmer-size 0 --flow-assembly-collapse-partial-mode false --flow-filter-alleles false --flow-filter-alleles-qual-threshold 30.0 --flow-filter-alleles-sor-threshold 3.0 --flow-filter-lone-alleles false --flow-filter-alleles-debug-graphs false --min-assembly-region-size 50 --max-assembly-region-size 300 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false 
--assembly-region-padding 100 --padding-around-indels 75 --padding-around-snps 20 --padding-around-strs 75 --max-extension-into-assembly-region-padding-legacy 25 --max-reads-per-alignment-start 50 --enable-legacy-assembly-region-trimming false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="",Date="January 18, 2025, 1:23:52 PM PST">
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests. Includes the reference and alleles separated by |.">
##INFO=<ID=AS_UNIQ_ALT_READ_COUNT,Number=A,Type=Integer,Description="Number of reads with unique start and mate end positions for each alt at a variant site">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of potential somatic events in the assembly region">
##INFO=<ID=ECNTH,Number=A,Type=Integer,Description="Number of somatic events in best supporting haplotype for each alt allele">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality by allele">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length by allele">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality by allele">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NCount,Number=1,Type=Integer,Description="Count of N bases in the pileup">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=OCM,Number=1,Type=Integer,Description="Number of alt reads whose original alignment doesn't match the current contig.">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=R,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##contig=<ID=scaffold_197,length=95235>
##contig=<ID=scaffold_301,length=65824>
##filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.
##tumor_sample=BEL_2014_pool
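In the header above, MBQ is declared only as an INFO field (##INFO=<ID=MBQ,...>) and has no ##FORMAT line, yet the GenotypeGVCFs exception reports MBQ in the FORMAT field at Chr_01:16712; htsjdk refuses to write a key that the header does not define. A minimal sketch for checking which header sections declare a given ID, assuming Bash and GNU gzip (bcftools view -h also prints the full header for posting); header_sections_for_id is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "INFO" and/or "FORMAT" for each header line that
# declares the given ID; print nothing if the ID is not declared.
# Assumes GNU gzip ("gzip -cdf" also passes plain-text VCFs through).
header_sections_for_id() {
  local vcf="$1" id="$2"
  # Stop at #CHROM; "|| true" hides a possible SIGPIPE from gzip.
  { gzip -cdf -- "$vcf" || true; } \
    | awk -v id="$id" '
        /^#CHROM/ { exit }
        /^##(INFO|FORMAT)=<ID=/ {
          line = substr($0, 3)                 # e.g. INFO=<ID=MBQ,Number=R,...
          section = line
          sub(/=.*/, "", section)              # INFO or FORMAT
          key = line
          sub(/^[A-Z]+=<ID=/, "", key)         # MBQ,Number=R,...
          sub(/,.*/, "", key)                  # MBQ
          if (key == id) print section
        }'
}

# Example: the header shown above declares MBQ only under INFO.
# header_sections_for_id BEL_2014_pool_reinstall.M2.g.vcf.gz MBQ
```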
Hi Ren Hamm
As I said, GVCF mode for somatic calls is still very much in beta, and this is a bug that we can reproduce as well. A fix should be in the works shortly, but it may not go live until the next point release. In the meantime, we recommend running Mutect2 without GVCF mode.
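To see which existing Mutect2 outputs would need re-running without GVCF mode, one option is to look for <NON_REF> alleles in the records, which GVCF-mode output carries (compare the "<NON_REF>" variants mentioned earlier in this thread and the ##ALT=<ID=NON_REF,...> header line above). A minimal sketch, assuming Bash and GNU gzip; has_non_ref_records is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if any data record has <NON_REF>
# in its ALT column, as in Mutect2 GVCF-mode output; fail otherwise.
# Assumes GNU gzip ("gzip -cdf" also passes plain-text VCFs through).
has_non_ref_records() {
  local vcf="$1"
  # awk exits at the first match; "|| true" hides a possible SIGPIPE from gzip
  # so the function's status is awk's even under "set -o pipefail".
  { gzip -cdf -- "$vcf" || true; } \
    | awk -F'\t' '!/^#/ && $5 ~ /<NON_REF>/ { found = 1; exit }
                  END { exit !found }'
}

# Example: report which existing outputs contain GVCF-mode records.
# for f in *.M2.g.vcf.gz; do
#   if has_non_ref_records "$f"; then echo "$f: GVCF-mode records"; fi
# done
```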
