The output of Mutect2 cannot be accepted by FilterMutectCalls in GATK-4.2.0.0
AnsweredIf you are seeing an error, please provide(REQUIRED) :
a) GATK version used: 4.2.0.0
b) Exact command used:
gatk FilterMutectCalls \
-R /public1/data/resources/ref_genome/GRCh38/GRCh38.d1.vd1.fa \
-V somatic_mutation/Mutect2/test.vcf.gz \
-O somatic_mutation/FilterMutectCalls/test.vcf.gz
c) Entire error log:
I used the "--enable-all-annotations" option within Mutect2 to get a vcf file with abundant information. However, the following FilterMutectCalls step seemed to be intolerant of some information within previous step's vcf file record.
The intolerated record within vcf listed below:
chr1 6197724 . C CT,CTT,CTTT . . AC=1,1,1;AF=0.167,0.167,0.167;AN=6;AS_BaseQRankSum=-6.431;AS_MQ=60.00,60.00,60.00;AS_MQRankSum=0.000;AS_ReadPosRankSum=5.751;AS_SB_TABLE=42,880|3,164|3,32|0,14;AS_UNIQ_ALT_READ_COUNT=167|35|14;BQHIST=5,1,0,0,1,11,2,0,0,0,14,2,0,0,1,15,1,0,0,0,16,1,0,0,0,17,0,2,0,0,18,2,0,0,1,19,6,0,1,0,20,25,0,2,2,21,13,0,1,2,22,20,0,3,3,23,2,1,2,1,24,6,0,2,0,25,21,1,4,7,26,33,0,5,6,27,18,0,0,7,28,29,0,0,4,29,26,2,4,8,30,161,4,5,51,31,263,2,3,51,32,129,2,3,22,33,41,0,0,0,34,15,0,0,0,35,20,0,0,0,36,19,0,0,0,37,12,0,0,0,38,1,0,0,0,39,9,0,0,0,41,18,0,0,0,44,26,0,0,0;BaseQRankSum=-6.431;ClippingRankSum=-7.714;DP=1323;ECNT=1;FS=0.000;LikelihoodRankSum=-7.886;MBQ=31,30,26,30;MFRL=6590,6585,4819,6586;MMQ=60,60,60,60;MPOS=16,15,7;MQ=59.98;MQ0=0;MQRankSum=0.000;NALOD=0.569,1.49,1.49;NCC=0;NCount=0;NLOD=27.80,30.51,30.97;OCM=0;POPAF=6.00,6.00,6.00;REF_BASES=GAACTTGCTTCTTTTTTTTGC;RPA=8,9,10,11;RU=T;ReadPosRankSum=5.751;SOR=1.152;STR;Samples=TCGA-NJ-A55R-01A-11R-A262-07;TLOD=284.47,51.82,3.50 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2/3:819,166,35,14:0.161,0.034,0.014:1034:365,76,17,4:440,87,17,8:16,803,6,209 0/0:103,1,0,0:0.017,8.250e-03,8.221e-03:104:50,1,0,0:52,0,0,0:26,77,0,1
The error log that FilterMutectCalls emited was listed below:
Using GATK jar /home/lqh/software/GATK-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/lqh/software/GATK-4.2.0.0/gatk-package-4.2.0.0-local.jar FilterMutectCalls -R /public1/data/resources/ref_genome/GRCh38/GRCh38.d1.vd1.fa -V somatic_mutation/Mutect2/test.vcf.gz -O somatic_mutation/FilterMutectCalls/test.vcf.gz
11:03:39.517 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/lqh/software/GATK-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 04, 2021 11:03:49 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
11:03:49.968 INFO FilterMutectCalls - ------------------------------------------------------------
11:03:49.969 INFO FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.2.0.0
11:03:49.969 INFO FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
11:03:49.969 INFO FilterMutectCalls - Executing as lqh@master on Linux v5.6.14-1.el7.elrepo.x86_64 amd64
11:03:49.969 INFO FilterMutectCalls - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_152-b16
11:03:49.969 INFO FilterMutectCalls - Start Date/Time: June 4, 2021 11:03:39 AM CST
11:03:49.969 INFO FilterMutectCalls - ------------------------------------------------------------
11:03:49.969 INFO FilterMutectCalls - ------------------------------------------------------------
11:03:49.970 INFO FilterMutectCalls - HTSJDK Version: 2.24.0
11:03:49.970 INFO FilterMutectCalls - Picard Version: 2.25.0
11:03:49.970 INFO FilterMutectCalls - Built for Spark Version: 2.4.5
11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
11:03:49.970 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
11:03:49.970 INFO FilterMutectCalls - Deflater: IntelDeflater
11:03:49.971 INFO FilterMutectCalls - Inflater: IntelInflater
11:03:49.971 INFO FilterMutectCalls - GCS max retries/reopens: 20
11:03:49.971 INFO FilterMutectCalls - Requester pays: disabled
11:03:49.971 INFO FilterMutectCalls - Initializing engine
11:03:50.504 INFO FeatureManager - Using codec VCFCodec to read file file:///home/lqh/somatic_mutation/Mutect2/test.vcf.gz
11:03:50.696 INFO FilterMutectCalls - Done initializing engine
11:03:50.840 INFO ProgressMeter - Starting traversal
11:03:50.840 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
11:03:50.841 INFO FilterMutectCalls - Starting pass 0 through the variants
11:03:51.014 INFO FilterMutectCalls - Shutting down engine
[June 4, 2021 11:03:51 AM CST] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 0.19 minutes.
Runtime.totalMemory()=625999872
java.lang.NumberFormatException: For input string: "167|35|14"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at htsjdk.variant.variantcontext.CommonInfo.lambda$getAttributeAsIntList$1(CommonInfo.java:288)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Collections$2.tryAdvance(Collections.java:4717)
at java.util.Collections$2.forEachRemaining(Collections.java:4725)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsList(CommonInfo.java:274)
at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsIntList(CommonInfo.java:282)
at htsjdk.variant.variantcontext.VariantContext.getAttributeAsIntList(VariantContext.java:827)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.DuplicatedAltReadFilter.areAllelesArtifacts(DuplicatedAltReadFilter.java:26)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.HardAlleleFilter.calculateErrorProbabilityForAlleles(HardAlleleFilter.java:16)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2AlleleFilter.errorProbabilities(Mutect2AlleleFilter.java:86)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities.lambda$new$0(ErrorProbabilities.java:27)
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.ErrorProbabilities.<init>(ErrorProbabilities.java:25)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.accumulateData(Mutect2FilteringEngine.java:138)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls.nthPassApply(FilterMutectCalls.java:154)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverse$0(MultiplePassVariantWalker.java:40)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.lambda$traverseVariants$1(MultiplePassVariantWalker.java:77)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverseVariants(MultiplePassVariantWalker.java:75)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:40)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1058)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
What's worth noted is that I tested FilterMutectCalls within GATK-4.1.6.0, it surly can accept the vcf record above and output normal records as below:
chr1 6197724 . C CT,CTT,CTTT . multiallelic AC=1,1,1;AF=0.167,0.167,0.167;AN=6;AS_BaseQRankSum=-6.431;AS_MQ=60.00,60.00,60.00;AS_MQRankSum=0.000;AS_ReadPosRankSum=5.751;AS_SB_TABLE=42,880|3,164|3,32|0,14;AS_UNIQ_ALT_READ_COUNT=167|35|14;BQHIST=5,1,0,0,1,11,2,0,0,0,14,2,0,0,1,15,1,0,0,0,16,1,0,0,0,17,0,2,0,0,18,2,0,0,1,19,6,0,1,0,20,25,0,2,2,21,13,0,1,2,22,20,0,3,3,23,2,1,2,1,24,6,0,2,0,25,21,1,4,7,26,33,0,5,6,27,18,0,0,7,28,29,0,0,4,29,26,2,4,8,30,161,4,5,51,31,263,2,3,51,32,129,2,3,22,33,41,0,0,0,34,15,0,0,0,35,20,0,0,0,36,19,0,0,0,37,12,0,0,0,38,1,0,0,0,39,9,0,0,0,41,18,0,0,0,44,26,0,0,0;BaseQRankSum=-6.431;CONTQ=93;ClippingRankSum=-7.714;DP=1323;ECNT=1;FS=0.000;GERMQ=93;LikelihoodRankSum=-7.886;MBQ=31,30,26,30;MFRL=6590,6585,4819,6586;MMQ=60,60,60,60;MPOS=16,15,7;MQ=59.98;MQ0=0;MQRankSum=0.000;NALOD=0.569,1.49,1.49;NCC=0;NCount=0;NLOD=27.80,30.51,30.97;OCM=0;POPAF=6.00,6.00,6.00;REF_BASES=GAACTTGCTTCTTTTTTTTGC;RPA=8,9,10,11;RU=T;ReadPosRankSum=5.751;SEQQ=93;SOR=1.152;STR;STRANDQ=93;STRQ=21;Samples=TCGA-NJ-A55R-01A-11R-A262-07;TLOD=284.47,51.82,3.50 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2/3:819,166,35,14:0.161,0.034,0.014:1034:365,76,17,4:440,87,17,8:16,803,6,209 0/0:103,1,0,0:0.017,8.250e-03,8.221e-03:104:50,1,0,0:52,0,0,0:26,77,0,1
Much appreciated for your precious time to inspect my report, thanks!
-
Hi, actually I changed all the separators within "AS_UNIQ_ALT_READ_COUNT=167|35|14" from "|" into "," and used "AS_UNIQ_ALT_READ_COUNT=167,35,14" as input records. Then the error has disappeared, which indicated the separator "|" had caused the issue.
So I digged into the java source code (github link) for AS_UNIQ_ALT_READ_COUNT annotation, found that it finally revoked encodeAnyASListWithRawDelim function to combine list into "|" (ALLELE_SPECIFIC_RAW_DELIM) separated string and appeared as actual annotation. I'm not sure whether to use ALLELE_SPECIFIC_REDUCED_DELIM (",") or ALLELE_SPECIFIC_RAW_DELIM ("|") was better here, but seems that "," separated annotation here can be accepted by FilterMutectCalls.
P.S: Several downstream annotations (CONTQ, SEQQ and STRANDQ) were gone missing within FilterMutectCalls results, hope you can enlighten me the reason for their disappearance. Much thanks!
-
Hi Qihan Long,
Thanks for your post and digging into this issue! This does indeed look like a bug in the 4.2.0.0 version of FilterMutectCalls, or a bug in Mutect2 for producing the output like this. I have created a ticket here where you can follow along for updates regarding a fix to the issue: https://github.com/broadinstitute/gatk/issues/7298
For now you can either use the older version of FilterMutectCalls or you can remove this annotation from your output with -AX AS_UNIQ_ALT_READ_COUNT.
Regarding your P.S. message, there is another thread about the STRANDQ annotation and the others should fall in the same category as well: https://gatk.broadinstitute.org/hc/en-us/community/posts/360078035372-STRANDQ-missing-from-mutect2-vcf-records. Not all annotations are meant to be seen in the output of FilterMutectCalls.
Best,
Genevieve
-
Thanks for your instant reply and inspection!
I shall try to use Mutect2 (4.2.0.0) combined with FilterMutectCalls (4.1.6.0) to finish my variant calling and annotation process. Hope it's a bug within FilterMutectCalls (4.2.0.0), otherwise I'll have to re-call all the variants or modify vcf files accordingly which can be a messy work, hahah.
-
Yeah, I hope so too!
I'll try to get more information about this from the Mutect2 developers and will let you know.
-
Hi Genevieve,
Thanks for your time~
Much appreciated!
-
FYI, if anyone wants to utilize PyVCF to read in vcf records I mentioned before, you'll definitely found the error "ValueError: could not convert string to float: '167|35|14" which means PyVCF cannot parse this AS_UNIQ_ALT_READ_COUNT INFO domain since they're labeled to be Integer in vcf header's INFO annotation.
Therefore, manually change the INFO's annotation within PyVCF can temporarily fix this issue, corresponding codes listed below:
import vcf
import collections
vcf_reader = vcf.Reader(filename="C:\\Users\\-PC\Desktop\sftp\\test.vcf.gz")
# Change INFO annotation for this errorneous field within vcf_reader class
pre_info = vcf_reader.infos['AS_UNIQ_ALT_READ_COUNT']
after_info = collections.namedtuple('Info', ['id', 'num', 'type', 'desc', 'source', 'version'])
after_info = after_info(pre_info.id, pre_info.num,
"String", pre_info.desc,
pre_info.source, pre_info.version)
vcf_reader.infos['AS_UNIQ_ALT_READ_COUNT'] = after_info
# Feel free to continue your work! -
Hi Qihan Long,
Thank you for posting the solution you found for other users! It's very helpful.
I confirmed with the developers that the AS annotations had formatting changes recently and it looks like they were not fully updated. The team will continue to look into this issue at the ticket I created. https://github.com/broadinstitute/gatk/issues/7298
You can continue to use 4.1.6.0 for FilterMutectCalls. Or, you could remove the annotation with bcftools annotate, then you can use 4.2.0.0 FilterMutectCalls. The AS_UNIQ_ALT_READ_COUNT annotation is really only used for Broad internally for special use cases or troubleshooting.
Thanks for writing in about this issue!
Best,
Genevieve
-
Hi Genevieve Brandt,
Thank you for your timely update, I successfully removed the annotation using GATK's SelectVariants with --drop-info-annotation option which proved GATK's versatility~
Unfortunately, it seems that I spotted another bug emitted within other Mutect2 records.
chr22 22215158 . G A,C . . AC=1,1;AF=0.200,0.200;AN=5;AS_BaseQRankSum=-0.518;AS_MQ=60.00,;AS_MQRankSum=0.000;AS_ReadPosRankSum=1.087;AS_SB_TABLE=74,66|6,24|0,0;AS_UNIQ_ALT_READ_COUNT=30|0;BQHIST=13,0,0,1,17,0,0,1,19,1,0,0,20,0,0,13,21,0,0,1,22,0,0,2,23,0,0,1,24,2,0,2,25,0,0,1,26,0,0,4,27,0,0,7,28,2,0,8,29,6,0,24,30,7,0,11,31,9,0,10,32,0,0,4,33,0,0,1,35,0,0,11,36,0,0,14,37,0,0,3,38,0,0,1,40,0,0,2,42,0,0,2,43,0,0,3,45,0,0,4;BaseQRankSum=-0.518;ClippingRankSum=-3.599;DP=179;ECNT=29;FS=29.496;LikelihoodRankSum=0.564;MBQ=30,30,0;MFRL=142,160,0;MMQ=60,60,60;MPOS=16,50;MQ=60.00;MQ0=0;MQRankSum=0.000;NALOD=1.37,1.37;NCC=0;NCount=0;NLOD=13.55,13.24;OCM=0;POPAF=6.00,6.00;REF_BASES=GGATCTCAGAGAGATTCTCTG;ReadPosRankSum=1.087;SOR=2.607;Samples=TCGA-55-8092-01A-11R-2241-07;TLOD=95.63,14.73 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:90,30,0:0.235,0.050:120:39,12,0:40,14,0:42,48,6,24 0/0:50,0,0:0.021,0.021:50:27,0,0:23,0,0:32,18,0,0
This INFO domain(AS_MQ) containing erroneous value (60.00,) which cannot be parsed by PyVCF package since its value is annotated to be Float.
All my reports were based on switching on "--enable-all-annotations" option, so perhaps the bug I mentioned cannot be encountered if following a standard pipeline (GATK Best Practice). Sorry for your precious time to inspect this rare bug.
P.S: Actually I spent 3h trying to locate the bug within AS_RMSMappingQuality related source code (Github link), only found it possibly related to makeFinalizedAnnotationString function since it missed another allele's AS_MQ value which supposed to be after the comma (","), hope it helps.
P.P.S: To make vcf file readable to PyVCF, simply change the INFO's annotation within PyVCF as I mentioned before, then it's ok to continue.
-
Thank you for updating us about this other issue you found! I have added it to the github ticket I originally created, here: https://github.com/broadinstitute/gatk/issues/7298. The team will get this fixed as soon as possible within their timelines.
Also - thank you for taking the time to find the related code! No worries about not finding it though :)
I don't think many users run Mutect2 with --enable-all-annotations so that must be why we did not identify this bug as of now. I confirmed with the developer team that the AS_MQ annotation is not generally used other than troubleshooting and special use cases. So it shouldn't be a problem to remove that one as well from your analysis.
Best,
Genevieve
-
Hi Genevieve Brandt,
I got it! Thanks for your clarification, it's of great help~
Please sign in to leave a comment.
10 comments