how to combine mutect2 and haplotypecaller vcfs from the same sample
AnsweredHi,
I want to predict all the possible peptides that could be present in a sample. For this reason we run both mutect2 and haplotypecaller on RNA-seq data. Our goal is to create a union of those 2 vcf files for further analysis. However:
- GenotypeGVCFs only accepts gvcfs.
- FitlerMutectCalls throws an error when processing mutect2 output made with:
--emit-ref-confidence GVCF
I am using gatk 4.2.2.0 from the latest offical (and unaltered) docker image)
According the stdout log messages the first scanning pass of the VCF seems to finish:
13:00:00.588 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Sep 23, 2021 1:00:00 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
13:00:00.868 INFO FilterMutectCalls - ------------------------------------------------------------
13:00:00.868 INFO FilterMutectCalls - The Genome Analysis Toolkit (GATK) v4.2.2.0
13:00:00.868 INFO FilterMutectCalls - For support and documentation go to https://software.broadinstitute.org/gatk/
13:00:00.868 INFO FilterMutectCalls - Executing as root@8aa2f993ccf8 on Linux v5.3.18-lp152.57-default amd64
13:00:00.869 INFO FilterMutectCalls - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
13:00:00.869 INFO FilterMutectCalls - Start Date/Time: September 23, 2021 1:00:00 PM GMT
13:00:00.869 INFO FilterMutectCalls - ------------------------------------------------------------
13:00:00.869 INFO FilterMutectCalls - ------------------------------------------------------------
13:00:00.869 INFO FilterMutectCalls - HTSJDK Version: 2.24.1
13:00:00.869 INFO FilterMutectCalls - Picard Version: 2.25.4
13:00:00.869 INFO FilterMutectCalls - Built for Spark Version: 2.4.5
13:00:00.869 INFO FilterMutectCalls - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:00:00.869 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:00:00.869 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:00:00.870 INFO FilterMutectCalls - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:00:00.870 INFO FilterMutectCalls - Deflater: IntelDeflater
13:00:00.870 INFO FilterMutectCalls - Inflater: IntelInflater
13:00:00.870 INFO FilterMutectCalls - GCS max retries/reopens: 20
13:00:00.870 INFO FilterMutectCalls - Requester pays: disabled
13:00:00.870 INFO FilterMutectCalls - Initializing engine
13:00:01.522 INFO FeatureManager - Using codec VCFCodec to read file file:///home/path/to/sample/SampleName.called.g.vcf.gz
13:00:02.097 INFO FilterMutectCalls - Done initializing engine
13:00:02.293 INFO IOUtils - Extracting data from archive: file:///home/path/to/sample/read-orientation-model.tar.gz
13:00:02.348 INFO IOUtils - Extracting file: ./SampleName.orientation_priors
13:00:02.385 INFO ProgressMeter - Starting traversal
13:00:02.385 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
13:00:02.386 INFO FilterMutectCalls - Starting pass 0 through the variants
13:00:12.401 INFO ProgressMeter - 1:15882556 0.2 352000 2109047.3
13:00:22.405 INFO ProgressMeter - 1:56553721 0.3 1248000 3740633.4
13:00:32.405 INFO ProgressMeter - 1:153762932 0.5 2166000 4329258.1
13:00:42.409 INFO ProgressMeter - 1:224423507 0.7 3191000 4783749.3
13:00:52.411 INFO ProgressMeter - 10:44371946 0.8 4039000 4844377.8
13:01:02.416 INFO ProgressMeter - 10:110510369 1.0 4945000 4942528.7
13:01:12.418 INFO ProgressMeter - 11:46397314 1.2 5824000 4989719.0
13:01:22.420 INFO ProgressMeter - 11:114256138 1.3 6798000 5096334.1
13:01:32.423 INFO ProgressMeter - 12:44002822 1.5 7702000 5132613.6
13:01:42.424 INFO ProgressMeter - 12:116210604 1.7 8709000 5223415.1
13:01:52.425 INFO ProgressMeter - 13:99548149 1.8 9621000 5245958.3
13:02:02.428 INFO ProgressMeter - 14:77287162 2.0 10547000 5271654.9
13:02:12.433 INFO ProgressMeter - 15:55359021 2.2 11457000 5285935.1
13:02:22.444 INFO ProgressMeter - 16:1792430 2.3 12368000 5298376.4
13:02:32.452 INFO ProgressMeter - 16:68121250 2.5 13311000 5322058.3
13:02:42.456 INFO ProgressMeter - 17:18681513 2.7 14218000 5329418.4
13:02:52.463 INFO ProgressMeter - 17:64911885 2.8 15148000 5343932.5
13:03:02.464 INFO ProgressMeter - 18:55303785 3.0 16075000 5356042.1
13:03:12.475 INFO ProgressMeter - 19:19603794 3.2 17019000 5371904.7
13:03:22.481 INFO ProgressMeter - 2:277916 3.3 17909000 5370149.2
13:03:32.486 INFO ProgressMeter - 2:74092473 3.5 18817000 5373726.8
13:03:42.491 INFO ProgressMeter - 2:168876729 3.7 19697000 5369346.4
13:03:52.495 INFO ProgressMeter - 20:361613 3.8 20697000 5396659.8
13:04:02.505 INFO ProgressMeter - 21:9097727 4.0 21644000 5408318.4
13:04:12.506 INFO ProgressMeter - 22:39793603 4.2 22577000 5415880.4
13:04:22.511 INFO ProgressMeter - 3:47122775 4.3 23481000 5416088.4
13:04:32.514 INFO ProgressMeter - 3:125002012 4.5 24396000 5418764.4
13:04:42.522 INFO ProgressMeter - 4:2508511 4.7 25367000 5433146.8
13:04:52.526 INFO ProgressMeter - 4:106041565 4.8 26287000 5436065.3
13:05:02.540 INFO ProgressMeter - 5:53056131 5.0 27239000 5445004.9
13:05:12.545 INFO ProgressMeter - 5:143268322 5.2 28186000 5452558.2
13:05:22.546 INFO ProgressMeter - 6:31627330 5.3 29092000 5452024.0
13:05:32.550 INFO ProgressMeter - 6:125459062 5.5 30051000 5461104.2
13:05:42.551 INFO ProgressMeter - 7:30028818 5.7 30973000 5463172.3
13:05:52.555 INFO ProgressMeter - 7:104938114 5.8 31844000 5456336.8
13:06:02.560 INFO ProgressMeter - 8:22066737 6.0 32712000 5449366.1
13:06:12.563 INFO ProgressMeter - 8:125357902 6.2 33655000 5454958.0
13:06:22.568 INFO ProgressMeter - 9:92242342 6.3 34572000 5456123.6
13:06:32.592 INFO ProgressMeter - 9:134780930 6.5 35317000 5430794.5
13:06:42.577 INFO ProgressMeter - X:149925594 6.7 36347000 5449447.9
13:06:44.681 WARN IntelInflater - Zero Bytes Written : 0
13:06:44.748 INFO FilterMutectCalls - Finished pass 0 through the variants
13:06:49.201 INFO FilterMutectCalls - Shutting down engine
[September 23, 2021 1:06:49 PM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 6.81 minutes.
Runtime.totalMemory()=810942464
Followed by the stacktrace below:
java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:798)
at org.broadinstitute.hellbender.utils.MathUtils.log10BinomialProbability(MathUtils.java:646)
at org.broadinstitute.hellbender.utils.MathUtils.binomialProbability(MathUtils.java:639)
at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.lambda$calculateQuantileBackgroundResponsibilities$10(SomaticClusteringModel.java:271)
at org.broadinstitute.hellbender.utils.MathUtils.applyToArray(MathUtils.java:1047)
at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.calculateQuantileBackgroundResponsibilities(SomaticClusteringModel.java:271)
at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.initializeClusters(SomaticClusteringModel.java:165)
at org.broadinstitute.hellbender.tools.walkers.mutect.clustering.SomaticClusteringModel.learnAndClearAccumulatedData(SomaticClusteringModel.java:325)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.Mutect2FilteringEngine.learnParameters(Mutect2FilteringEngine.java:153)
at org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls.afterNthPass(FilterMutectCalls.java:165)
at org.broadinstitute.hellbender.engine.MultiplePassVariantWalker.traverse(MultiplePassVariantWalker.java:44)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gatk/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx800M -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /gatk/gatk-package-4.2.2.0-local.jar FilterMutectCalls --reference homo_sapiens_genome.fa --variant SampleName.called.g.vcf.gz --stats merged_mutect.stats --output output_dir/SampleName.mutect2.filtered.vcf.gz --filtering-stats output_dir/SampleName.mutect_filtering.stats --tumor-segmentation segments.table --contamination-table calculatecontamination.table --ob-priors read-orientation-model.tar.gz
I did notice the problem seems to occur after the entire file was scanned 1 time, so I therefore assume the problem is not with any particular variant.
All preceding steps/tools according to the best practices for RNA variant calling complete without errors. So I have performed the splitNcigarReads as recommended.
My reference is based on the Ensembl contig names. but I have converted all the GRCh38 input vcf files such as the known variant arguments to this naming convention as well. (and haplotypeCaller and other steps run just fine with these files)
I think the 2 relevant commands that I run are:
gatk \
--java-options "-Xmx${memory_limit}M -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
FilterMutectCalls \
--reference "${genome_fasta}" \
--variant "${mutect_vcf}" \
--stats "${mutect_vcf_stats}" \
--output "${filtered_vcf}" \
--filtering-stats "${filtering_stats}" \
--tumor-segmentation "${tumor_segmentation_table}" \
--contamination-table "${contamination_table}" \
--ob-priors "${read_orientation_model}" \
;
and:
gatk \
--java-options "-Xmx${memory_limit}M" \
Mutect2 \
--reference "${ref_path}/genome/${genome}" \
--smith-waterman FASTEST_AVAILABLE \
--pair-hmm-implementation FASTEST_AVAILABLE \
--native-pair-hmm-threads ${native_hmm_threads} \
--native-pair-hmm-use-double-precision \
--recover-all-dangling-branches \
--max-mnp-distance 0 \
--f1r2-tar-gz "${output_f1r2}" \
--output "${output_file}" \
--input "${aligned_reads}" \
--panel-of-normals "${panel_of_normals_vcf}" \
--germline-resource "${allele_frequency_vcf}" \
--intervals "${interval_list}" \
--emit-ref-confidence GVCF \
--soft-clip-low-quality-ends \
--dragstr-params-path "${dragstr_params_path}" \
;
When googling for:
log10p: Log10-probability must be 0 or less
I found old issues (2019 or 2020 if memory serves) which seem to have been solved shortly after, so I assume this is an unrelated issue?Thanks in advance.
-
I am going to move your post into our Community Discussions -> Special GATK Use Cases topic, as the Somatic topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Genevieve
-
Hi Genevieve,
Sorry and thanks for correcting that.
Greetings.
Yanick
-
Yanick Paco Hagemeijer what are you expecting for the merge if there are events from Mutect2 and HaplotypeCaller at the same site? Or are you wanting a two sample VCF where one sample is the output from Mutect2 and the other sample is the output from HaplotypeCaller?
-
Thanks for the quick reply!
I would expect the latter (a multi sample VCF). I cannot even begin to imagine the complexity of having to fold those 2 files into a chimera sample.
I would like to filter the calls made by mutect2 prior to merging the 'samples' by calling FilterMutectCalls. So to clarify what I encountered was the following:
- HaplotypeCaller only outputs haplotype information when outputting a gvcf or bpvcf (not clearly documented)
- Mutect2 can also output in gvcf format (completes without error)
- FilterMutectCalls crashes on Mutect2's gvcf output (unexpected)
- GenotypeGVCFs only accepts gvcf files (expected, non-issue, obviously I would need to rename either sample)
-
The GVCF output from Mutect2 is still in BETA so that is probably why there are remaining issues with FilterMutectCalls. I wouldn't recommend skipping the FilterMutectCalls step because there would be too many false positives in your output.
Even though if somehow we were able to patch FilterMutectCalls to accept the GVCF, the output would be a VCF and wouldn't be able to be accepted to CombineGVCFs or GenotypeGVCFs.
I think you're going to run across too many issues if you try the method you are suggesting here. I would recommend that you follow each best practices method for germline and somatic separately, then compare the VCF outputs once all filtering has been completed.
Alternatively, some users have found great results from running Mutect2 with the option --genotype-germline-sites. It's no longer experimental, we just have not updated the label.
Best,
Genevieve
-
Hi Genevieve,
I am going to give that a try, thanks for the insight.
Have a nice weekend,
Yanick
-
You too!
Please sign in to leave a comment.
7 comments