ModelSegments - java.lang.IllegalArgumentException: Metadata of the allelic counts and the segments do not match
Answered
Hi, has anyone encountered this before?
I encountered this error when following the somatic CNV guideline on my WGS paired normal-tumor samples. There was no output except hets.tsv and hets.normal.tsv. Any help is appreciated. Thank you!
CollectAllelicCounts was run on a 1000G hg38 high-confidence SNP VCF (.vcf.gz), generated with SelectVariants using AF > 0.1. (I couldn't find the gnomAD WGS file in the resource bundle.)
a) GATK version used: v4.2.5.0
b) Exact command used:
$GATK ModelSegments --denoised-copy-ratios T${SAMPLE}.denoisedCR.tsv --allelic-counts AlleleCount/T${SAMPLE}.allelicCounts.tsv --normal-allelic-counts AlleleCount/N${SAMPLE}.allelicCounts.tsv --output . --output-prefix ${SAMPLE}.cr
c) Entire program log:
09:52:54.356 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/csittz/gatk-4.2.5.0/gatk-package-4.2.5.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 19, 2022 9:52:55 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
09:52:55.525 INFO ModelSegments - ------------------------------------------------------------
09:52:55.526 INFO ModelSegments - The Genome Analysis Toolkit (GATK) v4.2.5.0
09:52:55.526 INFO ModelSegments - For support and documentation go to https://software.broadinstitute.org/gatk/
09:52:55.527 INFO ModelSegments - Executing as csittz@CSI-BTX7ZH3 on Linux v4.4.0-19041-Microsoft amd64
09:52:55.530 INFO ModelSegments - Java runtime: OpenJDK 64-Bit Server VM v11.0.11+9-Ubuntu-0ubuntu2.20.04
09:52:55.530 INFO ModelSegments - Start Date/Time: April 19, 2022 at 9:52:54 AM SGT
09:52:55.531 INFO ModelSegments - ------------------------------------------------------------
09:52:55.531 INFO ModelSegments - ------------------------------------------------------------
09:52:55.533 INFO ModelSegments - HTSJDK Version: 2.24.1
09:52:55.533 INFO ModelSegments - Picard Version: 2.25.4
09:52:55.534 INFO ModelSegments - Built for Spark Version: 2.4.5
09:52:55.534 INFO ModelSegments - HTSJDK Defaults.COMPRESSION_LEVEL : 2
09:52:55.535 INFO ModelSegments - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
09:52:55.535 INFO ModelSegments - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
09:52:55.535 INFO ModelSegments - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
09:52:55.536 INFO ModelSegments - Deflater: IntelDeflater
09:52:55.536 INFO ModelSegments - Inflater: IntelInflater
09:52:55.537 INFO ModelSegments - GCS max retries/reopens: 20
09:52:55.537 INFO ModelSegments - Requester pays: disabled
09:52:55.538 INFO ModelSegments - Initializing engine
09:52:55.539 INFO ModelSegments - Done initializing engine
09:52:55.539 INFO ModelSegments - Used memory (MB) after initializing engine: 34
09:52:55.547 INFO ModelSegments - Reading file (T02.denoisedCR.tsv)...
09:52:55.625 INFO ModelSegments - Reading file (AlleleCount/T02.allelicCounts.tsv)...
09:53:01.086 INFO ModelSegments - Reading file (AlleleCount/N02.allelicCounts.tsv)...
09:53:05.855 INFO ModelSegments - Used memory (MB) after reading files: 1756
09:53:06.749 INFO ModelSegments - Used memory (MB) after validating data: 1703
09:53:07.495 INFO NaiveHeterozygousPileupGenotypingUtils - Genotyping heterozygous sites from available allelic counts...
09:53:07.496 INFO NaiveHeterozygousPileupGenotypingUtils - Matched normal was provided, running in matched-normal mode...
09:53:08.381 INFO NaiveHeterozygousPileupGenotypingUtils - Retained 3855068 / 5090504 sites after filtering allelic counts with total count less than 30 in matched-normal sample N02...
09:53:09.223 INFO NaiveHeterozygousPileupGenotypingUtils - Retained 3855042 / 5090504 sites after filtering on overlap with copy-ratio intervals in matched-normal sample N02...
09:53:15.402 INFO NaiveHeterozygousPileupGenotypingUtils - Retained 1118178 / 5090504 sites after filtering on heterozygosity in matched-normal sample N02...
09:53:15.403 INFO NaiveHeterozygousPileupGenotypingUtils - Retained 5090504 / 5090504 sites after filtering allelic counts with total count less than 0 in case sample T02...
09:53:16.509 INFO NaiveHeterozygousPileupGenotypingUtils - Retained 5089851 / 5090504 sites after filtering on overlap with copy-ratio intervals in case sample T02...
09:53:16.510 INFO NaiveHeterozygousPileupGenotypingUtils - Retaining allelic counts for case sample N02 at heterozygous sites in matched-normal sample T02...
09:53:17.761 INFO NaiveHeterozygousPileupGenotypingUtils - Retained 1118178 / 5090504 sites after applying all filters to case sample T02.
09:53:18.003 INFO ModelSegments - Used memory (MB) after genotyping: 3080
09:53:18.004 INFO ModelSegments - Writing heterozygous allelic counts for matched normal to /mnt/d/LR_Lung/CNV/./02.cr.hets.normal.tsv...
09:53:18.811 INFO ModelSegments - Writing allelic counts for case sample at heterozygous sites in matched normal to /mnt/d/LR_Lung/CNV/./02.cr.hets.tsv...
09:53:20.772 INFO MultisampleMultidimensionalKernelSegmenter - Using first allelic-count site in each copy-ratio interval (114 / 1118178) for multidimensional segmentation...
09:53:20.786 INFO MultisampleMultidimensionalKernelSegmenter - Finding changepoints in (333, 1118178) data points and 24 chromosomes across 1 sample(s)...
09:53:20.787 INFO MultisampleMultidimensionalKernelSegmenter - Finding changepoints in 15 data points in chromosome chr1...
09:53:20.789 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (15) to segment; using all data points to calculate kernel matrix.
09:53:20.808 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 16) exceeds number of data points (15). Local changepoint costs will not be calculated for this window size.
09:53:20.809 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 32) exceeds number of data points (15). Local changepoint costs will not be calculated for this window size.
09:53:20.810 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 64) exceeds number of data points (15). Local changepoint costs will not be calculated for this window size.
09:53:20.811 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 128) exceeds number of data points (15). Local changepoint costs will not be calculated for this window size.
09:53:20.811 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 256) exceeds number of data points (15). Local changepoint costs will not be calculated for this window size.
09:53:20.812 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 512) exceeds number of data points (15). Local changepoint costs will not be calculated for this window size.
09:53:20.812 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
.....
09:53:20.997 INFO GibbsSampler - Starting MCMC sampling.
09:53:21.057 INFO GibbsSampler - 25 of 100 samples generated.
09:53:21.095 INFO GibbsSampler - 50 of 100 samples generated.
09:53:21.126 INFO GibbsSampler - 75 of 100 samples generated.
09:53:21.149 INFO GibbsSampler - 100 of 100 samples generated.
09:53:21.149 INFO GibbsSampler - MCMC sampling complete.
09:53:21.150 INFO MultidimensionalModeller - Fitting allele-fraction model...
09:53:21.155 INFO ModelSegments - Shutting down engine
[April 19, 2022 at 9:53:21 AM SGT] org.broadinstitute.hellbender.tools.copynumber.ModelSegments done. Elapsed time: 0.45 minutes.
Runtime.totalMemory()=14323548160
java.lang.IllegalArgumentException: Metadata of the allelic counts and the segments do not match.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:798)
at org.broadinstitute.hellbender.tools.copynumber.models.AlleleFractionModeller.<init>(AlleleFractionModeller.java:83)
at org.broadinstitute.hellbender.tools.copynumber.models.MultidimensionalModeller.fitModel(MultidimensionalModeller.java:107)
at org.broadinstitute.hellbender.tools.copynumber.models.MultidimensionalModeller.<init>(MultidimensionalModeller.java:90)
at org.broadinstitute.hellbender.tools.copynumber.ModelSegments.doWork(ModelSegments.java:571)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
-
Hi Tony Tan,
Could you provide the CollectAllelicCounts commands that you ran? I noticed this caveat in the tutorial:
For the matched-control analysis, the allelic count sites for the case and control must match exactly. Otherwise, ModelSegments, which takes the counts in the next step, will error.
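One way to test this caveat is to compare the site lists of the tumor and normal allelic-counts files directly. The following is only a minimal sketch: it assumes the allelicCounts.tsv layout of "@"-prefixed header lines, a column-name line starting with CONTIG, and CONTIG and POSITION in the first two columns. The function names are made up for illustration.

```shell
# Sketch only. Assumes the allelic-counts TSV layout: "@"-prefixed header lines,
# one column-name line starting with CONTIG, then CONTIG and POSITION in columns 1-2.

# Print "contig<TAB>position" for every data row.
allelic_sites() {
    awk -F'\t' '!/^@/ && $1 != "CONTIG" { print $1 "\t" $2 }' "$1"
}

# Succeed only if both files list the same sites in the same order
# (compared by checksum and byte count, so a collision is possible in principle).
same_sites() {
    [ "$(allelic_sites "$1" | cksum)" = "$(allelic_sites "$2" | cksum)" ]
}

# Example call, using the file names from the log above:
#   same_sites AlleleCount/T02.allelicCounts.tsv AlleleCount/N02.allelicCounts.tsv \
#       && echo "sites match" || echo "sites differ"
```

In the log above, both samples report 5090504 sites, so a differing site list would be somewhat surprising here, but the check is cheap.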
Best,
Genevieve
-
Thanks Genevieve
I noticed it could be because the denoisedCR file is too sparse, as it was generated using a different interval list. Or must CollectReadCounts be run over the same intervals?
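The sparsity can be checked directly. The log above reports 333 copy-ratio data points in total and only 15 in chr1, which fits that suspicion. A rough sketch for counting intervals per contig follows, assuming the denoisedCR.tsv has "@"-prefixed header lines, a column-name line starting with CONTIG, and one interval per remaining row. The function name is hypothetical.

```shell
# Sketch only. Assumes "@"-prefixed header lines, a column-name line starting
# with CONTIG, and one copy-ratio interval per remaining row.

# Print "contig<TAB>number of intervals", sorted by contig name.
intervals_per_contig() {
    awk -F'\t' '!/^@/ && $1 != "CONTIG" { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}

# Example call, using the file name from the log above:
#   intervals_per_contig T02.denoisedCR.tsv
```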
For CollectAllelicCounts, I first generated the SNP list with:
$GATK SelectVariants -R hg38/gatk.hg38.fasta -V hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz --select-type-to-include SNP --selectExpressions "AF > 0.1" -restrict-alleles-to BIALLELIC -O hg38/CNVsnplist.v2.1kG.af0.1.hg38.vcf.gz
For both tumor and matched normal,
$GATK CollectAllelicCounts -L hg38/CNVsnplist.v2.1kG.af0.1.hg38.vcf.gz -I ${SAMPLE}.recalib.cram -R hg38/gatk.hg38.fasta -O ${SAMPLE}.allelicCounts.tsv
I can run ModelSegments without the denoisedCR or without the allelicCounts.tsv, but the run using only the allelic counts does not output any log2 ratio for the segments.
Another question: it seems I can get copy-number calls by running Part II of the tutorial (from CollectAllelicCounts onwards) and skipping the panel of normals creation entirely. In what context should we use a panel of normals? The added benefit is not clear to me, given that creating a panel of normals is quite complicated, especially the selection of samples to include. Thank you!
-
Hi Tony Tan,
I see, thank you so much for the follow-up information. It seems that there may be a reference mismatch between your denoised copy ratios and your allelic counts.
ModelSegments groups together copy and allelic ratios that it determines are contiguous on the same segment.
Could you take a closer look at the commands generating those files and verify that the references exactly match?
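A quick way to do that check is to compare the headers of the two input files. The sketch below assumes both TSVs start with SAM-style header lines (an @RG line carrying SM:<sample> and one @SQ line per contig with SN: and LN: fields), and that the "metadata" in the error message refers to this sample name and contig list. Both points are assumptions, and the function names are hypothetical.

```shell
# Sketch only. Assumes SAM-style headers: one @RG line carrying SM:<sample>
# and one @SQ line per contig carrying SN:<name> and LN:<length>.

# Print the sample name and the (name, length) of every contig in the header.
tsv_metadata() {
    awk -F'\t' '
        !/^@/  { exit }   # the header ends at the first line without a leading @
        /^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print "sample\t" substr($i, 4) }
        /^@SQ/ {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4)
            }
            print "contig\t" name "\t" len
        }
    ' "$1"
}

# Succeed only if both files carry the same sample name and contig list.
same_metadata() {
    [ "$(tsv_metadata "$1")" = "$(tsv_metadata "$2")" ]
}

# Example call, using the file names from the log above:
#   tsv_metadata T02.denoisedCR.tsv > cr.meta
#   tsv_metadata AlleleCount/T02.allelicCounts.tsv > ac.meta
#   diff cr.meta ac.meta
```

If the diff shows a different sample name or a different set of contigs, the two files were probably produced from differently tagged alignments or different references.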
You can skip creating a panel of normals if you already have one; otherwise, you need to build a panel of normals for your analysis. Here is an article describing this: https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON-
Best,
Genevieve