PlotModeledSegments leads to java.lang.IllegalArgumentException related to inconsistency
Brief summary: I am running a non-human whole-genome sequencing sample through the somatic CNV pipeline, following "(How to part II) Sensitively detect copy ratio alterations and allelic segments" on the GATK site (broadinstitute.org). I have checked that the outputs of all previous steps exist and look normal, including denoisedCR.tsv, allelicCounts.tsv, modelFinal.seg, and reference_genome.dict. I am doing normal-tumor pair runs. I tried different --minimum-contig-length values (100, 1000, 46709983), but they all lead to this error:
java.lang.IllegalArgumentException: Number of allelic-count points in input modeled-segments file is inconsistent with that in input heterozygous allelic-counts file.
Can you please point me to where this inconsistency might have been introduced? I would also appreciate any ideas for fixing this error, or other ways to visualize the segmentation and ratio results.
Thank you very much!
Yuwei
REQUIRED for all errors and issues:
a) GATK version used: docker image spacecade7/tutorial_11682_11683:gatk4.0.1.1 gatk
b) Exact command used:
docker run -v ./workdir/:./workdir/ spacecade7/tutorial_11682_11683:gatk4.0.1.1 gatk PlotModeledSegments --denoised-copy-ratios ./out/DFG2.denoisedCR.tsv --allelic-counts ./out/DFG2.allelicCounts.tsv --segments ./out/DFG2.modelFinal.seg --sequence-dictionary ./RefGenome/dmel-all-chromosome-r6.39.dict --minimum-contig-length 46709983 --output ./out/plots --output-prefix DFG2_clean
c) Entire program log:
21:16:47.856 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/install/gatk/lib/gkl-0.8.2.jar!/com/intel/gkl/native/libgkl_compression.so
21:16:47.950 INFO PlotModeledSegments - ------------------------------------------------------------
21:16:47.950 INFO PlotModeledSegments - The Genome Analysis Toolkit (GATK) v4.0.1.1
21:16:47.950 INFO PlotModeledSegments - For support and documentation go to https://software.broadinstitute.org/gatk/
21:16:47.950 INFO PlotModeledSegments - Executing as root@0cb82bd99967 on Linux v5.15.0-78-generic amd64
21:16:47.950 INFO PlotModeledSegments - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
21:16:47.950 INFO PlotModeledSegments - Start Date/Time: June 25, 2024 9:16:47 PM UTC
21:16:47.950 INFO PlotModeledSegments - ------------------------------------------------------------
21:16:47.950 INFO PlotModeledSegments - ------------------------------------------------------------
21:16:47.950 INFO PlotModeledSegments - HTSJDK Version: 2.14.1
21:16:47.950 INFO PlotModeledSegments - Picard Version: 2.17.2
21:16:47.950 INFO PlotModeledSegments - HTSJDK Defaults.COMPRESSION_LEVEL : 1
21:16:47.951 INFO PlotModeledSegments - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
21:16:47.951 INFO PlotModeledSegments - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
21:16:47.951 INFO PlotModeledSegments - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
21:16:47.951 INFO PlotModeledSegments - Deflater: IntelDeflater
21:16:47.951 INFO PlotModeledSegments - Inflater: IntelInflater
21:16:47.951 INFO PlotModeledSegments - GCS max retries/reopens: 20
21:16:47.951 INFO PlotModeledSegments - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
21:16:47.951 INFO PlotModeledSegments - Initializing engine
21:16:47.951 INFO PlotModeledSegments - Done initializing engine
21:16:47.952 INFO PlotModeledSegments - Reading and validating input files...
21:20:24.411 INFO PlotModeledSegments - Shutting down engine
[June 25, 2024 9:20:24 PM UTC] org.broadinstitute.hellbender.tools.copynumber.plotting.PlotModeledSegments done. Elapsed time: 3.61 minutes.
Runtime.totalMemory()=25088229376
java.lang.IllegalArgumentException: Number of allelic-count points in input modeled-segments file is inconsistent with that in input heterozygous allelic-counts file.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:681)
at org.broadinstitute.hellbender.tools.copynumber.plotting.PlotModeledSegments.validateNumPointsPerContig(PlotModeledSegments.java:257)
at org.broadinstitute.hellbender.tools.copynumber.plotting.PlotModeledSegments.doWork(PlotModeledSegments.java:184)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
at org.broadinstitute.hellbender.Main.main(Main.java:277)
Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
Running:
docker run -v ./workdir/:./workdir/ spacecade7/tutorial_11682_11683:gatk4.0.1.1 gatk PlotModeledSegments --denoised-copy-ratios ./out/DFG2.denoisedCR.tsv --allelic-counts ./out/DFG2.allelicCounts.tsv --segments ./out/DFG2.modelFinal.seg --sequence-dictionary ./RefGenome/dmel-all-chromosome-r6.39.dict --minimum-contig-length 46709983 --output ./out/plots --output-prefix DFG2_clean
-
Are you using the proper hets.tsv file, produced by the ModelSegments step, in this plotting step?
--allelic-counts <File> Input file containing allelic counts at heterozygous sites (.hets.tsv output of ModelSegments). Default value: null.
If not, you need to replace the allelicCounts.tsv file that you are using with that file.
Regards.
-
Hi Gökalp:
Thanks for your reply.
Yes, I am using the sample.allelicCount.tsv file created by CollectAllelicCounts
Here are example entries from DFG2.allelicCounts.tsv:
CONTIG POSITION REF_COUNT ALT_COUNT REF_NUCLEOTIDE ALT_NUCLEOTIDE
2L 4954 7 0 G N
2L 4955 8 0 C N
2L 4956 8 0 G N
2L 4957 9 0 T N
2L 4958 9 0 A N
2L 4959 8 0 T N
2L 4960 8 0 G N
2L 4961 8 0 C N
2L 4962 8 0 G N
Prior to this step, I used these commands:
CollectAllelicCounts
Example entries were listed above.
ModelSegments
Example entries of DFG2.cr.seg
CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO
2L 5001 1891000 1730 -0.033812
2L 1891001 1907000 8 1.171973
2L 1907001 3391000 1368 -0.007153
2L 3391001 3433000 41 0.914498
2L 3435001 23513712 16800 0.014652
2R 1001 5796000 2477 0.075654
2R 5796001 17813000 10998 -0.009842
2R 17813001 17841000 28 0.916708
2R 17841001 25286936 6823 -0.024294
Example entries of DFG2.called.seg
CONTIG START END NUM_POINTS_COPY_RATIO MEAN_LOG2_COPY_RATIO CALL
2L 5001 1891000 1730 -0.033812 0
2L 1891001 1907000 8 1.171973 +
2L 1907001 3391000 1368 -0.007153 0
2L 3391001 3433000 41 0.914498 +
2L 3435001 23513712 16800 0.014652 0
2R 1001 5796000 2477 0.075654 0
2R 5796001 17813000 10998 -0.009842 0
2R 17813001 17841000 28 0.916708 +
2R 17841001 25286936 6823 -0.024294 0
Please give me more advice on checking things. Thanks a lot!
Yuwei
-
Actually, you need to use the allelic counts file created by ModelSegments, not the one you generated with CollectAllelicCounts. ModelSegments filters the allelic counts down to the sites that are usable by the downstream models, so the model contains only the sites that ModelSegments selected. Among its outputs there should be a file whose name ends in hets.tsv. That is the file you need to use when plotting modeled segments.
Regards.
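A minimal Python sketch for locating the mismatch, offered as an illustration and not as part of GATK. Judging by the method name validateNumPointsPerContig in the stack trace, the check appears to be done per contig, so the sketch compares, for each contig, the number of allelic-count points recorded in the modeled-segments file with the number of rows in the allelic-counts file. It assumes that both files are tab-separated, that they may begin with SAM-style header lines starting with "@", and that the segments file has a column named NUM_POINTS_ALLELE_FRACTION; please check the column header of your own modelFinal.seg. The function names are made up for this example.

```python
import csv
from collections import Counter


def data_rows(lines):
    """Parse tab-separated rows into dicts, skipping blank lines and
    SAM-style '@' header lines (assumed file layout)."""
    body = (ln for ln in lines if ln.strip() and not ln.startswith("@"))
    return csv.DictReader(body, delimiter="\t")


def points_per_contig_in_segments(lines, column="NUM_POINTS_ALLELE_FRACTION"):
    """Sum the allelic-count points per contig over all segments.
    The column name is an assumption; adjust it to match your file."""
    totals = Counter()
    for row in data_rows(lines):
        totals[row["CONTIG"]] += int(row[column])
    return totals


def sites_per_contig(lines):
    """Count the data rows (sites) per contig in an allelic-counts file."""
    counts = Counter()
    for row in data_rows(lines):
        counts[row["CONTIG"]] += 1
    return counts


def mismatched_contigs(segment_lines, allelic_lines):
    """Return {contig: (points_in_segments, sites_in_allelic_counts)}
    for every contig where the two numbers differ."""
    seg = points_per_contig_in_segments(segment_lines)
    sites = sites_per_contig(allelic_lines)
    return {
        contig: (seg.get(contig, 0), sites.get(contig, 0))
        for contig in sorted(set(seg) | set(sites))
        if seg.get(contig, 0) != sites.get(contig, 0)
    }
```

Each function takes any iterable of lines, so open file handles can be passed directly, for example mismatched_contigs(open("./out/DFG2.modelFinal.seg"), open("./out/DFG2.allelicCounts.tsv")). If the explanation above applies, the CollectAllelicCounts file should show more sites than the segments file on every contig, and the same comparison against the hets.tsv file written by ModelSegments should return an empty result.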
-
Thank you very much!! That leads to a plot!
However, I am not sure how to get rid of the extra chromosomes. For example, PlotDenoisedCopyRatios gives this result:
These are the chromosomes I want to show.
This is the result of PlotModeledSegments:
Is there a way to get rid of the chromosomes beyond X?
Also, to interpret this result: I observed consistency within each chromosome, and some segments were identified on each chromosome. What other conclusions can I draw from this kind of result?
Thanks a lot!
Yuwei
-
If you do not want any chromosomes beyond X, you need to remove them from the read-count and allelic-count collections in your analysis.
Regards.
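One possible way to do this, sketched in Python under stated assumptions and not as an official GATK procedure: filter each tab-separated input so that only rows for the wanted contigs remain. The sketch assumes the files may start with SAM-style "@" header lines, followed by one column-header line, with the contig name in the first column of every data row. It also drops @SQ header lines for unwanted contigs, so the same function can be applied to a .dict file. The function name and the contig list in the usage note are illustrative only. Restricting the intervals at the collection steps and rerunning the pipeline, as suggested above, is the cleaner route; filtering after the fact is only safe if every file passed to the plotting tool is filtered the same way.

```python
def keep_contigs(lines, wanted):
    """Yield only the lines that belong to contigs in `wanted`.

    Assumed layout: optional SAM-style '@' header lines, then one
    column-header line, then tab-separated rows whose first field
    is the contig name.
    """
    header_seen = False
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Keep a sequence-dictionary entry only if its SN: name is wanted.
                fields = line.rstrip("\n").split("\t")
                name = next((f[3:] for f in fields if f.startswith("SN:")), None)
                if name not in wanted:
                    continue
            yield line
        elif not header_seen:
            # The first non-'@' line is taken to be the column header.
            header_seen = True
            yield line
        elif line.split("\t", 1)[0] in wanted:
            yield line
```

For example, with wanted = {"2L", "2R", "3L", "3R", "4", "X"} (an assumed list; use the contig names from your own reference), the output of keep_contigs(open(path), wanted) can be written to a new file for each plotting input.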