CalculateContamination: Multiple KernelSegmenter warnings
Dear GATK Team,
When running CalculateContamination, I am receiving multiple warnings and the tool exits after 0.01 minutes.
Specific details:
GATK version: 4.1.9.0.
Command used:
gatk CalculateContamination \
--input sample.getpileupsummaries.table \
--tumor-segmentation sample.segments.table \
--output sample.contamination.table \
--tmp-dir $TMPDIR
Error log:
01:49:35.713 INFO CalculateContamination - ------------------------------------------------------------
01:49:35.713 INFO CalculateContamination - The Genome Analysis Toolkit (GATK) v4.1.9.0
01:49:35.714 INFO CalculateContamination - For support and documentation go to https://software.broadinstitute.org/gatk/
01:49:35.714 INFO CalculateContamination - Executing as … on Linux v3.10.0-1127.el7.x86_64 amd64
01:49:35.714 INFO CalculateContamination - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
01:49:35.714 INFO CalculateContamination - Start Date/Time: 22 November 2020 01:49:35 GMT
01:49:35.714 INFO CalculateContamination - ------------------------------------------------------------
01:49:35.714 INFO CalculateContamination - ------------------------------------------------------------
01:49:35.714 INFO CalculateContamination - HTSJDK Version: 2.23.0
01:49:35.714 INFO CalculateContamination - Picard Version: 2.23.3
01:49:35.714 INFO CalculateContamination - HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:49:35.715 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:49:35.715 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:49:35.715 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:49:35.715 INFO CalculateContamination - Deflater: IntelDeflater
01:49:35.715 INFO CalculateContamination - Inflater: IntelInflater
01:49:35.715 INFO CalculateContamination - GCS max retries/reopens: 20
01:49:35.715 INFO CalculateContamination - Requester pays: disabled
01:49:35.715 INFO CalculateContamination - Initializing engine
01:49:35.715 INFO CalculateContamination - Done initializing engine
01:49:35.861 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
01:49:35.887 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3). Local changepoint costs will not be calculated for this window size.
01:49:35.887 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.896 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.899 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.900 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.900 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.900 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.900 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.900 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.900 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.901 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.901 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
01:49:35.901 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3). Local changepoint costs will not be calculated for this window size.
01:49:35.901 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.902 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.902 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.902 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.902 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.902 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.903 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.903 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.903 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.903 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.904 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
01:49:35.904 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3). Local changepoint costs will not be calculated for this window size.
01:49:35.904 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.904 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.905 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.905 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.905 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.906 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.906 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.907 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.907 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.910 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.911 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.911 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.911 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.911 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.911 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.912 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.912 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.912 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.912 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.913 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.913 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.913 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.913 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.913 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.913 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.914 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.914 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.914 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.914 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.914 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.914 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (4) to segment; using all data points to calculate kernel matrix.
01:49:35.916 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (4). Local changepoint costs will not be calculated for this window size.
01:49:35.916 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.916 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.916 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.917 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.917 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.917 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.917 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (17) to segment; using all data points to calculate kernel matrix.
01:49:35.929 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (17). Local changepoint costs will not be calculated for this window size.
01:49:35.929 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.930 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:35.930 WARN KernelSegmenter - Specified dimension of the kernel approximation (100) exceeds the number of data points (6) to segment; using all data points to calculate kernel matrix.
01:49:35.931 WARN KernelSegmenter - Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (6). Local changepoint costs will not be calculated for this window size.
01:49:35.931 WARN KernelSegmenter - No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.931 INFO KernelSegmenter - Found 0 changepoints after applying the changepoint penalty.
01:49:36.120 INFO CalculateContamination - Shutting down engine
[22 November 2020 01:49:36 GMT] org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination done. Elapsed time: 0.01 minutes.
How do I address these warnings and solve the issue? In addition, are the output segments.table and contamination.table valid, considering these warnings are present?
Thank you for your time and help.
Kind regards.
-
ISmolicz these warnings indicate that you do not have enough data points. What about your output file? Does it make sense?
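One quick way to see how many sites are available is to count the rows per contig in the GetPileupSummaries table. The following is a minimal sketch, not part of GATK: the helper name `count_sites_per_contig` is hypothetical, and it assumes the table is tab-separated, that metadata lines start with `#`, and that the column header line starts with `contig`. CalculateContamination may filter sites further before segmenting, so these counts are only a rough upper bound on the "number of data points" reported in the warnings.

```shell
# Count data rows per contig in a GetPileupSummaries table.
# Assumptions (check against your own file): tab-separated columns,
# metadata lines begin with '#', and the header row begins with "contig".
count_sites_per_contig() {
    awk -F'\t' '
        # Skip metadata lines, the column header, and blank lines.
        !/^#/ && $1 != "contig" && NF > 0 { n[$1]++ }
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | sort
}

# Usage (file name taken from the command in the original post):
# count_sites_per_contig sample.getpileupsummaries.table
```

If most contigs show only a handful of rows, as can happen with a small targeted panel, the segmenter will have very little to work with, which would be consistent with the warnings above.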
-
Thank you for your reply Genevieve Brandt.
The output files have data. However, I do not know whether the files created by this tool are valid, given that so many warnings arise when running CalculateContamination.
This step is preceded by GetPileupSummaries as part of the (How to) Call somatic mutations using GATK4 Mutect2 workflow, where I specify --variant (common_biallelic.vcf) and --intervals (intervals.interval_list). Therefore, is the issue of not having enough data points unavoidable when applying these arguments?
In addition, the number of contigs listed in the segments.table differs per sample - is this expected?
Thank you again.
-
Hi ISmolicz, in the document you linked to ((How to) Call somatic mutations using GATK4 Mutect2), the -L in GetPileupSummaries is the known variant sites VCF, but it looks like you are using an intervals list. Please retry with the known sites, as in the Best Practices, and see if it works.
-
Thank you for your reply Genevieve Brandt.
I am using an intervals file following a post published a couple of weeks ago (please see link below). Brian Haas advised that an intervals file could be used if available and if using whole exome data. I am doing targeted sequencing - would using an intervals file not be recommended if available?
-
ISmolicz since you are having an issue of not having enough data, try following how it is done in the ((How to) Call somatic mutations using GATK4 Mutect2) by using the common sites as -L and see if that solves your issue.
-
Unfortunately, using the same common_biallelic.vcf file for both -V and -L does not solve the issue. In fact, there are fewer bp from intervals and fewer total loci processed using the common_biallelic.vcf compared to my interval list.
Is there anything else you would recommend trying Genevieve Brandt? Your advice would be appreciated.
-
ISmolicz what species are you researching? Could you give me more information about your data?
-
I am working with human data. Samples underwent paired-end sequencing following library prep, which included targeted gene enrichment against a gene panel.
-
ISmolicz could you upload a bug report following these instructions? Please be sure to put all files in one folder and name it something specific. These are the files we need:
- Complete stack trace in a text file (use the option -DSTACK_TRACE_ON_USEREXCEPTION)
- All tables used (pileup summaries and segments)
- Variant and interval files
- Reference file or specific link to find the reference
-
Genevieve-Brandt-she-her I am receiving the same warning message when running CalculateContamination. I have followed the "(How to) Call somatic mutations using GATK4 Mutect2" tutorial.
Specific details:
GATK version: 4.1.9.0
-
Hi Aqsa Majeed,
This is a warning indicating that you could get better results if you ran with more data.
You can see more explanation at this discussion on our legacy forum site: https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-08-10-2018-04-11/11959-GATK4s-CalculateContamination-reports-no-hom-alt-sites-found
Genevieve
-
Hi ISmolicz,
Were you able to solve this issue?
Best,
Deya
-
Deya Alzoubi, Not yet.
-
Apologies for the delay in replying but unfortunately I cannot include all the files requested in a bug report and therefore, have not submitted this. Would it be acceptable to submit a report with some of the files? I completely understand if not.
I note in your response to Aqsa Majeed that you mention "This is a warning indicating that you could get better results if you ran with more data." Therefore, considering this, are the contamination.table and segments.table generated with CalculateContamination still useable with the warnings observed?
Thank you for your time and help.
-
Hi Deya Alzoubi,
Unfortunately, I have not been able to solve the problem yet, but I will wait to hear from Genevieve Brandt following my most recent post.
Kind regards.
-
The contamination.table and segments.table are usable. If a problem made them unusable, the tool would throw an error and/or not generate an output. You don't need to worry about the warnings. They do not affect your analysis or results. They just give you more information on what's going on under the hood.
-
Thank you for your reply Bhanu Gandham.