CalculateContamination: Multiple KernelSegmenter warnings
Dear GATK Team,
When running CalculateContamination, I am receiving multiple warnings and the tool exits after 0.01 minutes.
Specific details:
GATK version: 4.1.9.0.
Command used:
gatk CalculateContamination \
--input sample.getpileupsummaries.table \
--tumor-segmentation sample.segments.table \
--output sample.contamination.table \
--tmp-dir $TMPDIR
Error log:
01:49:35.713 INFO CalculateContamination  
01:49:35.713 INFO CalculateContamination  The Genome Analysis Toolkit (GATK) v4.1.9.0
01:49:35.714 INFO CalculateContamination  For support and documentation go to https://software.broadinstitute.org/gatk/
01:49:35.714 INFO CalculateContamination  Executing as … on Linux v3.10.0-1127.el7.x86_64 amd64
01:49:35.714 INFO CalculateContamination  Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_92-b14
01:49:35.714 INFO CalculateContamination  Start Date/Time: 22 November 2020 01:49:35 GMT
01:49:35.714 INFO CalculateContamination  
01:49:35.714 INFO CalculateContamination  
01:49:35.714 INFO CalculateContamination  HTSJDK Version: 2.23.0
01:49:35.714 INFO CalculateContamination  Picard Version: 2.23.3
01:49:35.714 INFO CalculateContamination  HTSJDK Defaults.COMPRESSION_LEVEL : 2
01:49:35.715 INFO CalculateContamination  HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
01:49:35.715 INFO CalculateContamination  HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
01:49:35.715 INFO CalculateContamination  HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
01:49:35.715 INFO CalculateContamination  Deflater: IntelDeflater
01:49:35.715 INFO CalculateContamination  Inflater: IntelInflater
01:49:35.715 INFO CalculateContamination  GCS max retries/reopens: 20
01:49:35.715 INFO CalculateContamination  Requester pays: disabled
01:49:35.715 INFO CalculateContamination  Initializing engine
01:49:35.715 INFO CalculateContamination  Done initializing engine
01:49:35.861 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
01:49:35.887 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3). Local changepoint costs will not be calculated for this window size.
01:49:35.887 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.896 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.899 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.900 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.900 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.900 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.900 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.900 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.900 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.901 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.901 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
01:49:35.901 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3). Local changepoint costs will not be calculated for this window size.
01:49:35.901 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.902 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.902 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.902 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.902 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.902 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.903 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.903 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.903 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.903 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.904 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (3) to segment; using all data points to calculate kernel matrix.
01:49:35.904 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (3). Local changepoint costs will not be calculated for this window size.
01:49:35.904 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.904 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.905 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.905 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.905 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.906 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.906 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.907 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.907 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.910 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.911 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.911 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.911 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.911 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.911 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (2) to segment; using all data points to calculate kernel matrix.
01:49:35.912 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (2). Local changepoint costs will not be calculated for this window size.
01:49:35.912 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.912 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.912 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.913 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.913 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.913 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.913 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.913 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.913 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.914 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.914 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.914 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.914 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.914 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.914 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (4) to segment; using all data points to calculate kernel matrix.
01:49:35.916 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (4). Local changepoint costs will not be calculated for this window size.
01:49:35.916 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.916 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.916 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (1) to segment; using all data points to calculate kernel matrix.
01:49:35.917 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (1). Local changepoint costs will not be calculated for this window size.
01:49:35.917 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.917 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.917 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (17) to segment; using all data points to calculate kernel matrix.
01:49:35.929 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (17). Local changepoint costs will not be calculated for this window size.
01:49:35.929 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.930 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:35.930 WARN KernelSegmenter  Specified dimension of the kernel approximation (100) exceeds the number of data points (6) to segment; using all data points to calculate kernel matrix.
01:49:35.931 WARN KernelSegmenter  Number of points needed to calculate local changepoint costs (2 * window size = 100) exceeds number of data points (6). Local changepoint costs will not be calculated for this window size.
01:49:35.931 WARN KernelSegmenter  No changepoint candidates were found. The specified window sizes may be inappropriate, or there may be insufficient data points.
01:49:35.931 INFO KernelSegmenter  Found 0 changepoints after applying the changepoint penalty.
01:49:36.120 INFO CalculateContamination  Shutting down engine
[22 November 2020 01:49:36 GMT] org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination done. Elapsed time: 0.01 minutes.
How do I address these warnings and solve the issue? In addition, are the output segments.table and contamination.table valid, considering these warnings are present?
Thank you for your time and help.
Kind regards.

ISmolicz, these warnings indicate that you do not have enough data points. What about your output file? Does it make sense?
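One rough way to see how little data each contig contributes is to count the rows per contig in the GetPileupSummaries table. The sketch below assumes the table is tab-separated, with `#`-prefixed metadata lines followed by a header row whose first column is `contig`; the function name `count_sites_per_contig` is hypothetical. The segmenter may use only a subset of these sites, so treat the counts as an upper bound rather than the exact numbers reported in the warnings.

```shell
# Count data rows per contig in a pileup summaries table.
# Assumes: tab-separated, '#' metadata lines, header row starting with "contig".
count_sites_per_contig() {
    awk -F '\t' '
        /^#/ { next }              # skip metadata lines
        $1 == "contig" { next }    # skip the column header row
        NF > 0 { n[$1]++ }         # tally one row per site
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$1" | sort -k1,1
}

# Usage (file name taken from the command in the original post):
# count_sites_per_contig sample.getpileupsummaries.table
```

Contigs with only a handful of rows are the ones likely to trigger the "exceeds the number of data points" warnings.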

Thank you for your reply Genevieve Brandt.
The output files have data. However, I do not know whether the files created by this tool are valid when so many warnings arise while running CalculateContamination.
This step is preceded by GetPileupSummaries, as part of the "(How to) Call somatic mutations using GATK4 Mutect2" workflow, where I specify the variant file (common_biallelic.vcf) and the intervals file (intervals.interval_list). Is the issue of not having enough data points therefore unavoidable when using these arguments?
In addition, the number of contigs listed in the segments.table differs per sample. Is this expected?
Thank you again.

Hi ISmolicz, in the document you linked to, "(How to) Call somatic mutations using GATK4 Mutect2", the -L argument in GetPileupSummaries is the known variant sites VCF, whereas it looks like you are using an intervals list. Please retry with the known sites, as in the Best Practices, and see if it works.
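For illustration, the suggestion above amounts to passing the same common-sites VCF to both -V and -L. The sketch below only prints the command so it can be checked before running; the helper name `pileup_command` and the BAM file name are hypothetical, and the other file names are taken from this thread.

```shell
# Print a GetPileupSummaries command that passes the same sites VCF to -V and -L.
# Arguments: <tumor BAM> <common sites VCF> <output table> (paths are placeholders).
pileup_command() {
    printf 'gatk GetPileupSummaries -I %s -V %s -L %s -O %s\n' "$1" "$2" "$2" "$3"
}

# Inspect the command first, then run it once it looks right:
# pileup_command tumor.bam common_biallelic.vcf sample.getpileupsummaries.table
```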

Thank you for your reply Genevieve Brandt.
I am using an intervals file following a post published a couple of weeks ago (please see link below). Brian Haas advised that an intervals file could be used if available and if using whole exome data. I am doing targeted sequencing; would using an intervals file not be recommended if available?

ISmolicz, since your issue is not having enough data, try following how it is done in "(How to) Call somatic mutations using GATK4 Mutect2" by using the common sites as -L, and see if that solves your issue.

Unfortunately, using the same common_biallelic.vcf file for both -V and -L does not solve the issue. In fact, fewer bp from intervals and fewer total loci are processed using common_biallelic.vcf than with my interval list.
Is there anything else you would recommend trying Genevieve Brandt? Your advice would be appreciated.

ISmolicz what species are you researching? Could you give me more information about your data?

I am working with human data. Samples underwent paired-end sequencing following library prep, which included targeted gene enrichment against a gene panel.

ISmolicz could you upload a bug report following these instructions? Please be sure to put all files in one folder and name it something specific. These are the files we need:
- Complete stack trace in a text file (use the option DSTACK_TRACE_ON_USEREXCEPTION)
- All tables used (pileup summaries and segments)
- Variant and interval files
- Reference file or specific link to find the reference

Genevieve Brandt (she/her), I am receiving the same warning message when running CalculateContamination. I have followed "(How to) Call somatic mutations using GATK4 Mutect2".
Specific details:
GATK version: 4.1.9.0

Hi Aqsa Majeed,
This is a warning indicating that you could get better results if you ran with more data.
You can see more explanation at this discussion on our legacy forum site: https://sites.google.com/a/broadinstitute.org/legacygatkforumdiscussions/2018081020180411/11959GATK4sCalculateContaminationreportsnohomaltsitesfound
Genevieve

Hi ISmolicz,
Were you able to solve this issue?
Best,
Deya

Deya Alzoubi, Not yet.

Apologies for the delay in replying. Unfortunately, I cannot include all the files requested in a bug report and have therefore not submitted one. Would it be acceptable to submit a report with some of the files? I completely understand if not.
I note that in your response to Aqsa Majeed you mention "This is a warning indicating that you could get better results if you ran with more data." Considering this, are the contamination.table and segments.table generated with CalculateContamination still usable with the warnings observed?
Thank you for your time and help.

Hi Deya Alzoubi,
Unfortunately, I have not been able to solve the problem yet, but I will wait to hear from Genevieve Brandt following my most recent post.
Kind regards.

The contamination.table and segments.table are usable. If a problem made them unusable, the tool would throw an error and/or not generate an output. You don't need to worry about warnings. They do not affect your analysis or results; they just give you more information about what's going on under the hood.
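As a quick sanity check on that point, one could confirm after the run that both tables were actually written and are non-empty. This is a minimal sketch, assuming the file names from the original command; `outputs_present` is a hypothetical helper, not part of GATK, and it says nothing about the quality of the estimates themselves.

```shell
# Return success only if every listed file exists and is non-empty.
outputs_present() {
    for f in "$@"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
}

# Usage, right after the CalculateContamination command:
# outputs_present sample.contamination.table sample.segments.table && echo "outputs written"
```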

Thank you for your reply Bhanu Gandham.