CalculateContamination "there is no such column: contig"
I am trying to run the "Somatic short variant discovery" pipeline and I have followed the GATK best practices for data preprocessing. I am working through this guide on the whole pipeline, and this guide on Mutect2. I was able to call preliminary mutations using Mutect2 and create pileup summaries from the resulting VCF files.
I keep getting an error when trying to calculate contamination after doing pileup summaries. I got no errors from the pileup command, but I keep getting "there is no such column: contig" when I try to actually calculate contamination.
Currently using GATK 4.1.8.0.
The exact command that I used was
./req-files/gatk/gatk CalculateContamination -I ./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.table -O ./10-CalculateContamination/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.contamination.table 1>&2 2>./10-CalculateContamination/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.contamination.gatk.log
The entire log file is:
Using GATK jar /scratch/07467/jwr2735/somaticPipeline/req-files/gatk/gatk-package-4.1.8.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scratch/07467/jwr2735/somaticPipeline/req-files/gatk/gatk-package-4.1.8.0-local.jar CalculateContamination -I ./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.table -O ./10-CalculateContamination/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.contamination.table
12:50:33.979 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scratch/07467/jwr2735/somaticPipeline/req-files/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jul 05, 2020 12:50:34 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
12:50:34.131 INFO CalculateContamination - ------------------------------------------------------------
12:50:34.131 INFO CalculateContamination - The Genome Analysis Toolkit (GATK) v4.1.8.0
12:50:34.131 INFO CalculateContamination - For support and documentation go to https://software.broadinstitute.org/gatk/
12:50:34.132 INFO CalculateContamination - Executing as jwr2735@nid00019 on Linux v4.4.103-6.38_4.0.95-cray_ari_c amd64
12:50:34.132 INFO CalculateContamination - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_151-b12
12:50:34.132 INFO CalculateContamination - Start Date/Time: July 5, 2020 12:50:33 PM CDT
12:50:34.132 INFO CalculateContamination - ------------------------------------------------------------
12:50:34.132 INFO CalculateContamination - ------------------------------------------------------------
12:50:34.132 INFO CalculateContamination - HTSJDK Version: 2.22.0
12:50:34.132 INFO CalculateContamination - Picard Version: 2.22.8
12:50:34.132 INFO CalculateContamination - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:50:34.133 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:50:34.133 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:50:34.133 INFO CalculateContamination - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:50:34.133 INFO CalculateContamination - Deflater: IntelDeflater
12:50:34.133 INFO CalculateContamination - Inflater: IntelInflater
12:50:34.133 INFO CalculateContamination - GCS max retries/reopens: 20
12:50:34.133 INFO CalculateContamination - Requester pays: disabled
12:50:34.133 INFO CalculateContamination - Initializing engine
12:50:34.133 INFO CalculateContamination - Done initializing engine
12:50:34.153 INFO CalculateContamination - Shutting down engine
[July 5, 2020 12:50:34 PM CDT] org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=1124073472
java.lang.IllegalArgumentException: there is no such column: contig
at org.broadinstitute.hellbender.utils.tsv.DataLine.columnIndex(DataLine.java:458)
at org.broadinstitute.hellbender.utils.tsv.DataLine.get(DataLine.java:427)
at org.broadinstitute.hellbender.utils.tsv.DataLine.get(DataLine.java:556)
at org.broadinstitute.hellbender.tools.walkers.contamination.PileupSummary$PileupSummaryTableReader.createRecord(PileupSummary.java:193)
at org.broadinstitute.hellbender.tools.walkers.contamination.PileupSummary$PileupSummaryTableReader.createRecord(PileupSummary.java:188)
at org.broadinstitute.hellbender.utils.tsv.TableReader.fetchNextRecord(TableReader.java:364)
at org.broadinstitute.hellbender.utils.tsv.TableReader.access$200(TableReader.java:99)
at org.broadinstitute.hellbender.utils.tsv.TableReader$1.hasNext(TableReader.java:472)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.broadinstitute.hellbender.utils.tsv.TableReader.toList(TableReader.java:532)
at org.broadinstitute.hellbender.tools.walkers.contamination.PileupSummary.readFromFile(PileupSummary.java:139)
at org.broadinstitute.hellbender.tools.walkers.contamination.CalculateContamination.doWork(CalculateContamination.java:116)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
at org.broadinstitute.hellbender.Main.main(Main.java:292)
-
Hi Jensen Richardson, what was the GetPileupSummaries command that you used? Here is the documentation. In that document, there is an example of the expected output, which is a table of 6 columns, starting with contig. Is that file as expected? If so, could you send us the command you used and the first 5 lines of that file?
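For anyone who wants to check their own table before posting, here is a minimal Python sketch of that header check. The helper names are hypothetical, and the column names after contig are an assumption based on the example output in the GetPileupSummaries documentation; only "six columns, starting with contig" is stated in this thread.

```python
import io

# ASSUMPTION: column names other than "contig" follow the example in the
# GetPileupSummaries documentation; adjust if your GATK version differs.
EXPECTED_COLUMNS = [
    "contig", "position", "ref_count",
    "alt_count", "other_alt_count", "allele_frequency",
]


def read_header(handle):
    """Return the tab-separated fields of the first line that is neither
    blank nor a '#' comment/metadata line."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        return line.rstrip("\n").split("\t")
    return []


def looks_like_pileup_summary(handle):
    """True if the table header matches the expected six columns."""
    return read_header(handle) == EXPECTED_COLUMNS


# Small in-memory demonstration with made-up rows.
demo = io.StringIO(
    "#<METADATA>SAMPLE=example\n"
    "contig\tposition\tref_count\talt_count\tother_alt_count\tallele_frequency\n"
    "chr1\t10177\t30\t1\t0\t0.02\n"
)
print(looks_like_pileup_summary(demo))
```

To check a real file, open it and pass the handle: `looks_like_pileup_summary(open("sample.pileup.table"))`, where the file name is a placeholder.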
-
Hi Genevieve,
The GetPileupSummaries (or rather just Pileup) command that I used was:
./req-files/gatk/gatk Pileup -R ./req-files/refGenome/GRCh38.p7.genome.fa -I ./07-ApplyBaseRecalibration/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.recalibrated.bam -O ./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.table 1>&2 2>./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.gatk.log
I see that I actually used Pileup instead of GetPileupSummaries. I will rerun it using the correct tool (that might help...). While investigating how I made this mistake, I noticed that on the main GATK tool documentation index, the link for GetPileupSummaries actually points to the Pileup page, not the GetPileupSummaries documentation page.
-
Hi Jensen Richardson, yes, I think it will work if you use GetPileupSummaries, because the two tools produce different outputs. And thank you for pointing out that link problem, we will get that fixed. Let me know if using GetPileupSummaries works!
-
I am now trying to use GetPileupSummaries, but I keep getting this error:
A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig 1 given as location, but this contig isn't present in the Fasta sequence dictionary
From what I've read online it seems like this often comes from not correctly matching the reference genome across all steps of the pipeline, but I have been very careful to do that.
I am using dbSNP common variants from here, and I am using build 151 of dbSNP. The header of the file says that it is for GRCh38.p7, but I am still getting this problem even though I have been using GRCh38.p7 for the whole pipeline. I cannot fathom why it would be unable to find the contig if I have been using the same reference the whole time. My current theory is that it comes from the VCF file being formatted like this (I removed most of the INFO column because it doesn't seem necessary):
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10177 rs367896724 A AC . . RS=367896724;RSPOS=10177;dbSNPBuildID=138
1 10352 rs555500075 T TA . . RS=555500075;RSPOS=10352;dbSNPBuildID=142
1 10616 rs376342519 CCGCCGTTGCAAAGGCGCGCCG C . . RS=376342519;RSPOS=10617;dbSNPBuildID=142
You can see that the CHROM column contains just a "1" instead of a "chr1", as is common in GRCh38. Do you have any ideas for what could be causing this?
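One way to test this theory is to list the contig names used in the VCF that do not appear in the reference index. The following is a minimal Python sketch with hypothetical helper names; it assumes a plain-text, tab-separated VCF and a samtools-style .fai index whose first column is the contig name.

```python
def vcf_contigs(lines):
    """Contig names from the CHROM column, in first-seen order."""
    seen = {}
    for line in lines:
        # Skip header lines ('##...' and '#CHROM') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def fai_contigs(lines):
    """Contig names from the first column of a .fai index."""
    return {line.split("\t", 1)[0] for line in lines if line.strip()}


def missing_from_reference(vcf_lines, fai_lines):
    """VCF contigs that the reference index does not contain."""
    reference = fai_contigs(fai_lines)
    return [name for name in vcf_contigs(vcf_lines) if name not in reference]
```

With the records above and a reference that only has chr-prefixed names, `missing_from_reference` would report "1", which is consistent with the error message. Both arguments can be open file handles, so the files are streamed rather than loaded whole.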
-
While doing something unrelated, I stumbled upon this issue on GitHub, which seems to describe nearly the exact same problem. It seems that dbSNP does not use the chr prefix, even though that seems to be considered (at least by Broad) the standard for GRCh38.
-
Hi Jensen Richardson, thank you for the update. Here is the link to the issue you pointed out with the broken hyperlinks, so you can follow along: https://github.com/broadinstitute/gatk/issues/6699. Could you post your entire GetPileupSummaries command?
-
Thank you for the link to the issue. Here is the GetPileupSummaries command that I was using:
./req-files/gatk/gatk GetPileupSummaries \
-I ./07-ApplyBaseRecalibration/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.recalibrated.bam \
-V ./req-files/common_var/00-common_all.vcf \
-L ./req-files/common_var/00-common_all.vcf \
-O ./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.table \
1>&2 2>./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.gatk.log
-
Thank you for the quick response. I confirmed with our developers that, unfortunately, this is an issue with dbSNP and is not on our end. As in the issue you linked to before, we are working on a fix that can be used with Funcotator, so stay tuned for that. In the meantime, you may have to find a workaround to make the dbSNP contig names match your reference. Just be careful with the alternative contig names and make sure everything stays consistent.
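As one possible shape for such a workaround, here is a minimal Python sketch that rewrites the contig names of a plain-text VCF. It is not an official GATK procedure. The function names and the mapping are assumptions for illustration (1-22, X, Y get a chr prefix and MT becomes chrM), so compare the mapping against your reference's sequence dictionary first. Records on any other contig raise an error instead of being guessed at, which is where the alternative contigs need a manual decision.

```python
import re

# ASSUMPTION: primary-contig mapping only; verify against your reference.
RENAME = {str(n): f"chr{n}" for n in range(1, 23)}
RENAME.update({"X": "chrX", "Y": "chrY", "MT": "chrM"})

_CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)(.*)$", re.DOTALL)


def rename_vcf_line(line, mapping=RENAME):
    """Return one VCF line with its contig name rewritten."""
    if not line.strip():
        return line
    if line.startswith("#"):
        # Rewrite '##contig=<ID=...>' headers when the ID is known;
        # every other header line passes through unchanged.
        match = _CONTIG_HEADER.match(line)
        if match and match.group(2) in mapping:
            return match.group(1) + mapping[match.group(2)] + match.group(3)
        return line
    chrom, sep, rest = line.partition("\t")
    if chrom not in mapping:
        raise ValueError(f"no rename rule for contig {chrom!r}")
    return mapping[chrom] + sep + rest


def rename_vcf(lines, mapping=RENAME):
    """Lazily rewrite an iterable of VCF lines."""
    for line in lines:
        yield rename_vcf_line(line, mapping)


def rename_vcf_file(src_path, dst_path, mapping=RENAME):
    """Stream src_path to dst_path, renaming contigs line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(rename_vcf(src, mapping))
```

The renamed file needs a fresh index before GATK can use it, and the same renamed file would be passed to both -V and -L in the GetPileupSummaries command above.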