DepthOfCoverage Error: Refseq file contains transcripts with zero coding length
Hi,
I am trying to generate per-gene coverage statistics but keep getting an error. I prepared the refseq file according to the RefSeq gene list instructions. The run produced 8 output files. The _gene_summary file was generated, but it contains only the header with gene names and no statistics, and the _gene_statistics file was not generated at all. The other files look fine. Could you please help me figure out what the problem is?
Thank you!!
GATK v4.1.9.0
command used:
gatk DepthOfCoverage \
-R Mus_musculus.GRCm38.dna_sm.primary_assembly.fa \
-O 153DoC \
-I 153.bam \
-gene-list genetrack.refseq \
-L Covered.bed
Error:
17:15:31.661 INFO DepthOfCoverage - ------------------------------------------------------------
17:15:31.662 INFO DepthOfCoverage - The Genome Analysis Toolkit (GATK) v4.1.9.0
17:15:31.662 INFO DepthOfCoverage - For support and documentation go to https://software.broadinstitute.org/gatk/
17:15:31.664 INFO DepthOfCoverage - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_265-b11
17:15:31.664 INFO DepthOfCoverage - Start Date/Time: February 4, 2021 5:15:29 PM AEST
17:15:31.664 INFO DepthOfCoverage - ------------------------------------------------------------
17:15:31.664 INFO DepthOfCoverage - ------------------------------------------------------------
17:15:31.665 INFO DepthOfCoverage - HTSJDK Version: 2.23.0
17:15:31.665 INFO DepthOfCoverage - Picard Version: 2.23.3
17:15:31.665 INFO DepthOfCoverage - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:15:31.665 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:15:31.665 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:15:31.665 INFO DepthOfCoverage - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:15:31.666 INFO DepthOfCoverage - Deflater: IntelDeflater
17:15:31.666 INFO DepthOfCoverage - Inflater: IntelInflater
17:15:31.666 INFO DepthOfCoverage - GCS max retries/reopens: 20
17:15:31.666 INFO DepthOfCoverage - Requester pays: disabled
17:15:31.666 WARN DepthOfCoverage -
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Warning: DepthOfCoverage is a BETA tool and is not yet ready for use in production
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
17:15:31.666 INFO DepthOfCoverage - Initializing engine
17:15:32.418 INFO FeatureManager - Using codec BEDCodec to read file file:///Covered.bed
17:15:32.487 INFO IntervalArgumentCollection - Processing 485336 bp from intervals
17:15:32.497 INFO DepthOfCoverage - Done initializing engine
17:15:32.572 INFO ProgressMeter - Starting traversal
17:15:32.572 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
17:15:33.074 INFO FeatureManager - Using codec BEDCodec to read file file:///Covered.bed
17:15:33.128 INFO FeatureManager - Using codec RefSeqCodec to read file file:///genetrack.refseq
17:15:33.131 WARN Utils - **********************************************************************
17:15:33.132 WARN Utils - * WARNING:
17:15:33.132 WARN Utils - *
17:15:33.132 WARN Utils - * RefSeq file contains transcripts with zero coding length. Such
17:15:33.132 WARN Utils - * transcripts will be ignored (this warning is printed only once)
17:15:33.132 WARN Utils - **********************************************************************
17:15:42.654 INFO ProgressMeter - 1:86070579 0.2 6000 35707.2
17:15:58.103 INFO ProgressMeter - 1:88218427 0.4 11000 25850.9
17:16:09.290 INFO ProgressMeter - 1:169987087 0.6 19000 31047.4
17:16:19.834 INFO ProgressMeter - 10:26986836 0.8 26000 33007.5
17:16:30.637 INFO ProgressMeter - 10:27350407 1.0 35000 36167.0
17:16:41.546 INFO ProgressMeter - 10:52194843 1.1
# here are the first few lines of the refseq file
#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
1082 NM_001111320 chr1 - 65158615 65186479 65159464 65175352 9 65158615,65161049,65161799,65165099,65166106,65168498,65170958,65175230,65186304, 65159555,65161212,65161940,65165251,65166284,65168604,65171250,65175368,65186479, 0 Idh1 cmpl cmpl 2,1,1,2,1,0,2,0,-1,
222 NM_001136104 chr1 + 156558786 156649619 156559048 156642716 13 156558786,156620838,156622484,156625287,156629852,156631344,156633079,156633752,156635064,156636817,156640438,156640992,156641541, 156559205,156620901,156622655,156625583,156630125,156631429,156633257,156633937,156635217,156636907,156640612,156641229,156649619, 0 Abl2 cmpl cmpl 0,1,1,1,0,0,1,2,1,1,1,1,1,
1863 NM_001159731 chr1 + 167618264 167639623 167624999 167639273 9 167618264,167624927,167627291,167630961,167631229,167632514,167634434,167635699,167639125, 167618495,167625072,167627471,167631122,167631359,167632647,167634526,167635805,167639623, 0 Rxrg cmpl cmpl -1,0,1,1,0,1,2,1,2,
1591 NM_001177628 chr1 + 131962914 131982972 131976940 131981728 5 131962914,131976717,131977412,131978747,131981290,
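For anyone who wants to see which rows trigger the "zero coding length" warning, here is a minimal sketch. It assumes the warning corresponds to rows where cdsEnd is not greater than cdsStart (typical for non-coding transcripts in UCSC-style tables), and that the file is whitespace-delimited with the column order shown in the header above. The function name `find_zero_cds` and the second sample row are hypothetical, made up for illustration only.

```python
def find_zero_cds(lines):
    """Yield (transcript, gene, cdsStart, cdsEnd) for rows with no coding span.

    Assumes the column order from the header above:
    bin name chrom strand txStart txEnd cdsStart cdsEnd ... name2 (13th column).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        if len(fields) < 8:
            continue  # row too short to hold cdsStart/cdsEnd
        cds_start, cds_end = int(fields[6]), int(fields[7])
        if cds_end <= cds_start:
            gene = fields[12] if len(fields) > 12 else "?"
            yield fields[1], gene, cds_start, cds_end


# First data row: the Idh1 transcript from the post, exon lists shortened.
# Second data row: a made-up non-coding transcript (cdsStart == cdsEnd).
SAMPLE = [
    "#bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2",
    "1082 NM_001111320 chr1 - 65158615 65186479 65159464 65175352 9 65158615, 65159555, 0 Idh1",
    "0 NR_000000 chr1 + 1000 2000 2000 2000 1 1000, 2000, 0 HypotheticalNc",
]

flagged = list(find_zero_cds(SAMPLE))
for transcript, gene, start, end in flagged:
    print(f"{transcript}\t{gene}\tcdsStart={start}\tcdsEnd={end}")

# To scan a real file instead of the sample:
#     with open("genetrack.refseq") as handle:
#         for row in find_zero_cds(handle):
#             print(*row, sep="\t")
```

The warning itself only says such transcripts are ignored, so a list like this mainly tells you which genes to expect to be missing from the per-gene output.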
-
Hi ashgorden,
The gene statistics file issue is on our radar, and we have already made changes to fix it. Please see this link. We will be releasing a new GATK version that includes this change in the next week or so. If you do not want to wait, you can download the nightly release docker container.
Once you try it out, let us know if that fixes the issue.
Thanks,
Genevieve
-
Hi Genevieve Brandt (she/her),
Thank you for the reply. Will definitely let you know after trying it on the new version.
Looking forward to the new release!
Ash