DepthofCoverage GATK4.1.7.0 -gene-list
AnsweredDear all
I’m trying to use the DepthofCoverage tool with GATK 4.1.7.0 but I have a problem with the –gene-list option.
This is my command
/gatk4-4.1.7.0-0/gatk-package-4.1.7.0-local.jar DepthOfCoverage -R /path/ucsc.hg19.fasta -O /path/Cov -I /path/file.markdup.recal.sort.bam -L /path/bed/name.bed -gene-list /path/reference/provaRefseq.txt --summary-coverage-threshold 5 --summary-coverage-threshold 15 --summary-coverage-threshold 30 -pt readgroup --omit-depth-output-at-each-base
And this is the error: cannot read file:///path/reference/provaRefseq.txt because no suitable codecs found
Here the first reads of the Refseq used
cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
585 NR_148357 chr1 + 11868 14362 14362 14362 3 11868,12612,13220, 12227,12721,14362, 0 LOC102725121 unk unk -1,-1,-1,
585 NR_046018 chr1 + 11873 14409 14409 14409 3 11873,12612,13220, 12227,12721,14409, 0 DDX11L1 unk unk -1,-1,-1,
585 NR_024540 chr1 - 14361 29370 29370 29370 11 14361,14969,15795,16606,16857,17232,17605,17914,18267,24737,29320, 14829,15038,15947,16765,17055,17368,17742,18061,18366,24891,29370, 0 WASH7P unk unk -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
585 NR_106918 chr1 - 17368 17436 17436 17436 1 17368, 17436, 0 MIR6859-1 unk unk -1,
585 NR_107062 chr1 - 17368 17436 17436 17436 1 17368, 17436, 0 MIR6859-2 unk unk -1,
585 NR_107063 chr1 - 17368 17436 17436 17436 1 17368, 17436, 0 MIR6859-3 unk unk -1,
The same command without –gene-list option works well and I obtained 6 outputs:
- _summary
- _statistics
- _interval_summary
- _interval_statistics:
- _cumulative_coverage_counts:
- _cumulative_coverage_proportions:
I need a help in order to understand if there is a bug on the –gene-list option or which is the right way to have a usable RefSeq file. I also tried to use a different RefSeq (but I obtained the same error) or a bed file with the gene names at the fourth column (either in the –L or in the –gene-list option). Moreover, I tried to use a list of gene (.list file) but it cannot be read.
Can someone help me please? I would like to generate also the outputs _gene_summary and _gene_statistics
Thank you
-
Geraldine Van der Auwera how can I ask help to the team?
-
Hi marta r , you are doing the right thing by posting here. We ask for your patience and understanding as this is a free support resource staffed by one person right now. We will get to your question in the order it was received. I hope you understand!
-
Hi marta r - I took a quick look. I am no expert on this tool but the "no suitable codecs" message makes me believe something is wrong with your refseq file. Here is some documentation on this: https://gatk.broadinstitute.org/hc/en-us/articles/360035532032-RefSeq-gene-list-format
I will try to take a closer look tomorrow.
-
Tiffany Miller thank you very much for the help. I think that the problem could not be the RefSeq format: I followed the guideline that you mentioned and I also tried different RefSeq (not only form UCSC) with several features. This is why I thought to a bug in the new version (same RefSeq files work well with the GATK 3)
-
Hi marta r I am not able to replicate your error using gatk 4.1.7.0
Here are the commands I used and the way I set up the RefSeq file. We need to update the RefSeq doc b/c it isn't clear as to what track or table to use so I will put that on our to-do list. I think what is causing your issue is using .txt instead of .refseq. I renamed my refseq file to .txt and got a similar error as you.
gatk \
DepthOfCoverage \
-R ref/ref.fasta \
-O test_DOC -I bams/mother.bam \
-gene-list intervals/test_chr20_v2.refseq \
-L intervals/motherHighconf.bed
-
Tiffany Miller thank you very much for the help. Now it works, but I have still another question about the output
This was my command:
gatk DepthOfCoverage -R /path/ucsc.hg19.fasta -O coverage_name -I /input1.bam -I /input2.bam -I /input3.bam -gene-list /path/genome.refseq --summary-coverage-threshold 5 --summary-coverage-threshold 15 --summary-coverage-threshold 30 -L /path/mybed.bed
an these the outputs:
coverage_name
coverage_name.sample_cumulative_coverage_counts
coverage_name.sample_cumulative_coverage_proportions
coverage_name.sample_gene_summary
coverage_name.sample_interval_statistics
coverage_name.sample_interval_summary
coverage_name.sample_statistics
coverage_name.sample_summary
As you can see, I did't obtain the "_gene_statistics" output. Do you know why? Is there any option to add?
Thank you
-
Ah, another point: the sample_interval_summary file was empty (only the header is present)
-
Hi marta r, it looks like you are using GATK4.1.7.0, and we are actually on 4.1.8.0. The problem with the interval summary file was patched in that update. Would you be able to update your GATK and re-run this tool to see if there is still a problem?
-
Hi marta r, we are looking into the issue with the _gene_statistics output. Could you please try running the command with only one bam file and see if the output is there? The most common usage is with one input file so that may be where the issue is coming from.
-
Hi, we tried with only one BAM file but the output files didn't changed. I will try with the new version GATK 4.1.8.0
Thank you
-
Hi marta r, thank you for testing this. We have created a github ticket to fix this, and you can follow along with the issue here: https://github.com/broadinstitute/gatk/issues/6714
-
Hi marta r, did you get the correct output with the _gene_statistics file when updating to 4.1.8.0?
-
sorry for the big delay! Yes it works fine, thank you!
-
Hello,
Is it possible to get transcript numbers of the genes in the sample_gene_summary file?Thanks
-
Hi Sinem Selvi,
The feature you are describing is currently possible but you would need to do some workarounds to make it happen. You could provide a file with a transcript list from the -gene-list argument instead of the normal input gene list. If you can generate that input file in the correct RefSeq format, then the tool should work for you.
If you want to submit an official feature request to make this a larger part of the tool, please make a new post in the "General Discussion" topic. If many other users also want this feature and interact with the post then we can prioritize adding it.
Best,
Genevieve
Please sign in to leave a comment.
15 comments