Interpretation of gene_summary output from DepthOfCoverage
Hi,
When I used the gene list from RefSeq and ran DepthOfCoverage on a test BAM file, I got the output below. Why are the same gene names repeated (do they refer to multiple transcripts?). Is there detailed documentation on how the coverage is calculated? It would really help me understand this better.
Gene,total_coverage,average_coverage,Sample_total_cvg,Sample_mean_cvg,Sample_granular_Q1,Sample_granular_median,Sample_granular_Q3,Sample_%_above_15
OR4F5,302,0.33,302,0.33,1,1,2,0.0
OR4F16,151,0.16,151,0.16,1,1,1,0.0
OR4F29,151,0.16,151,0.16,1,1,1,0.0
SAMD11,56,0.02,56,0.02,1,1,1,0.0
SAMD11,56,0.02,56,0.02,1,1,1,0.0
SAMD11,56,0.02,56,0.02,1,1,1,0.0
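A quick way to see which genes are repeated is to count rows per gene in the gene_summary file. Below is a minimal Python sketch that uses the rows quoted above as inline data; with a real run you would read the actual output file instead (its path is whatever you passed to DepthOfCoverage, not assumed here).

```python
import csv
import io
from collections import Counter

# Rows copied from the gene_summary output quoted above.
GENE_SUMMARY = """\
Gene,total_coverage,average_coverage,Sample_total_cvg,Sample_mean_cvg,Sample_granular_Q1,Sample_granular_median,Sample_granular_Q3,Sample_%_above_15
OR4F5,302,0.33,302,0.33,1,1,2,0.0
OR4F16,151,0.16,151,0.16,1,1,1,0.0
OR4F29,151,0.16,151,0.16,1,1,1,0.0
SAMD11,56,0.02,56,0.02,1,1,1,0.0
SAMD11,56,0.02,56,0.02,1,1,1,0.0
SAMD11,56,0.02,56,0.02,1,1,1,0.0
"""


def repeated_genes(text):
    """Return {gene: row_count} for genes that appear on more than one row."""
    reader = csv.DictReader(io.StringIO(text))
    counts = Counter(row["Gene"] for row in reader)
    return {gene: n for gene, n in counts.items() if n > 1}


print(repeated_genes(GENE_SUMMARY))  # {'SAMD11': 3}
```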
-
Thanks for writing into the forum about this issue! Could you check whether the SAMD11 gene is repeated in your RefSeq file? If you are using a public RefSeq file, let me know which one so I can take a look.
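One way to do that check is to group the RefSeq records by gene symbol. The minimal Python sketch below assumes the file follows the UCSC refGene.txt column layout (tab-separated; transcript name in the 2nd column, gene symbol `name2` in the 13th); verify that against your own file's header before relying on it. The transcript names and coordinates in the sample rows are hypothetical placeholders.

```python
from collections import defaultdict

# ASSUMPTION: UCSC refGene-style columns (0-based): 1 = transcript name,
# 12 = gene symbol (name2). Adjust if your file's header differs.
TRANSCRIPT_COL = 1
GENE_COL = 12


def transcripts_per_gene(lines):
    """Map each gene symbol to the transcript names listed for it."""
    genes = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        genes[fields[GENE_COL]].append(fields[TRANSCRIPT_COL])
    return dict(genes)


def make_row(tx, gene, start, end):
    """Build a hypothetical 16-column refGene-style record."""
    return "\t".join(["0", tx, "chr1", "+", str(start), str(end), str(start), str(end),
                      "1", f"{start},", f"{end},", "0", gene, "cmpl", "cmpl", "0,"])


rows = [make_row("txA", "SAMD11", 100, 900),
        make_row("txB", "SAMD11", 100, 1200),
        make_row("txC", "OR4F5", 2000, 2900)]

for gene, txs in transcripts_per_gene(rows).items():
    if len(txs) > 1:
        print(gene, txs)  # SAMD11 ['txA', 'txB']
```

With a real file, pass the open file handle (for example `open(path)`) instead of `rows`; any gene printed has more than one record.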
Best,
Genevieve
-
Hi,
I'm having the same issue. I'm using a RefSeq gene list generated by following this article: https://gatk.broadinstitute.org/hc/en-us/articles/360035532032-RefSeq-gene-list-format
The RefSeq gene list actually has multiple records (transcripts) per gene. Even so, DepthOfCoverage from GATK 3.7 generates an aggregate gene-level summary (one line per gene), whereas DoC from the latest GATK now produces multiple records per gene.
-
Hi Seunghun Han,
Thanks for writing into the forum about this! I confirmed with the developer that this behavior is expected. GATK3 produced one aggregate number per gene and did not distinguish between the different transcripts at all. In GATK4, we wanted to make sure that overlapping genes or transcripts are not merged but are measured individually.
Would you be able to look in IGV and confirm that the numbers make sense for the transcripts on each line?
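To make the difference concrete, here is a toy Python model (not GATK's implementation) of two ways to summarize coverage for one gene whose transcripts overlap. The per-base depths and transcript intervals are hypothetical. The per-transcript view gives each transcript its own total; the aggregate view merges all transcripts' bases into one set before summing, which is one plausible reading of "one aggregate number" per gene.

```python
# Toy model, NOT GATK's implementation.
# Hypothetical per-base depths: 56 covered bases at depth 1.
depth = {pos: 1 for pos in range(100, 156)}


def total_coverage(intervals, depth):
    """Sum depth over the set of bases covered by half-open [start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return sum(depth.get(pos, 0) for pos in bases)


# Hypothetical transcripts of one gene that all span the covered region.
transcripts = {"txA": [(0, 3000)], "txB": [(50, 2800)], "txC": [(90, 2500)]}

# Per-transcript view: one total per transcript (one output row each).
per_transcript = {tx: total_coverage(iv, depth) for tx, iv in transcripts.items()}

# Aggregate view: union of all transcripts' bases, counted once.
merged = total_coverage([iv for ivs in transcripts.values() for iv in ivs], depth)

print(per_transcript, merged)  # {'txA': 56, 'txB': 56, 'txC': 56} 56
```

Because all three hypothetical transcripts contain the same covered bases, they get identical totals, which is one way rows like the repeated SAMD11 lines can arise. It also shows why summing per-transcript rows (168 here) would overcount relative to the merged total (56).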
Best,
Genevieve
-
Hi Genevieve,
I compared the gene summary records for genes with multiple transcripts against IGV, and the numbers make sense. I will go ahead and modify the RefSeq gene list so that each gene has a single representative transcript, which works around the issue above. However, I noticed another behavior that wasn't a problem in GATK 3.7. A few genes have this odd symbol, �, in their "average_coverage", "sampleid_mean_cvg", and "sampleid_%_above_15" columns, as shown in the attached screenshot. It seems to happen only for genes with no coverage, yet most other genes with no coverage show 0 instead of � in the same columns. Is this a known bug?
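For the plan in this post of keeping one representative transcript per gene, here is a minimal Python sketch. It assumes the UCSC refGene-style layout (txStart/txEnd in the 5th/6th columns, gene symbol in the 13th), and "keep the longest transcript" is only an illustrative, hypothetical rule; substitute whatever representative you trust. Records are written back in their original order so any existing coordinate sort is preserved.

```python
# ASSUMPTION: UCSC refGene-style columns (0-based): 4 = txStart,
# 5 = txEnd, 12 = gene symbol (name2).
TX_START, TX_END, GENE = 4, 5, 12


def one_transcript_per_gene(lines):
    """Keep header lines plus the longest transcript record for each gene."""
    best = {}  # gene -> (length, index of chosen line); first wins on ties
    for i, line in enumerate(lines):
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        length = int(f[TX_END]) - int(f[TX_START])
        if f[GENE] not in best or length > best[f[GENE]][0]:
            best[f[GENE]] = (length, i)
    keep = {i for _, i in best.values()}
    # Emit in the original order so the file stays sorted as before.
    return [line for i, line in enumerate(lines)
            if line.startswith("#") or i in keep]


def row(tx, gene, start, end):
    """Build a hypothetical 16-column refGene-style record."""
    return "\t".join(["0", tx, "chr1", "+", str(start), str(end), str(start), str(end),
                      "1", f"{start},", f"{end},", "0", gene, "cmpl", "cmpl", "0,"])


lines = ["# hypothetical header line",
         row("txA", "SAMD11", 100, 900),
         row("txB", "SAMD11", 100, 1200),
         row("txC", "OR4F5", 2000, 2900)]

for kept in one_transcript_per_gene(lines)[1:]:
    print(kept.split("\t")[1])  # txB, then txC
```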
Best,
Seunghun
-
Which file is this appearing in?
-
Both the interval-level and gene-level summary outputs have this. The screenshot above was from the gene-level summary output.
-
Seunghun Han, it's not clear to me whether this is an issue with the GATK output or with the method you are using to view the file. Could you share a screenshot of what this output looks like on the command line with the head command?
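One way to separate a file problem from a viewer problem is to inspect the raw bytes. The minimal Python sketch below distinguishes two cases: the file literally contains the Unicode replacement character U+FFFD (UTF-8 bytes EF BF BD), or the file contains bytes that are not valid UTF-8, which many viewers display as �. The sample bytes and gene names are hypothetical.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, encoded in UTF-8 as EF BF BD


def lines_with_replacement_char(data: bytes):
    """1-based line numbers whose raw bytes contain a literal UTF-8 U+FFFD."""
    marker = REPLACEMENT_CHAR.encode("utf-8")
    return [n for n, line in enumerate(data.splitlines(), start=1) if marker in line]


def lines_not_valid_utf8(data: bytes):
    """1-based line numbers that fail to decode as UTF-8 (a viewer may show these as �)."""
    bad = []
    for n, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(n)
    return bad


# Hypothetical file contents: line 3 has a literal U+FFFD, line 4 an invalid byte.
SAMPLE = b"Gene,average_coverage\nGENE1,0.00\nGENE2,\xef\xbf\xbd\nGENE3,\xff\n"
print(lines_with_replacement_char(SAMPLE))  # [3]
print(lines_not_valid_utf8(SAMPLE))         # [4]
```

For a real summary file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to both functions. Hits from the first function mean the symbol is in the file itself.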
I would also like to see how it differs from the GATK3 output.
Thank you!
-
I don't think it has anything to do with the way I'm viewing the file.
This is how it looks when I open one of the gene-level summaries in Vim.
I don't have outputs from GATK3 and GATK4 DoC runs on identical files, so I can't make a head-to-head comparison here, but I checked several outputs from GATK3 DoC runs and didn't find � in any of them.
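If a downstream tool needs these columns to be numeric, one stopgap until the cause is known is to rewrite cells that consist only of � before loading the file. The minimal Python sketch below assumes the summaries are comma-separated like the gene_summary excerpt at the top of the thread, and that the downstream tool accepts the string `NaN`; whether `0` or `NaN` is the better placeholder depends on what the value should mean, which this thread does not establish. The sample rows are hypothetical.

```python
import csv
import io

REPLACEMENT_CHAR = "\ufffd"  # the "�" seen in the summary columns


def sanitize_summary(text, placeholder="NaN"):
    """Replace cells consisting only of U+FFFD with a numeric-friendly placeholder."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([placeholder if cell.strip() == REPLACEMENT_CHAR else cell
                         for cell in row])
    return out.getvalue()


SAMPLE_CSV = "Gene,total_coverage,average_coverage\nGENE1,0,0.00\nGENE2,0,\ufffd\n"
print(sanitize_summary(SAMPLE_CSV), end="")
```

Apply it to both the interval-level and gene-level summaries (read with `encoding="utf-8"`), and keep the original files so the raw output can still be shared in a bug report.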
Also, a downstream tool I'm using takes these gene-level and interval-level summary outputs from DoC as inputs. The tool worked fine with GATK3 outputs, but the � symbols in the GATK4 outputs change the data type of some columns that are supposed to contain only numbers, and the tool now fails.
-
Okay, I see. Thanks for sharing these updates! What version of GATK4 are you running?
-
I'm using the broadinstitute/gatk:latest Docker image to run DoC.
-
Hi Seunghun Han,
Could you upload your files that contain this issue in a zipped folder to our bug report FTP? There are instructions for how to do that here: https://gatk.broadinstitute.org/hc/en-us/articles/360035889671
Best,
Genevieve