Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

DepthOfCoverage GATK 4.1.7.0 -gene-list

Answered

15 comments

  • marta r

    Geraldine Van der Auwera, how can I ask the team for help?

  • Tiffany Miller

    Hi marta r, you are doing the right thing by posting here. We ask for your patience and understanding, as this is a free support resource staffed by one person right now. We will get to your question in the order it was received. I hope you understand!

  • Tiffany Miller

    Hi marta r  - I took a quick look. I am no expert on this tool but the "no suitable codecs" message makes me believe something is wrong with your refseq file. Here is some documentation on this: https://gatk.broadinstitute.org/hc/en-us/articles/360035532032-RefSeq-gene-list-format

    I will try to take a closer look tomorrow. 

  • marta r

    Tiffany Miller, thank you very much for the help. I don't think the problem is the RefSeq format: I followed the guideline you mentioned, and I also tried different RefSeq files (not only from UCSC) with several features. This is why I suspected a bug in the new version (the same RefSeq files work well with GATK 3).

  • Tiffany Miller

    Hi marta r, I am not able to replicate your error using GATK 4.1.7.0.

    Here are the commands I used and the way I set up the RefSeq file. We need to update the RefSeq doc because it isn't clear which track or table to use, so I will put that on our to-do list. I think what is causing your issue is using a .txt extension instead of .refseq: when I renamed my RefSeq file to .txt, I got an error similar to yours.

    gatk \
        DepthOfCoverage \
        -R ref/ref.fasta \
        -O test_DOC -I bams/mother.bam \
        -gene-list intervals/test_chr20_v2.refseq \
        -L intervals/motherHighconf.bed
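
The workaround described in this comment comes down to giving the gene list a .refseq extension, since the tool appears to fail with "no suitable codecs" when the same file ends in .txt. A minimal shell sketch of that step follows. The function name ensure_refseq_ext and the example paths are hypothetical, and the sketch only copies the file under a new name; it does not check that the contents follow the RefSeq gene list format.

```shell
# Hypothetical helper: print a path to the gene list that ends in .refseq.
# A file that already ends in .refseq is printed unchanged. Any other file is
# copied next to the original with its extension replaced by .refseq, and the
# path of the copy is printed.
ensure_refseq_ext() {
    local src="$1"
    local dir base dst
    case "$src" in
        *.refseq)
            printf '%s\n' "$src"
            ;;
        *)
            dir=$(dirname -- "$src")
            base=$(basename -- "$src")
            dst="$dir/${base%.*}.refseq"
            cp -- "$src" "$dst" && printf '%s\n' "$dst"
            ;;
    esac
}

# Example use (paths are placeholders):
#   gene_list=$(ensure_refseq_ext intervals/genes.txt)
#   gatk DepthOfCoverage ... -gene-list "$gene_list"
```

Copying rather than renaming keeps the original file in place, which may matter if other tools still expect the old name.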

  • marta r

    Tiffany Miller, thank you very much for the help. Now it works, but I still have another question about the output.

    This was my command:

    gatk DepthOfCoverage -R /path/ucsc.hg19.fasta -O coverage_name -I /input1.bam -I /input2.bam -I /input3.bam -gene-list /path/genome.refseq --summary-coverage-threshold 5 --summary-coverage-threshold 15 --summary-coverage-threshold 30 -L /path/mybed.bed

    and these are the outputs:

    coverage_name
    coverage_name.sample_cumulative_coverage_counts
    coverage_name.sample_cumulative_coverage_proportions
    coverage_name.sample_gene_summary
    coverage_name.sample_interval_statistics
    coverage_name.sample_interval_summary
    coverage_name.sample_statistics
    coverage_name.sample_summary

    As you can see, I didn't get the "_gene_statistics" output. Do you know why? Is there an option I need to add?

    Thank you

  • marta r

    One more point: the sample_interval_summary file was empty (only the header was present).
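
Both symptoms reported here, a missing output file and a file that holds only its header, can be checked after a run with a short script. The sketch below is a rough illustration, not part of GATK. The function name check_doc_outputs is hypothetical, and the suffix list is an assumption built from the file names posted in this thread plus the expected sample_gene_statistics file.

```shell
# Hypothetical helper: given the -O prefix of a DepthOfCoverage run, report
# per-sample output files that are missing or that contain at most one line
# (a header and nothing else). The suffix list is an assumption based on the
# file names in this thread.
check_doc_outputs() {
    local prefix="$1"
    local suffix file lines
    for suffix in \
        sample_cumulative_coverage_counts \
        sample_cumulative_coverage_proportions \
        sample_gene_summary \
        sample_gene_statistics \
        sample_interval_statistics \
        sample_interval_summary \
        sample_statistics \
        sample_summary
    do
        file="$prefix.$suffix"
        if [ ! -e "$file" ]; then
            printf 'missing\t%s\n' "$file"
            continue
        fi
        # Arithmetic expansion strips any padding that wc adds on some systems.
        lines=$(($(wc -l < "$file")))
        if [ "$lines" -le 1 ]; then
            printf 'header-only\t%s\n' "$file"
        fi
    done
}

# Example use, with the prefix from the command above:
#   check_doc_outputs coverage_name
```

The function prints one tab-separated line per problem file and nothing at all when every expected file exists and has at least one data row.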

  • Genevieve Brandt (she/her)

    Hi marta r, it looks like you are using GATK4.1.7.0, and we are actually on 4.1.8.0. The problem with the interval summary file was patched in that update. Would you be able to update your GATK and re-run this tool to see if there is still a problem?

  • Genevieve Brandt (she/her)

    Hi marta r, we are looking into the issue with the _gene_statistics output. Could you please try running the command with only one bam file and see if the output is there? The most common usage is with one input file so that may be where the issue is coming from.

  • marta r

    Hi, we tried with only one BAM file, but the output files did not change. I will try the new version, GATK 4.1.8.0.

    Thank you

  • Genevieve Brandt (she/her)

    Hi marta r, thank you for testing this. We have created a GitHub ticket to fix this, and you can follow the issue here: https://github.com/broadinstitute/gatk/issues/6714

  • Genevieve Brandt (she/her)

    Hi marta r, did you get the correct output with the _gene_statistics file when updating to 4.1.8.0? 

  • marta r

    Sorry for the long delay! Yes, it works fine, thank you!

  • Sinem Selvi

    Hello,
    Is it possible to get transcript numbers of the genes in the sample_gene_summary file?

    Thanks

  • Genevieve Brandt (she/her)

    Hi Sinem Selvi

    The feature you are describing is currently possible, but it requires a workaround. Instead of the normal gene list, you could pass a file containing a transcript list to the -gene-list argument. If you can generate that input file in the correct RefSeq format, then the tool should work for you.

    If you want to submit an official feature request to make this a larger part of the tool, please make a new post in the "General Discussion" topic. If many other users also want this feature and interact with the post then we can prioritize adding it.

    Best,

    Genevieve
