Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CollectRnaSeqMetrics GL000220 test not counting ribosomal reads

Answered

5 comments

  • Genevieve Brandt (she/her)

    Hi rcorbett,

    Could you clarify whether you think these results from CollectRnaSeqMetrics are incorrect for the input bams, or whether you are asking why your data is aligning to UTRs instead of ribosomal intervals?

    Thank you,

    Genevieve

  • rcorbett

    Many thanks for your help, Genevieve.

    It is more that I am confused by the output. I am not expecting a high UTR amount, as my data are almost exclusively ribosomal, and 45S, to my knowledge, contains only an ETS, not a UTR.

    Also, my flatfile doesn't include a UTR annotation (unless I am mistaken).

    So I guess I might be suggesting the results are incorrect, but I'm equally open to the issue being user error :)

  • Genevieve Brandt (she/her)

    One of the issues may be with your ref_flat file. The coding region start and end have the same value, which is also the end of the transcript. Here is a link to more information about the ref_flat file format; the coding region start and end are the 3rd and 4th numbers in each annotation (the cdsStart and cdsEnd fields, columns 7 and 8).

    As written, your ref_flat file says that the coding regions are 0 bases long. Since UTR regions are everything outside of the coding regions, this is probably why your UTR counts are so high.
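
The point about a zero-length coding region can be checked directly from the flat file. The following is a minimal sketch, not part of GATK or Picard; the names `RefFlatRecord`, `parse_refflat_line` and `cds_length` are made up for illustration, and it assumes whitespace-separated fields in the refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, ...).

```python
from typing import NamedTuple


class RefFlatRecord(NamedTuple):
    """First nine refFlat columns (hypothetical helper type, for illustration)."""
    gene_name: str
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int


def parse_refflat_line(line: str) -> RefFlatRecord:
    """Parse one refFlat line; real files are tab-separated, split() handles both."""
    f = line.split()
    return RefFlatRecord(f[0], f[1], f[2], f[3],
                         int(f[4]), int(f[5]), int(f[6]), int(f[7]), int(f[8]))


def cds_length(rec: RefFlatRecord) -> int:
    """Length of the annotated coding region in bases (0 means no coding region)."""
    return rec.cds_end - rec.cds_start


# The entry quoted in this thread.
line = ("NR_046235.3 NR_046235.3 chrUn_GL000220v1 + "
        "105423 118780 118780 118780 1 105423, 118780,")
rec = parse_refflat_line(line)

print(cds_length(rec))             # 0
print(rec.tx_end - rec.tx_start)   # 13357
```

A coding length of 0 alongside a transcript length of 13357 means the whole transcript lies outside the annotated coding region, which is consistent with the high UTR counts.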

    The ribosomal count being 0 is more puzzling. Could you confirm that there are reads that overlap with the ribosomal regions? The reads would only be counted if their mates are not unmapped and their chromosome is the same as their mate chromosome. Double check for those possible issues and let me know what you find!
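
One way to check the two mate conditions is to look at the FLAG and RNEXT fields of the SAM records (for example, the text output of `samtools view` for the region). This is a rough sketch for paired-end data only. It does not reproduce the Picard implementation; the function name and the three example records are hypothetical.

```python
MATE_UNMAPPED = 0x8  # SAM FLAG bit: the mate (next segment) is unmapped


def mate_mapped_on_same_reference(sam_line: str) -> bool:
    """Return True if the mate is mapped and lies on the same reference sequence.

    Expects one tab-separated SAM alignment line from paired-end data.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # FLAG
    rname = fields[2]       # reference name of this read
    rnext = fields[6]       # reference name of the mate ("=" means same as RNAME)
    mate_mapped = not (flag & MATE_UNMAPPED)
    same_reference = rnext == "=" or rnext == rname
    return mate_mapped and same_reference


# Hypothetical records, made up for illustration (SEQ and QUAL left as "*").
records = [
    "r1\t99\tchrUn_GL000220v1\t118000\t60\t50M\t=\t118200\t250\t*\t*",  # mate mapped, same contig
    "r2\t73\tchrUn_GL000220v1\t118000\t60\t50M\t=\t118000\t0\t*\t*",    # mate unmapped (0x8 set)
    "r3\t65\tchrUn_GL000220v1\t118000\t60\t50M\tchr1\t5000\t0\t*\t*",   # mate on another contig
]

for rec_line in records:
    print(rec_line.split("\t")[0], mate_mapped_on_same_reference(rec_line))
# r1 True
# r2 False
# r3 False
```

If most reads over the 45S region look like `r2` or `r3`, that would fit the explanation above for a ribosomal count of 0.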

  • rcorbett

    OK. I think we're making some progress here. I am starting with a RefSeq .gtf file where the annotation for my rRNA looks like this:

    chrUn_GL000220v1 BestRefSeq transcript 105424 118780 . + . gene_id "RNA45SN5"; transcript_id "NR_046235.3"; db_xref "GeneID:100861532"; gbkey "rRNA"; gene "RNA45SN5"; product "RNA, 45S pre-ribosomal N5"; transcript_biotype "rRNA"; 
    chrUn_GL000220v1 BestRefSeq exon 105424 118780 . + . gene_id "RNA45SN5"; transcript_id "NR_046235.3"; db_xref "GeneID:100861532"; gene "RNA45SN5"; product "RNA, 45S pre-ribosomal N5"; transcript_biotype "rRNA"; exon_number "1";

    Then I use gtfToGenePred to convert to a flat file with this command:

    ./gtfToGenePred -ignoreGroupsWithoutExons all.gtf all.flat

    and then my flat file contains

    NR_046235.3 NR_046235.3 chrUn_GL000220v1 + 105423 118780 118780 118780 1 105423, 118780,

    As you pointed out above, this entry contains no coding region.  Maybe there is a better way to go from refseq gtf to refflat?
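
A commonly used route from GTF to refFlat is to run `gtfToGenePred` with its `-genePredExt` option and then move the gene name (column 12, `name2`) to the front. The sketch below shows only that reordering step. The function name and the example input line are hypothetical, and the step does not change cdsStart or cdsEnd, so a non-coding transcript such as this rRNA still ends up with a zero-length coding region.

```python
def genepredext_to_refflat(line: str) -> str:
    """Convert one tab-separated genePredExt line to a refFlat line.

    genePredExt columns: name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
    exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat,
    exonFrames. refFlat is name2 followed by the first ten of those columns.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError("expected at least 12 tab-separated genePredExt columns")
    return "\t".join([cols[11]] + cols[:10])


# Hypothetical genePredExt line for the transcript discussed in this thread.
example = "\t".join([
    "NR_046235.3", "chrUn_GL000220v1", "+", "105423", "118780",
    "118780", "118780", "1", "105423,", "118780,",
    "0", "RNA45SN5", "none", "none", "-1,",
])

print(genepredext_to_refflat(example))
```

The output keeps the same coordinates and places the gene name `RNA45SN5` in the first column.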

    My sample contains reads almost exclusively at the terminal end of the 45S annotation. I am attaching an IGV screenshot of the region.

  • Genevieve Brandt (she/her)

    rcorbett, we don't have any specific recommendations for how to create the ref_flat file. Since we are not the creators of gtfToGenePred, we are not able to look more closely at what might be going wrong. I would recommend reaching out to the developers of that tool for more information.

    I did find a potential alternate tool but I have not tested it myself: https://biopet.github.io/gtftorefflat/develop/

    I'm sorry we don't have more information. Please let me know if you have further questions.
