Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Perform variant calling on 50 bp SE RNA-seq data

4 comments

  • Pamela Bretscher

    Hi Alon Ziv,

    Yes, variant calling on single-end RNA-seq data is possible with GATK. There is currently a Best Practices workflow and pipeline for calling germline variants in RNA-seq data, which you may find helpful. A few things to consider when calling variants from RNA-seq are the depth of coverage, the difficulty of detecting heterozygous variants, and the fact that RNA-seq data is usually more prone to false-positive calls. However, it can certainly be done. Here is a link to a discussion post about calling somatic variants from RNA-seq, with some additional considerations and advice from one of the GATK developers: https://gatk.broadinstitute.org/hc/en-us/community/posts/360056152451-Mutect2-for-RNA-seq-

    I hope you find this information helpful. Please let me know if you have additional questions or concerns.

    Kind regards,

    Pamela

  • Alon Ziv

    Hi Pamela,

    Thanks for the quick answer.

    I have done variant calling before with different data, so I'm familiar with the pipeline.

    In this case, because the data is 50 bp single-end, I just want to make sure which parameters I should take into consideration when filtering: obviously QualByDepth, and also RMSMappingQuality (I think). But what about the other parameters that take heterozygosity or alternate alleles into consideration, such as ReadPosRankSumTest, MappingQualityRankSumTest, StrandOddsRatio, and FisherStrand?

    Thanks again,

    Alon

  • Pamela Bretscher

    Hi Alon Ziv,

    I'm not very familiar with the ideal parameters for RNA-seq variant discovery. However, the parameters you choose will depend on the depth and coverage of your sequencing and on how many samples you have. For high-coverage data, StrandOddsRatio is recommended, whereas FisherStrand is better suited to low coverage or a small number of samples. Using one of these is a good idea if you are dealing with stranded data. It would probably be useful to run MappingQualityRankSumTest and ReadPosRankSumTest as well, but be aware that they may give unexpected or unreliable results because of the higher error rates in RNA-seq data. In general, I would recommend erring on the side of filtering too little rather than too much when it comes to RNA-seq variants. I hope this provides some useful information.

    Kind regards,

    Pamela

  • Alon Ziv

    Hi Pamela,

    Thanks again.

    This conversation and the information were very helpful.

    Alon
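
The filtering advice in this thread can be illustrated with a short example. What follows is a minimal Python sketch, not part of GATK and not a GATK API: it parses a VCF INFO string and reports which hard-filter thresholds a site fails. The INFO keys (QD, FS, SOR, MQ, MQRankSum, ReadPosRankSum) are the keys GATK writes for the annotations named above. The cutoffs are assumptions borrowed from GATK's generic germline hard-filtering guidance, with FS > 30.0 as used in the RNA-seq germline workflow; they are starting points, not values tuned for 50 bp single-end reads. The names `HARD_FILTERS`, `parse_info`, and `failed_filters` are hypothetical. Because the rank-sum annotations are only computed at sites that have both reference and alternate reads, a missing annotation is treated as "not failed" rather than as a failure.

```python
from typing import Dict, List, Tuple

# Hypothetical starting thresholds: INFO key -> (failing comparison, cutoff).
# These are illustrative assumptions, not tuned for 50 bp single-end RNA-seq.
HARD_FILTERS: Dict[str, Tuple[str, float]] = {
    "QD": ("<", 2.0),               # QualByDepth
    "FS": (">", 30.0),              # FisherStrand
    "SOR": (">", 3.0),              # StrandOddsRatio
    "MQ": ("<", 40.0),              # RMSMappingQuality
    "MQRankSum": ("<", -12.5),      # MappingQualityRankSumTest
    "ReadPosRankSum": ("<", -8.0),  # ReadPosRankSumTest
}


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column such as "QD=1.5;FS=2.0;DB" into a dict.

    Flag entries without a value (for example "DB") map to an empty string.
    """
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def failed_filters(
    info: str, filters: Dict[str, Tuple[str, float]] = HARD_FILTERS
) -> List[str]:
    """Return the annotation keys whose values cross their cutoff."""
    values = parse_info(info)
    failed: List[str] = []
    for key, (op, cutoff) in filters.items():
        raw = values.get(key)
        if raw is None:
            continue  # annotation absent (e.g. rank-sum tests): do not penalise
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric value: skip rather than guess
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(key)
    return failed


# Lenient usage: filter on QD plus a single strand-bias annotation only.
lenient = {key: HARD_FILTERS[key] for key in ("QD", "FS")}
print(failed_filters("DP=12;QD=1.4;FS=3.2;MQ=60.0", lenient))  # ['QD']
```

Passing a subset of the thresholds, as in the usage line, follows the suggestion above to pick one strand-bias annotation and to err on the side of filtering too little.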
