SNPs and INDELs from RNA-seq mapping to introns
GATK Version:
I am following the GATK best practices pipeline for RNA-seq data, starting from raw reads from mouse brain samples. After variant calling, filtration (DP > 10 and QD > 2), and annotation, some variants map to intronic regions, regardless of which isoform is used for annotation. Does this indicate errors or poor quality in the pre-processing steps? Will it affect downstream analysis of the variants? If so, how should I deal with them?
Note: none of the tools in the pipeline reported any error messages.
-
Variant callers do not restrict the positions at which they call variants unless you supply a BED file or interval file. Without one, HaplotypeCaller will call variants anywhere on the reference sequence. If you want to limit your variants to coding and non-coding exons only, you can either run SelectVariants with an interval file on the finished calls, or pass the same BED file directly to HaplotypeCaller at the calling stage.
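To make the interval restriction concrete, here is a minimal Python sketch of what filtering calls against a BED file amounts to. In GATK itself you would normally do this with the -L (--intervals) argument rather than custom code; the function names below (load_bed, in_intervals, filter_vcf) are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF and tests only the POS column, so a deletion that starts just outside an exon but extends into it would be dropped.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: sorted, merged list of (start, end)}.

    BED coordinates are 0-based and half-open.
    """
    raw = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))

    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [ivs[0]]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_intervals(intervals, chrom, pos):
    """Return True if a 1-based VCF position lies inside any interval."""
    ivs = intervals.get(chrom)
    if not ivs:
        return False
    pos0 = pos - 1  # convert VCF 1-based position to BED 0-based
    # Index of the last interval whose start is <= pos0.
    i = bisect.bisect_right(ivs, (pos0, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos0 < ivs[i][1]


def filter_vcf(vcf_lines, intervals):
    """Yield header lines unchanged and only records whose POS is in an interval."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if in_intervals(intervals, fields[0], int(fields[1])):
            yield line
```

For example, list(filter_vcf(open("calls.vcf"), load_bed(open("exons.bed")))) would keep only the records inside the exon intervals; both file names are placeholders.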
Depending on the library preparation, RNA sequencing experiments may capture unspliced or misspliced transcripts that still contain intronic sequence, and there is no way to remove those reads once they are in your sample.
In short, what you are seeing is expected.
I hope this helps.
Regards.