Difference in results between salmon and ASEReadCounter
I ran ASEReadCounter with a VCF file on both a normal and a tumor sample. Separately, I had run STAR/Salmon as part of the nf-core RNA-seq pipeline. For example, for one specific gene, I obtained the following uncorrected counts from STAR/Salmon:
gene_id gene_name REF ALT
ENSG00000213145 CRIP1 129 2564.255
Separately, I have a VCF file of variants, and one of them falls in that gene:
chr14 105488181 . C G
But when I run ASEReadCounter, I get the following for that site:
NORMAL:
contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
chr14 105488181 . C G 28 0 28 0 0 28 0 0
TUMOR:
contig position variantID refAllele altAllele refCount altCount totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
chr14 105488181 . C G 210 164 374 0 0 376 2 0
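To make the per-site numbers concrete, here is a minimal Python sketch (not part of GATK) that reads the two ASEReadCounter rows posted above from an inline tab-separated copy and computes the fraction of REF/ALT reads that carry the ALT allele. The inline strings stand in for the real output files, and alt_fraction is a hypothetical helper; the column names are the ones shown in the post.

```python
import csv
import io

# Column header as printed in the ASEReadCounter output above.
HEADER = ("contig\tposition\tvariantID\trefAllele\taltAllele\trefCount\taltCount\t"
          "totalCount\tlowMAPQDepth\tlowBaseQDepth\trawDepth\totherBases\timproperPairs")

# The two rows from the post, copied inline so the sketch runs on its own.
# With real data you would open the ASEReadCounter output file instead.
TABLES = {
    "NORMAL": HEADER + "\nchr14\t105488181\t.\tC\tG\t28\t0\t28\t0\t0\t28\t0\t0\n",
    "TUMOR": HEADER + "\nchr14\t105488181\t.\tC\tG\t210\t164\t374\t0\t0\t376\t2\t0\n",
}


def alt_fraction(row: dict) -> float:
    """Hypothetical helper: ALT reads / (REF reads + ALT reads) at one site."""
    ref, alt = int(row["refCount"]), int(row["altCount"])
    return alt / (ref + alt) if ref + alt else float("nan")


fractions = {}
for sample, text in TABLES.items():
    # Each table has one data row per variant site; here there is only one.
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        fractions[sample] = alt_fraction(row)
        print(f"{sample} {row['contig']}:{row['position']} "
              f"ref={row['refCount']} alt={row['altCount']} "
              f"alt_fraction={fractions[sample]:.3f}")
```

For these rows it reports an ALT fraction of 0 in the normal (0 of 28 reads) and about 0.44 in the tumor (164 of 374 reads). These are counts of reads overlapping a single position, not a gene-level total like the Salmon numbers, so the two are not expected to match.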
Why is there such a big difference between the STAR/Salmon output and this? What are they capturing differently?
-
I am not sure exactly how Salmon and STAR work, but do they have a function to collect read counts for different alleles? ASEReadCounter is unique in that sense: it collects the counts of bases on valid fragments covering each variant site in order to determine allele-specific expression differences. So, in theory and in practice, these are different tools collecting different information from the reads, and they may apply different filters and rules for which reads aligning to transcripts are counted. In particular, Salmon counts every fragment assigned to a gene's transcripts, whereas ASEReadCounter counts only the reads that overlap the variant position. Salmon also works at the transcript level and apportions reads that are compatible with several transcripts among them, which is why its counts can be fractional. For more information on ASEReadCounter, we would strongly recommend the following paper from Castel et al.:
https://www.biorxiv.org/content/biorxiv/early/2015/03/05/016097.full.pdf
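A common first check for allelic imbalance at a heterozygous site is an exact binomial test of the ALT count against an expected fraction of 0.5. The sketch below implements that test with the Python standard library; the function name binom_two_sided_p is hypothetical, and since the test only makes sense where the site is heterozygous, it is applied to the tumor counts only (the normal shows no ALT reads at this position).

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Hypothetical helper: exact two-sided binomial test of k ALT reads
    out of n REF+ALT reads against an expected ALT fraction of 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the smaller tail, capped at 1.
    """
    tail = min(k, n - k)
    # P(X <= tail) for X ~ Binomial(n, 0.5), computed exactly with integers.
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * lower)


# Tumor counts from the ASEReadCounter output in the question.
ref_count, alt_count = 210, 164
p_tumor = binom_two_sided_p(alt_count, ref_count + alt_count)
print(f"TUMOR ref={ref_count} alt={alt_count} two-sided p={p_tumor:.3g}")
```

For the tumor counts the p-value falls below 0.05, but in a tumor a departure from 0.5 can also reflect copy-number changes or tumor purity, so on its own this is not evidence of allele-specific expression.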
I hope this helps.
-
Hi Gokalp, this is actually very helpful. Things did not become clear until I looked closely using IGV and read the paper. The results above do make sense: at the gene level there is a lot more expression in the tumor, which can be due to a number of reasons. But at the level of the specific variant of interest, IGV shows exactly what is reported above, 210 REF and 164 ALT reads, which means that the tumor contains both alleles.