Getting the variant counts
I did differential expression analysis and obtained something like this:
Chromosome Start_Position REF ALT NormalCounts TumorCounts
chr17 69043703 G T 64.76958 169.4108
(the last two are the scaled counts from salmon)
I also wanted the allele-specific read counts so used the ASEReadCounter tool but, for this variant, I obtained something like this:
contig position variantID refAllele altAllele refCount altCount
chr17 69043703 . G T 2 0
totalCount lowMAPQDepth lowBaseQDepth rawDepth otherBases improperPairs
2 0 0 2 0 0
The actual annotated variant in the VCF is this:
T|missense_variant|MODERATE|ABCA9|ENSG00000154258|Transcript|ENST00000340001|protein_coding|6/39||||661/6377|586/4875|196/1624|H/N|Cat/Aat|||-1||SNV|HGNC|HGNC:39|YES|NM_080283.4||1|P1|CCDS11681.1|ENSP00000342216|Q8IUA7.148||UPI00000747B1|Q8IUA7-1||deleterious(0.01)|probably_damaging(0.91)||||||||||||||||||||||||||||||||||||||||
I am interested in the allele-specific counts, but the ASEReadCounter tools seems to give me something that contains very low numbers. Why is this?
-
What is the source of bam files you use with ASEReadCounter?
-
Thank you Gökalp Çelik!
I am using a bam that came as a result of the nf-core RNASeq pipeline, I started with fastq files. RNASeq uses salmon and I end up with a bam file. I also have a vcf with Mutect2 and Strelka2 variants that I ran separately on a corresponding DNA sample. I then run:
gatk --java-options "-Xmx4915M -XX:-UsePerfData" \
ASEReadCounter \
--output MS002_Tumor_ase.csv
--input MS002_Tumor.markdup.sorted.bam
--variant MS002_TUMOR_vs_NORMAL.mutect2.filtered_VEP.ann.missenseOnly.vcf \
--tmp-dir . \
-R ./genomes/GATK_GRCh38/Sequence/WholeGenomeFasta/Homo_sapiens_assembly38_renamedHLAContigs.fastaVersions:
The Genome Analysis Toolkit (GATK) v4.6.1.0
HTSJDK Version: 4.1.3
Picard Version: 3.3.0 -
The only source bam file you can get from that workflow is through STAR aligner which scores unique mapping reads with score 255 which is totally incompatible with Mutect2 and rest of our GATK tools. Your only variants might have been from strelka which can take MAPQ 0 reads into consideration during somatic calls. I would suggest you to apply SplitNCigarReads tool to your bam file and call variants and run ASEReadCounter on that processed bam file.
Regards.
-
Thank you Gökalp Çelik ! Very helpful. Will do as you suggest.
Please sign in to leave a comment.
4 comments