Using GATK ASE read counter with strelka
I am using GATK ASEReadcounter on some RNASeq data. Separately I have vcf files that I generated using mutect and strelka on DNA data. ASEReadCounter works fine with the mutect vcf, but for the strelka vcf I get a lot of things such as:
15:19:25.617 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
15:19:25.750 WARN ASEReadCounter - Ignoring site: variant is not het at postion: chr1:17365
15:19:25.750 WARN ASEReadCounter - Ignoring site: variant is not het at postion: chr1:17375
15:19:25.750 WARN ASEReadCounter - Ignoring site: variant is not het at postion: chr1:17406
15:19:25.763 WARN ASEReadCounter - Ignoring site: variant is not het at postion: chr1:19322
And I get an empty result. The vcf contains some variants in common, for example
Mutect:
chr1 39302949 . G A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=20,33|12,18;CSQ=A|missense_variant|MODERATE|MACF1|ENSG00000127603|Transcript|ENST00000564288|protein_coding|23/101||||2988/24340|2660/22668|887/7555|C/Y|tGt/tAt|rs1379344348||1||SNV|HGNC|HGNC:13664|YES|MANE_Select|NM_001394062.1||5||CCDS90919.1|ENSP00000455274||H3BPE1.88|UPI0002467516||1||benign(0.022)|Pfam:PF17902&PROSITE_profiles:PS50002&Gene3D:2.30.30.40&Gene3D:1.20.58.1060||||||||6.842e-06|2.987e-05|0|0|2.52e-05|0|0|7.196e-06|0|0|6.569e-06|0|0|0|0|0|0|0|1.47e-05|0|0|2.987e-05|gnomADe_AFR||||||||||,A|downstream_gene_variant|MODIFIER|HSPE1P8|ENSG00000217897|Transcript|ENST00000406997|processed_pseudogene||||||||||rs1379344348|1345|-1||SNV|HGNC|HGNC:49326|YES|||||||||||||||||||||||6.842e-06|2.987e-05|0|0|2.52e-05|0|0|7.196e-06|0|0|6.569e-06|0|0|0|0|0|0|0|1.47e-05|0|0|2.987e-05|gnomADe_AFR||||||||||;DP=83;ECNT=1;GERMQ=82;MBQ=20,20;MFRL=228,208;MMQ=60,60;MPOS=37;NALOD=1.4\
6;NLOD=8.43;POPAF=6.00;ROQ=93;TLOD=75.45 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:39,0:0.033:39:16,0:11,0:28,0:15,24,0,0 0/1:14,30:0.633:44:7,10:3,11:12,21:5,9,12,18
Strelka:
chr1 39302949 . G A . PASS CSQ=A|downstream_gene_variant|MODIFIER|HSPE1P8|ENSG00000217897|Transcript|ENST00000406997|processed_pseudogene||||||||||rs1379344348|1345|-1||SNV|HGNC|HGNC:49326|YES|||||||||||||||||||||||6.842e-06|2.987e-05|0|0|2.52e-05|0|0|7.196e-06|0|0|6.569e-06|0|0|0|0|0|0|0|1.47e-05|0|0|2.987e-05|gnomADe_AFR||||||||||,A|missense_variant|MODERATE|MACF1|ENSG00000127603|Transcript|ENST00000564288|protein_coding|23/101||||2988/24340|2660/22668|887/7555|C/Y|tGt/tAt|rs1379344348||1||SNV|HGNC|HGNC:13664|YES|MANE_Select|NM_001394062.1||5||CCDS90919.1|ENSP00000455274||H3BPE1.88|UPI0002467516||1||benign(0.022)|Pfam:PF17902&PROSITE_profiles:PS50002&Gene3D:2.30.30.40&Gene3D:1.20.58.1060||||||||6.842e-06|2.987e-05|0|0|2.52e-05|0|0|7.196e-06|0|0|6.569e-06|0|0|0|0|0|0|0|1.47e-05|0|0|2.987e-05|gnomADe_AFR||||||||||;DP=87;MQ=60.00;MQ0=0;NT=ref;QSS=106;QSS_NT=106;ReadPosRankSum=-1.34;SGT=GG->AG;SNVSB=0.00;SOMATIC;SomaticEVS=19.73;TQSS=1;TQSS_NT=1 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 42:0:0:0:0,0:1,1:41,41:0,0 45:0:0:0:31,31:0,0:14,14:0,0
The other inputs are the same thing i.e. same alignment and reference genome.
Is there anything that I need to do to the strelka vcf file so that GATK ASE read counter can work?
-
Strelka VCF does not contain a GT field therefore each variant site is ignored. ASEReadCounter tool requires heterozygous marked sites for proper function. Mutect2 adds GT field as a compatibility measure and does not warrant any proper genotype information to be present at FORMAT/GT field. You may want to modify strelka vcf by adding FORMAT/GT field to each variant and add 0/1 to each biallelic site.
Regards.
-
Thank you, this is exactly what I needed!!
Please sign in to leave a comment.
2 comments