Delete <NON_REF> from VCF
Hello!
I have a problem with my VCF file and <NON_REF> tags.
I followed the advice from this topic: https://gatkforums.broadinstitute.org/gatk/discussion/24223/removing-non-ref-tags-from-vcf. When I ran the command with the --exclude-non-variants option, I got a normal VCF without lines with <NON_REF> tags. For example:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT s1
1 131 . G T,<NON_REF> 416.77 . AS_RAW_BaseQRankSum=|2.4,1|NaN;AS_RAW_MQ=1030620.00|120218.00|0.00;AS_RAW_MQRankSum=|1.2,1|NaN;AS_RAW_ReadPosRankSum=|-0.2,1|NaN;AS_SB_TABLE=146,162|14,20|0,0;BaseQRankSum=2.406;DP=459;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=1.271;RAW_MQandDP=1538281,459;ReadPosRankSum=-0.131 GT:AD:DP:GQ:PL:SB 0/1:308,34,0:342:99:445,0,12925,1387,13026,14412:146,162,14,20
1 134 . TAA GAA,T,*,<NON_REF> 6746.73 . AS_RAW_BaseQRankSum=|-3.9,1|2.2,1|NaN|NaN;AS_RAW_MQ=403443.00|587690.00|120218.00|0.00|0.00;AS_RAW_MQRankSum=|-4.1,1|-0.9,1|NaN|NaN;AS_RAW_ReadPosRankSum=|-5.1,1|-2.3,1|NaN|NaN;AS_SB_TABLE=54,59|85,97|14,20|0,0|0,0;BaseQRankSum=-2.820;DP=464;ExcessHet=3.0103;MLEAC=1,0,0,0;MLEAF=0.500,0.00,0.00,0.00;MQRankSum=-3.711;RAW_MQandDP=1560396,464;ReadPosRankSum=-5.027 GT:AD:DP:GQ:PL:SB 0/1:113,182,34,0,0:329:99:6784,0,4267,5898,3504,12458,7146,4845,10798,12085,7145,4845,10796,12083,12082:54,59,99,117
But when I used the options --select-type-to-include SNP, --select-type-to-exclude NO_VARIATION and --remove-unused-alternates, the output file contained no records. This is all I get in the output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT s1
(nothing more)
So, what should I do with my VCF to get a VCF without any <NON_REF> tags?
P.S. English is not my native language, so please be kind to my mistakes.
-
Hi ValeriyaVS !
A symbolic <NON_REF> allele represents non-called but possible non-reference alleles in GVCF files. If you are following the Best Practices, the later genotyping step retains only sites that are confidently variant against the reference, and these <NON_REF> blocks go away. Follow these steps to do that, or run HaplotypeCaller in single-sample mode without the -ERC GVCF argument. Let me know if this helps!
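To check whether a file is still a GVCF before and after the genotyping step, it can help to count how many records still list <NON_REF> in the ALT column. The following is only a minimal Python sketch, not part of GATK: the function names `count_non_ref_records` and `open_vcf` are made up for illustration, and it assumes a standard tab-delimited VCF in which ALT is the fifth column.

```python
import gzip

NON_REF = "<NON_REF>"


def count_non_ref_records(lines):
    """Return (total_records, records_with_non_ref) for an iterable of VCF lines.

    Assumes tab-delimited records with ALT in the fifth column.
    Header lines (starting with '#') and blank lines are skipped.
    """
    total = 0
    with_non_ref = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete record; ignore it
        total += 1
        # ALT may hold several comma-separated alleles, e.g. "T,<NON_REF>"
        if NON_REF in fields[4].split(","):
            with_non_ref += 1
    return total, with_non_ref


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")
```

For example, `with open_vcf(path) as handle: print(count_non_ref_records(handle))` prints the two counts for a given file. If the second number is greater than zero, the file is still in GVCF form and has not yet gone through the genotyping step (GenotypeGVCFs in the GATK Best Practices).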
-
Thank you so much! It works.