a) I'm using GATK4-18.104.22.168-1
b) gatk SelectVariants -V $combined.vcf.gz -R $genome --discordance $wildtype.vcf -O $discordant.combined.vcf.gz
c) Why do I see (......)? But I don't retrieve any variants (zero).
I'll explain from the beginning. I have 8 BAM datasets: 7 from alleged mutants and 1 from the wild-type parental strain of those 7. I ran HaplotypeCaller on each BAM file with the arguments -ERC GVCF -ploidy 1, because the strains are haploid and I don't expect differences in ploidy. Then I combined all the per-sample GVCF files with CombineGVCFs, genotyped them with GenotypeGVCFs, and filtered the SNPs using these criteria:
-filter "QD < 20.0" --filter-name "QD20" \
-filter "QUAL < 30.0" --filter-name "QUAL30" \
-filter "SOR > 3.0" --filter-name "SOR3" \
-filter "FS > 60.0" --filter-name "FS60" \
-filter "MQ < 40.0" --filter-name "MQ40" \
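For reference, the calling steps described above (everything before the filtering) can be sketched roughly as follows. This is a dry run that only prints the commands. The sample names, the reference file name, and the output names are hypothetical placeholders, and the sample list is shortened to three entries instead of the real eight.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# All file and sample names below are hypothetical placeholders.
genome=reference.fasta
samples="wildtype mutant1 mutant2"   # shortened; the real set has 8 samples

# Prints its arguments; change the body to "$@" to actually execute.
run() { echo "$@"; }

pipeline() {
  variant_args=""
  for s in $samples; do
    # One GVCF per BAM, haploid calling as described in the post.
    run gatk HaplotypeCaller -R "$genome" -I "$s.bam" -O "$s.g.vcf.gz" -ERC GVCF -ploidy 1
    variant_args="$variant_args -V $s.g.vcf.gz"
  done
  # $variant_args is left unquoted on purpose so it splits into separate arguments.
  run gatk CombineGVCFs -R "$genome" $variant_args -O cohort.g.vcf.gz
  run gatk GenotypeGVCFs -R "$genome" -V cohort.g.vcf.gz -O cohort.vcf.gz
}

pipeline > planned_commands.txt
cat planned_commands.txt
```

The filters listed above would then be applied to the genotyped file (cohort.vcf.gz in this sketch).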
Now, what I want to do is remove all the variants that are present in the wild-type VCF track; in other words, keep only the variants that are absent from the wild-type. For that, I thought about using:
gatk SelectVariants -V $combined.vcf.gz -R $genome --discordance $wildtype.vcf -O $discordant.combined.vcf.gz
Where $combined.vcf.gz is the combined file I got after combining, genotyping, and filtering, $genome is my reference genome, and $wildtype.vcf is the initial vcf file I produced with HaplotypeCaller for the wild-type bam dataset.
The problem is that I get 0 variants back, even though I can see discordant variants (variants that are present in one or more of the mutants, but not in the wild-type) when I look at the combined VCF in IGV.
I also tried something similar with the individual files generated by HaplotypeCaller, in pairwise comparisons with the wild-type track, and I also get 0 variants back, so I must definitely be doing something wrong.
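To make explicit what is meant by "discordant" here, below is a minimal sketch of the same set difference done with plain awk, independent of GATK. It is only a sanity check on toy data, not a replacement for SelectVariants. The file names and records are made up, and it assumes uncompressed VCFs and an exact match on CHROM, POS, REF and ALT, which would not hold as-is for multi-allelic records or for GVCF records whose ALT column contains <NON_REF>.

```shell
#!/bin/sh
# Toy stand-ins for the real files; names and records are hypothetical.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'chr1\t200\t.\tC\tT\t60\tPASS\t.\n'
  printf 'chr2\t300\t.\tG\tA\t70\tPASS\t.\n'
} > toy_combined.vcf

{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t55\tPASS\t.\n'
} > toy_wildtype.vcf

# First file (wild-type): remember each CHROM:POS:REF:ALT key.
# Second file (combined): print the records whose key was never seen.
awk -F'\t' -v OFS='\t' '
  /^#/ { next }                          # skip header lines in both files
  { key = $1 ":" $2 ":" $4 ":" $5 }
  NR == FNR { seen[key] = 1; next }      # still reading the wild-type file
  !(key in seen) { print $1, $2, $4, $5 }
' toy_wildtype.vcf toy_combined.vcf > toy_not_in_wildtype.tsv

cat toy_not_in_wildtype.tsv
```

On the toy data this keeps the records at chr1:200 and chr2:300 and drops chr1:100, which is shared with the wild-type. For real bgzipped files, the input would first have to be decompressed (for example with gzip -dc).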
By the way, if I use --concordance instead, I get ALL the variants, even though some are clearly not concordant.
Thank you for your help,