Hi GATK community! I have a situation that I haven't been able to resolve from the existing forum posts.
I am hoping to use VQSR to improve my current SNP set.
What we have:
- a "truth set" previously used with VQSR for a different project. This set was constructed from a single species, "X", using the species "X" reference genome.
- a VCF file with ~350 samples, all called against the same reference genome as above. 250 of these samples are of species "X"; the remaining 100 are of different but closely related species in the same genus. These 100 samples are included because they will be used in separate analyses to identify admixed samples within the 250-sample group.
Is it reasonable to use this "truth set" for VQSR on my VCF file, given that the file contains samples from several species?