Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

SelectVariants --discordance not working as I expected

Answered
3 comments

  • Pamela Bretscher

    Hi Carlos P Arques,

    I brought up this issue with some members of the GATK team this morning, and we think we should file a GitHub ticket to look into the --discordance argument and why it is not giving you the output you expect. In the meantime, here are a few suggestions you can try:

    1. If the wild-type file you are using is actually a GVCF file, SelectVariants would be likely to fail. Could you confirm whether the file is a GVCF or a VCF?

    2. The argument may be causing SelectVariants to look at discordant sites rather than discordant variants at the genotype level. Could you try still specifying --discordance for the wild-type file but also specifying --sample-name for each individual sample name including the wild-type?

    3. The last suggestion was that you could potentially achieve what you want by running SelectVariants -V combined.vcf -XL wildtype.vcf to exclude the wild-type variants without using --discordance.
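Suggestions 2 and 3 can be sketched as command lines. The Python helpers below only build the argument lists and print them; they do not run GATK. The file names, sample names, and the `-O` output paths are placeholders chosen for illustration, not values from this thread.

```python
import shlex


def discordance_with_samples(variants, wildtype, samples, output):
    """Suggestion 2: --discordance plus one --sample-name per sample."""
    cmd = ["gatk", "SelectVariants", "-V", variants, "--discordance", wildtype]
    for name in samples:
        # One --sample-name flag per sample, including the wild-type sample.
        cmd += ["--sample-name", name]
    cmd += ["-O", output]
    return cmd


def exclude_wildtype_sites(variants, wildtype, output):
    """Suggestion 3: use the wild-type VCF as an exclusion interval list (-XL)."""
    return ["gatk", "SelectVariants", "-V", variants, "-XL", wildtype, "-O", output]


if __name__ == "__main__":
    # Hypothetical inputs; replace with your own files and sample names.
    print(shlex.join(discordance_with_samples(
        "combined.vcf", "wildtype.vcf", ["mutant1", "wildtype"], "discordant.vcf")))
    print(shlex.join(exclude_wildtype_sites(
        "combined.vcf", "wildtype.vcf", "no_wildtype_sites.vcf")))
```

Note that `-XL` drops every position present in the wild-type file regardless of genotype, so it is worth comparing its output with the `--discordance` output before relying on either.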

    Please let me know if any of these suggestions reveal a different output and I will keep you updated on anything the GATK team is able to figure out.

    Kind regards,

    Pamela

  • Carlos P Arques

    Hi Pamela Bretscher,

    Thank you for looking into it.

    1. The wild-type was a VCF file, but when I examined it closely I found that many positions had GT = 0, i.e. the same as the reference. So it is likely that all the files I was comparing listed the same positions, even though some of them had GT = 1 (a real variant) and others GT = 0. Because the position was present in the wild-type VCF file, those records were not treated as discordant.
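The effect described above can be illustrated with a toy model. This is not GATK's implementation; it only assumes, as the observation suggests, that a record counts as discordant when its position is absent from the comparison file, whatever the genotype there. The record layout and the helper names are invented for the example.

```python
def discordant_sites(eval_records, comp_records):
    """Keep eval records whose (chrom, pos) does not appear in comp at all."""
    comp_sites = {(r["chrom"], r["pos"]) for r in comp_records}
    return [r for r in eval_records if (r["chrom"], r["pos"]) not in comp_sites]


def drop_hom_ref(records):
    """Remove records whose genotype equals the reference (GT = 0)."""
    return [r for r in records if r["gt"] != "0"]


# Hypothetical haploid calls: the mutant carries variants at both positions,
# the wild-type file lists both positions but is reference at chr1:100.
mutant = [
    {"chrom": "chr1", "pos": 100, "gt": "1"},
    {"chrom": "chr1", "pos": 200, "gt": "1"},
]
wildtype = [
    {"chrom": "chr1", "pos": 100, "gt": "0"},
    {"chrom": "chr1", "pos": 200, "gt": "1"},
]

# With the GT = 0 record left in, nothing is reported as discordant.
site_level = discordant_sites(mutant, wildtype)

# After removing GT = 0 records, chr1:100 shows up as mutant-only.
after_cleanup = discordant_sites(mutant, drop_hom_ref(wildtype))
```

In this sketch `site_level` is empty while `after_cleanup` contains the chr1:100 record, which mirrors why removing the reference-genotype positions changed the result.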

    I tried a couple of things that worked for me. I extracted each individual sample from the GVCF using the arguments --exclude-non-variants and --remove-unused-alternates, to ensure that the resulting VCF files contained only real variants for that sample (GT = 1). Then I used the --discordance argument as I originally intended and, as far as I can tell, it worked. I hope this can help someone, and sorry for not posting it sooner.
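A minimal sketch of that two-step workflow, again as Python helpers that only assemble the command lines. The use of `--sample-name` for the per-sample extraction is an assumption (the post does not say how each sample was selected), and all file and sample names are placeholders.

```python
import shlex


def extract_sample(joint_vcf, sample, output):
    """Step 1: per-sample VCF with non-variant records and unused ALT alleles removed."""
    return [
        "gatk", "SelectVariants",
        "-V", joint_vcf,
        "--sample-name", sample,          # assumed way of picking one sample
        "--exclude-non-variants",
        "--remove-unused-alternates",
        "-O", output,
    ]


def discordant_vs_wildtype(sample_vcf, wildtype_vcf, output):
    """Step 2: keep records in sample_vcf that are absent from wildtype_vcf."""
    return ["gatk", "SelectVariants", "-V", sample_vcf,
            "--discordance", wildtype_vcf, "-O", output]


def plan(joint_vcf, samples, wildtype):
    """Return all commands: extract every sample, then compare mutants to wild-type."""
    cmds = [extract_sample(joint_vcf, s, f"{s}.vcf") for s in samples + [wildtype]]
    cmds += [discordant_vs_wildtype(f"{s}.vcf", f"{wildtype}.vcf", f"{s}.discordant.vcf")
             for s in samples]
    return cmds


if __name__ == "__main__":
    # Hypothetical names; print the commands for review before running them.
    for cmd in plan("cohort.g.vcf", ["mutant1", "mutant2"], "wildtype"):
        print(shlex.join(cmd))
```

Cleaning the wild-type file in the same way as the mutants matters here: if it still lists reference-genotype positions, the comparison in step 2 would hide them again.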

    I will try your third solution and see if I get similar results.

    Thanks a lot,

    Carlos

  • Pamela Bretscher

    Hi Carlos P Arques,

    Thank you for sharing your solutions; they will be very helpful for other users as well as the GATK team.

    Kind regards,

    Pamela

