Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

The same number of SNPs and indels identified for all samples after joint genotyping

Answered

12 comments

  • But do you see different variant types between the samples?

  • Vincent Appiah

    Priyadarshini Thirunavukkarasu, the same number of SNPs and indels were identified.

    I hope this answers your question.

  • Could you give an example?

  • Vincent Appiah

    Here is an example.

    I used bcftools to count the number of variants identified:

    samples=(100N 100T 10T)
    for sample in "${samples[@]}"; do
        echo "$sample" "$(bcftools view -s "$sample" "$vcf" | grep -v -c '^#')"
    done

    Result:

    100N 476154
    100T 476154
    10T 476154
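
The identical counts follow from how the command counts: `bcftools view -s` restricts the output to one sample's column but, with no further filtering, still emits every record in the joint file, so `grep -v -c '^#'` reports the total number of sites each time. One way to get a per-sample figure is to count only the sites where that sample's genotype carries a non-reference allele. The sketch below does this with awk. It assumes an uncompressed multi-sample VCF with GT as the first FORMAT field; the function name `count_nonref_sites` and the file name `joint.vcf` are hypothetical.

```shell
# Hypothetical helper: for each sample, count the sites whose genotype
# contains at least one non-reference allele.
# Assumes a plain-text VCF in which GT is the first FORMAT field.
# Usage: count_nonref_sites joint.vcf
count_nonref_sites() {
  awk -F'\t' '
    /^##/ { next }                        # skip meta-information lines
    /^#CHROM/ {                           # header line: remember sample names
      for (i = 10; i <= NF; i++) name[i] = $i
      n = NF
      next
    }
    {
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                 # f[1] holds the GT string, e.g. 0/1
        gt = f[1]
        gsub("[0./|]", "", gt)            # strip ref alleles, missing calls, separators
        if (gt != "") count[i]++          # anything left is a non-ref allele
      }
    }
    END { for (i = 10; i <= n; i++) print name[i], count[i] + 0 }
  ' "$1"
}
```

Sites where a sample is 0/0 or ./. are not counted for that sample, so the per-sample numbers can differ even though the number of records in the file is the same for everyone.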
     

     

  • Thank you. Do these samples have the same type of variant at a given position? For example, is it a missense mutation in all the samples at that position, or does the type of mutation differ?

  • Vincent Appiah

    Priyadarshini Thirunavukkarasu

    I am showing results for just three samples, but it is the same for the others.

    I tried with one position; here is the output.

    COUNTS OF SNPS AND INDELS

    sample    indels    snps      total
    100N      47757     429534    477291
    100T      47757     429534    477291
    10T       47757     429534    477291

    VARIANTS AT A POSITION

    sample    ID            POS      REF    ALT
    100N      rs41304577    93425    A      G
    100T      rs41304577    93425    A      G
    10T       rs41304577    93425    A      G

  • Did you check the genotype, for example whether each sample is homozygous for the variant allele at a given position? I ran a different pipeline on my samples, and it shows the same variant allele at a given position, but the genotype for that allele was not the same across samples.
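
ID, POS, REF and ALT are site-level columns that every sample in a multi-sample VCF shares, so they cannot differ between samples within one record; the per-sample information sits in the genotype columns. A minimal sketch for listing each sample's GT at one position follows. It assumes a plain-text multi-sample VCF with GT as the first FORMAT field, and the helper name `genotypes_at` is hypothetical.

```shell
# Hypothetical helper: print every sample name with its GT at one position.
# Assumes a plain-text VCF in which GT is the first FORMAT field.
# Usage: genotypes_at joint.vcf CHROM POS
genotypes_at() {
  awk -F'\t' -v chrom="$2" -v pos="$3" '
    /^##/ { next }                        # skip meta-information lines
    /^#CHROM/ {                           # header line: remember sample names
      for (i = 10; i <= NF; i++) name[i] = $i
      next
    }
    $1 == chrom && $2 == pos {            # the requested site
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                 # f[1] holds the GT string
        print name[i], f[1]
      }
    }
  ' "$1"
}
```

A GT of 0/0 means the sample is homozygous reference at that site, 0/1 heterozygous, 1/1 homozygous for the first alternate allele, and ./. means no call.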

  • Vincent Appiah

    OK. Thanks.

    But it's a bit confusing. When I ran GenotypeGVCFs on the individual VCFs generated using HaplotypeCaller, the number of variants was different for each sample.

    Sample    No. of variants
    100N      108353
    100T      104014
    10T       106707

    I am going to try looking into the genotypes in the multi-sample VCF file.

  • When you do joint genotyping, variant calling is done site by site across all the samples. So the genotypes can differ between these samples for the same SNP at a given position.

  • Vincent Appiah

    Thanks for the clarification, Priyadarshini Thirunavukkarasu.

    What I would like to know is: when doing joint variant calling, does GATK look only for variants that are common to all the samples?

  • Vincent Appiah

    Genevieve Brandt (she/her), can you help clarify this question?

    Thanks

  • Genevieve Brandt (she/her)

    Hi Vincent Appiah,

    Yes, Priyadarshini Thirunavukkarasu is correct here. When you do joint genotyping, you first get a GVCF, which has information about all the sites in the genome. So when you call variants with GenotypeGVCFs, there will be a variant line whenever any of the samples has a variant at that site. You can determine which samples have the variant and which do not in the genotype fields.

    GATK does not look only for variants that are common to all the samples; it can also call variants that are present in only one sample.

    You can read more about joint calling here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890431-The-logic-of-joint-calling-for-germline-short-variants

    Best,

    Genevieve
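
One rough way to picture the answer above: if each sample's variant positions were written to its own list, the joint call set would hold a record for every position that appears in any list (the union), not only the positions found in all of them (the intersection). The sketch below uses made-up positions and hypothetical file names purely for illustration. It is a simplification, since joint genotyping re-evaluates the evidence across samples rather than merging finished per-sample lists.

```shell
# Toy per-sample site lists (made-up positions, one per line).
dir=$(mktemp -d)
printf '100\n200\n'      > "$dir/100N.sites"
printf '200\n300\n'      > "$dir/100T.sites"
printf '200\n400\n500\n' > "$dir/10T.sites"

# Union: positions seen in at least one sample.
union=$(sort -n -u "$dir"/*.sites | awk 'END { print NR }')

# Intersection: positions seen in all three samples.
shared=$(sort -n "$dir"/*.sites | uniq -c | awk '$1 == 3 { n++ } END { print n + 0 }')

echo "union=$union shared=$shared"
rm -rf "$dir"
```

Every sample column is present on every record of the union, which is why counting records per sample gives the same number for all of them.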
