Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Empty VCF after GenotypeGVCFs when combining already genotyped samples

4 comments

  • Tiffany Miller

    Hi @leze,

    Do both of these input VCFs pass ValidateVariants? Can you explain what you mean by the first cohort (you mentioned you don't have access to the BAMs)? Are you running HaplotypeCaller per sample?

  • leze

    Hi Tiffany Miller,

    Thanks for getting back to me.

    Yes, I did run ValidateVariants and the VCFs passed.

    Details on the datasets:

    1. First cohort: VCF with SNPs called per sample with HaplotypeCaller (HC) and merged with CombineGVCFs, GATK 3.5 (older samples, no BAMs available).

    2. Second cohort: VCF/gVCF/BAM/FASTQ available (new samples). I used GATK 4.1.2.0.

    I tried combining these two sets, ideally by joint genotype calling from the genotype likelihoods (the annotations are available in both VCFs).

    Meanwhile, I was able to combine the two using the following approach:

    1. grep -vE "NON_REF" mixed.vcf > mixed.mod.vcf # removes all loci with NON_REF as ALT
      grep -vE "NON_REF" tetra.vcf > tetra.mod.vcf
    2. GATK 3.7 CombineVariants
    3. SelectVariants -select 'set == "Intersection"'

    This works for now but is certainly not ideal. Do you have a suggestion for improving it?
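
For anyone repeating step 1: the plain grep drops every line that mentions NON_REF, including header lines such as the ##ALT declaration. A minimal sketch of a variant that keeps the whole header and filters only on the ALT column is shown here. The toy VCF content and the file names are made up for illustration, and this has not been checked against the real cohort files.

```shell
# Build a small toy VCF (hypothetical content) in a temporary directory.
tmpdir=$(mktemp -d)
in_vcf="$tmpdir/toy.vcf"
out_vcf="$tmpdir/toy.mod.vcf"

{
  printf '##fileformat=VCFv4.2\n'
  printf '##ALT=<ID=NON_REF,Description="Any possible alternative allele">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=150\n'
  printf 'chr1\t200\t.\tC\tT\t50\t.\t.\n'
  printf 'chr1\t300\t.\tG\tA,<NON_REF>\t60\t.\t.\n'
} > "$in_vcf"

# Keep every header line; drop data records whose ALT column (field 5)
# mentions NON_REF, whether alone or next to a real alternate allele.
awk -F'\t' '/^#/ || $5 !~ /NON_REF/' "$in_vcf" > "$out_vcf"

# Count what survived the filter.
kept_records=$(grep -vc '^#' "$out_vcf")
kept_headers=$(grep -c '^#' "$out_vcf")
echo "records kept: $kept_records, header lines kept: $kept_headers"
```

Like the grep, this removes records that list NON_REF next to a real ALT allele. The only intended difference is that the header stays intact.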

     

     

  • Tiffany Miller

    Hi leze, I will check with the team and get back to you.

  • Tiffany Miller

    Hi leze, unfortunately we don't have a better approach, given that you cannot regenerate the individual GVCFs for cohort 1.

    Generally, we don't recommend joint calling across different versions of HaplotypeCaller because this can introduce artifacts. The mapping quality annotations of these versions are incompatible. I wish I had a better suggestion.
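
One rough way to see which annotations two VCFs declare differently is to compare the INFO IDs in their headers. The sketch below does this on two toy headers. The file names and the listed IDs are placeholders for illustration only and are not taken from the cohorts in this thread, so real headers may look different.

```shell
# Use byte-wise collation so that sort and comm agree on ordering.
export LC_ALL=C
tmpdir=$(mktemp -d)

# Toy headers (hypothetical content) standing in for the two cohorts.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">\n'
  printf '##INFO=<ID=MQ,Number=1,Type=Float,Description="Mapping quality">\n'
  printf '##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw MQ data">\n'
} > "$tmpdir/cohort1.vcf"
{
  printf '##fileformat=VCFv4.2\n'
  printf '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">\n'
  printf '##INFO=<ID=MQ,Number=1,Type=Float,Description="Mapping quality">\n'
  printf '##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw MQ and DP">\n'
} > "$tmpdir/cohort2.vcf"

# Print the sorted list of INFO IDs declared in a VCF header.
info_ids() {
  sed -n 's/^##INFO=<ID=\([^,]*\),.*/\1/p' "$1" | sort
}

info_ids "$tmpdir/cohort1.vcf" > "$tmpdir/cohort1.ids"
info_ids "$tmpdir/cohort2.vcf" > "$tmpdir/cohort2.ids"

# comm -23 prints lines unique to the first file, comm -13 those unique to the second.
only_in_1=$(comm -23 "$tmpdir/cohort1.ids" "$tmpdir/cohort2.ids")
only_in_2=$(comm -13 "$tmpdir/cohort1.ids" "$tmpdir/cohort2.ids")
echo "only in cohort 1: $only_in_1"
echo "only in cohort 2: $only_in_2"
```

A difference in declared IDs only shows where the headers diverge. It does not show whether an annotation with the same name is computed the same way in both versions.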

    If anyone else in the community has experienced this and has something to add, please chime in!
