Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Splitting a final filtered VCF file based on sample list and MAF Impact

2 comments

  • Gökalp Çelik

    Hi Conor Sexton,

    The AF field in each VCF record is calculated from the number of called alleles, the number of alternate alleles, and the ploidy. When a multisample VCF is subset by sample, some tools recalculate the AF field correctly for the remaining samples and some do not, so check which behavior your tool has.

    If you wish to preserve the original AF values calculated across all samples, you can use either

    gatk VariantAnnotator

    or

    bcftools annotate

    to reannotate your subset VCF file using the original full VCF file. You may then be able to add the original AF values to your new VCF file as an INFO field, if that is what you want.

    Regards. 

  • Conor Sexton

    Hi Gökalp Çelik,

    Thank you very much for your quick feedback. It took a while, but I tried both methods and they worked as intended. I also followed your suggestion about annotations, and it worked as I had hoped.

    Thanks very much again for your advice!
    Regards.

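To illustrate the point in the first reply that AF depends on the alleles actually present in the file, here is a minimal Python sketch. It assumes AF is taken as alternate allele count divided by total called alleles (AC/AN), which is how the VCF INFO fields are commonly defined. The sample names and genotypes are made up for illustration, and this is not what any particular tool runs internally.

```python
import re


def allele_counts(genotypes, alt_index=1):
    """Return (AC, AN) for one ALT allele from a list of GT strings.

    Works for any ploidy: each called allele adds 1 to AN, and each
    allele equal to alt_index adds 1 to AC. Missing alleles (".") are skipped.
    """
    ac = 0
    an = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue
            an += 1
            if int(allele) == alt_index:
                ac += 1
    return ac, an


def allele_frequency(genotypes, alt_index=1):
    """Return AC/AN, or None when no alleles were called."""
    ac, an = allele_counts(genotypes, alt_index)
    return ac / an if an else None


# Hypothetical four-sample cohort at one site (diploid genotypes).
cohort = {"S1": "0/1", "S2": "1/1", "S3": "0/0", "S4": "./."}
subset = ["S1", "S2"]

af_all = allele_frequency(list(cohort.values()))
af_subset = allele_frequency([cohort[s] for s in subset])
print(af_all, af_subset)
```

The two values differ because the subset drops the homozygous-reference sample, so a tool that copies the INFO field unchanged would report an AF that no longer matches the genotypes left in the file.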

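The reannotation idea from the first reply (carrying the full-cohort AF into the subset file as a separate INFO field) can be sketched in plain Python as follows. This is only an illustration of the concept, not how gatk VariantAnnotator or bcftools annotate work internally. The tag name ORIG_AF, the function names, and the example records are all assumptions made for this sketch, and it matches records on CHROM, POS, REF and ALT only.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to None."""
    fields = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            fields[key] = value if sep else None
    return fields


def format_info(fields):
    """Rebuild a VCF INFO string from a dict produced by parse_info."""
    if not fields:
        return "."
    return ";".join(k if v is None else f"{k}={v}" for k, v in fields.items())


def carry_over_af(original_lines, subset_lines, new_tag="ORIG_AF"):
    """Copy AF from the original VCF into the subset VCF under new_tag."""
    # Index the full-cohort AF by (CHROM, POS, REF, ALT).
    original_af = {}
    for line in original_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        af = parse_info(cols[7]).get("AF")
        if af is not None:
            original_af[(cols[0], cols[1], cols[3], cols[4])] = af

    annotated = []
    for line in subset_lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            # Declare the new tag in the header just before the column line.
            annotated.append(
                f'##INFO=<ID={new_tag},Number=A,Type=Float,'
                'Description="AF computed on the full cohort before subsetting">'
            )
            annotated.append(line)
            continue
        if line.startswith("#"):
            annotated.append(line)
            continue
        cols = line.split("\t")
        key = (cols[0], cols[1], cols[3], cols[4])
        if key in original_af:
            fields = parse_info(cols[7])
            fields[new_tag] = original_af[key]
            cols[7] = format_info(fields)
        annotated.append("\t".join(cols))
    return annotated


# Made-up example records: a three-sample original and a two-sample subset.
original = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t100\t.\tA\tG\t50\tPASS\tAC=3;AN=6;AF=0.5\tGT\t0/1\t1/1\t0/0",
]
subset = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\tAC=3;AN=4;AF=0.75\tGT\t0/1\t1/1",
]

annotated = carry_over_af(original, subset)
print("\n".join(annotated))
```

Writing the original value under a separate tag keeps the recalculated AF for the subset intact, so both frequencies remain available for downstream filtering.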
