Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Asking for Guidance on Variant Calling Pipeline - Biological replicates



  • Gökalp Çelik

    Hi H.T

    The message about GVCF mode being incompatible with multiple samples is correct. HaplotypeCaller reports multiple samples when the input BAM files carry different sample names, or when a single pooled BAM file contains more than one sample name. If you really want to distinguish your replicates by sample name and produce a VCF that lists each of them, run HaplotypeCaller without the -ERC GVCF parameter; it will still generate a multisample VCF file.
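
As a rough sketch of that first option, the call below passes two replicate BAM files to HaplotypeCaller and leaves out -ERC GVCF. The function name call_replicates and all file names are placeholders for illustration, and the GATK variable exists only so the command can be dry-run with echo; adapt the arguments to your own pipeline.

```shell
# Path to the gatk launcher; set GATK=echo to print the command instead of running it.
GATK="${GATK:-gatk}"

# Call variants on two replicates that keep their own SM names.
# $1 = reference FASTA, $2 and $3 = replicate BAMs, $4 = output VCF
call_replicates() {
  # No -ERC GVCF here, so multiple sample names are accepted.
  "$GATK" HaplotypeCaller -R "$1" -I "$2" -I "$3" -O "$4"
}

# Example (placeholder file names):
#   call_replicates ref.fasta rep1.bam rep2.bam replicates.vcf.gz
```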

    You can check the header of your BAM file using

    samtools view -H your.bam

    where your.bam stands for your own file. If the @RG lines in the header show more than one distinct SM tag value, that is the cause of the error.
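
Reading a long header by eye can be tedious, so here is a small helper that prints each distinct SM value once. It is only a sketch: list_sm_values is a hypothetical name, and it assumes the header is tab-delimited SAM text arriving on standard input.

```shell
# Print the distinct SM (sample name) values found on @RG header lines.
# Reads SAM header text on stdin and writes one sample name per line.
list_sm_values() {
  awk -F'\t' '
    /^@RG/ {
      # Scan the tab-separated fields of each read group line for the SM tag.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}

# Example (input.bam is a placeholder):
#   samtools view -H input.bam | list_sm_values
```

More than one line of output means the file carries multiple sample names.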

    If you instead want a single sample name, with the BAM files combined only to increase depth, you need to use the

    gatk AddOrReplaceReadGroups 

    tool to replace the multiple sample names in your BAM file with a single one.
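
A minimal sketch of such a call is shown below. The function name set_single_sample, the file names, and the read group values lib1, ILLUMINA and unit1 are placeholders, not values taken from your data. Note that this tool assigns every read in the file to one new read group, so the original per-replicate read groups are not kept.

```shell
# Path to the gatk launcher; set GATK=echo to print the command instead of running it.
GATK="${GATK:-gatk}"

# Rewrite the read groups of a merged BAM so that it carries one sample name.
# $1 = input BAM, $2 = output BAM, $3 = sample name to write into SM
set_single_sample() {
  # Library, platform and platform unit below are placeholder values.
  "$GATK" AddOrReplaceReadGroups \
    -I "$1" -O "$2" \
    --RGID "$3" --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM "$3"
}

# Example (placeholder names):
#   set_single_sample merged.bam merged.rg.bam sampleA
```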

    I hope this helps. 
