Asking for Guidance on Variant Calling Pipeline - Biological replicates
Hi H.T
The message about GVCF mode being incompatible with multiple samples is correct. HaplotypeCaller reports multiple samples when the BAM inputs carry different sample names, or when a single pooled BAM contains more than one sample name. If you really want to distinguish your replicates by sample name and produce a VCF that keeps those names, run HaplotypeCaller without the -ERC GVCF parameter; it will still generate a multi-sample VCF file.
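As a rough sketch of that multi-sample call, the helper below passes every replicate BAM to one HaplotypeCaller run and leaves out -ERC GVCF. The function name and the file names in the usage comment are hypothetical placeholders, not taken from your pipeline.

```shell
# Hypothetical helper: one HaplotypeCaller run over several BAMs, no -ERC GVCF.
# Arguments: reference FASTA, output VCF, then one or more BAM files.
call_multisample_vcf() {
    ref=$1
    out_vcf=$2
    shift 2
    n=$#
    # Append "-I <bam>" for every BAM, then drop the original bare file names.
    for bam in "$@"; do
        set -- "$@" -I "$bam"
    done
    shift "$n"
    gatk HaplotypeCaller -R "$ref" "$@" -O "$out_vcf"
}

# Usage (placeholder file names):
#   call_multisample_vcf reference.fasta replicates.vcf.gz rep1.bam rep2.bam
```

Each distinct sample name found in the inputs should then appear as its own column in the output VCF.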
You can check the header of your BAM file using
samtools view -H your.bam
If you see more than one distinct SM value in the @RG lines of the header, that is the reason for your troubles.
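To make the header check quicker to read, a small filter can list only the distinct sample names. This is a minimal sketch: the function name is hypothetical, and it assumes the header text arrives on standard input in the usual tab-separated @RG format.

```shell
# Hypothetical helper: print the distinct SM values found in @RG header lines.
# Reads SAM/BAM header text from standard input.
list_sample_names() {
    grep '^@RG' |      # keep only read-group lines
        tr '\t' '\n' | # one tag per line
        grep '^SM:' |  # keep the sample-name tags
        cut -d: -f2- | # drop the "SM:" prefix
        sort -u        # collapse duplicates
}

# Usage (placeholder file name):
#   samtools view -H your.bam | list_sample_names
```

If this prints more than one name, the BAM carries multiple samples as far as HaplotypeCaller is concerned.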
If your intention is to have a single sample name and a combined BAM file just to increase depth, then you need to use the
gatk AddOrReplaceReadGroups
tool to fix the multiple sample names in your BAM file.
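A minimal sketch of such a call is below. The function name is hypothetical, and the library, platform and platform-unit values (lib1, ILLUMINA, unit1) are placeholder assumptions that you should replace with values describing your own data. Note that this tool assigns all reads in the file to one new read group, so the original per-replicate read groups are not preserved.

```shell
# Hypothetical helper: rewrite a BAM so that every read carries one sample name.
# Arguments: input BAM, output BAM, sample name to assign.
set_single_sample_name() {
    in_bam=$1
    out_bam=$2
    sample=$3
    # RGLB/RGPL/RGPU values below are placeholders, not taken from your data.
    gatk AddOrReplaceReadGroups \
        -I "$in_bam" \
        -O "$out_bam" \
        --RGID "$sample" \
        --RGLB lib1 \
        --RGPL ILLUMINA \
        --RGPU unit1 \
        --RGSM "$sample"
}

# Usage (placeholder file and sample names):
#   set_single_sample_name merged.bam merged.single_sample.bam sampleA
```

Afterwards, samtools view -H on the output should show a single SM value, and HaplotypeCaller should accept the file with -ERC GVCF.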
I hope this helps.