I constructed a dataset from complete genomes using HaplotypeCaller in GVCF mode followed by joint genotyping. Later I wanted to expand my dataset, but as my timeframe was short, I decided to call only the SNPs found in my initial dataset in the BAM files of the new samples, using mpileup (samtools).
The result is that there is a lot of missing data in the new samples (around 40%).
I guess this is due to the different variant-calling methods, but I would need help understanding the details of what causes this problem.
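To make the 40% figure concrete, here is a minimal sketch of how the missing fraction per sample could be measured. It assumes a plain-text (uncompressed) multi-sample VCF and counts a genotype as missing only when every allele in the GT field is "."; the function name missing_fraction, the sample names old1/new1, and the example records are hypothetical and not taken from my data.

```python
from collections import Counter


def missing_fraction(vcf_lines):
    """Return {sample: fraction of sites whose GT is missing} for VCF text lines."""
    samples = []
    missing = Counter()
    total = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        fmt = fields[8].split(":")
        gt_index = fmt.index("GT") if "GT" in fmt else None
        total += 1
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index] if gt_index is not None else "."
            alleles = gt.replace("|", "/").split("/")
            if all(allele == "." for allele in alleles):
                missing[sample] += 1
    return {s: (missing[s] / total if total else 0.0) for s in samples}


# Hypothetical two-site, two-sample VCF used only for illustration.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\told1\tnew1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:20\t./.:0",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:15\t0/0:12",
]
print(missing_fraction(example))  # {'old1': 0.0, 'new1': 0.5}
```

Running this separately on the original and the new samples would show whether the missing genotypes are spread evenly over the sites or concentrated in particular ones.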