I'm using GATK 126.96.36.199.
1. alignment with BWA mem
2. GATK BQSR
3. HaplotypeCaller GVCF mode
gatk --java-options "-Xmx4G" HaplotypeCaller -R hg19.fa -I file_recal_reads.bam --emit-ref-confidence GVCF -L /interval.bed --dbsnp dbsnp_138.hg19.vcf.gz -O file.g.vcf --bamout file.bam
Our data come from amplicon sequencing (MIP, Molecular Inversion Probes), so we cannot mark duplicates or apply filters such as end-distance bias or strand bias to the called variants, because these sites are generally covered by reads in only one direction.
What I observed is that the depth (AD and DP) in the GVCF and in the bamout is lower than in the original BAM.
Example for one site:
original bam: total count 222, Allele A 222
AC=2;AF=1.00;AN=2;DP=50;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=35.61;SOR=7.864 GT:AD:DP:GQ:PL 1/1:0,50:50:99:1823,150,0
This happens at many loci.
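To put a number on the discrepancy, here is a minimal Python sketch that parses the FORMAT and sample columns of the record above and compares the reported depth with the count from the original BAM. The helper name `parse_sample` is hypothetical, and the value 222 is simply the count quoted above, hard-coded for illustration.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (e.g. GT:AD:DP) to the values in one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


# FORMAT and sample columns copied from the record above.
sample = parse_sample("GT:AD:DP:GQ:PL", "1/1:0,50:50:99:1823,150,0")

ad = [int(x) for x in sample["AD"].split(",")]  # per-allele depths: ref, alt
dp = int(sample["DP"])                          # depth reported in the VCF record
bam_depth = 222                                 # depth counted in the original BAM (assumed input)

retained = dp / bam_depth                       # fraction of the original depth kept
print(f"AD={ad} DP={dp} original={bam_depth} retained={retained:.1%}")
```

For this site, only about a quarter of the reads in the original BAM are reflected in the reported depth.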
I saw some older posts (2016) describing some validated variants missed by HC using MIP data.
I was wondering whether there are any options I should add when analysing amplicon sequencing data. Are there any guidelines?