Hi, GATK team!
a) GATK version used: 184.108.40.206
b) Exact command used: Mutect2 --mitochondria-mode
c) Entire program log: the runs complete successfully, but I am confused by the results obtained with different parameters.
I am analysing amplicon sequencing data of human mtDNA. I followed the advice in the 'Mitochondrial short variant discovery' workflow (Mitochondrial short variant discovery (SNVs + Indels) – GATK (broadinstitute.org)). However, my data is target-sequenced (amplicon sequencing), so I am wondering whether you can offer further advice for this kind of data.
First, I pre-trimmed the primers from the FASTQ files using other tools. I did not mark duplicates, since this is amplicon sequencing data. I aligned the reads with bwa mem. I did not run BQSR before calling variants with Mutect2. The resulting VCFs have not been filtered further. I have not set '--max-reads-per-alignment-start' to 0, because the depth is very high and my computer runs too slowly.
I then used the recommended command:
gatk Mutect2 --reference /home/agcu/genomes/mt-wgs-revise.fa --intervals MT --mitochondria-mode true --input mt1.rg.sort.bam --output test1-3.vcf.gz
I also searched the discussion posts and tried to follow the advice in this post (Problems in subsampling reads for Mutect2 with --max-reads-per-alignment-start – GATK (broadinstitute.org)):
- We do not recommend running MarkDuplicates with amplicon data
- Use a GATK version newer than 220.127.116.11 and turn off using soft clipped bases with --dont-use-soft-clipped-bases.
- Check how many reads are filtered in the GATK program log
- Turn off the down sampler
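The suggestions in the list above can be combined into a single invocation. The following is only a sketch: the function name mutect2_cmd and the output and log file names are hypothetical, the paths are copied from the command in this post, and it assumes that setting --max-reads-per-alignment-start to 0 is what "turn off the down sampler" refers to. It prints the command for review rather than running it.

```shell
# Print a Mutect2 command that applies the suggestions listed above.
# $1: output VCF name (hypothetical; choose a different name for each run).
mutect2_cmd() {
    echo gatk Mutect2 \
        --reference /home/agcu/genomes/mt-wgs-revise.fa \
        --intervals MT \
        --mitochondria-mode true \
        --dont-use-soft-clipped-bases \
        --max-reads-per-alignment-start 0 \
        --input mt1.rg.sort.bam \
        --output "$1"
}

# Dry run: show the command that would be executed.
mutect2_cmd test1-3.nosoftclip.vcf.gz

# To run it and keep the program log (GATK logs to stderr), something like:
#   $(mutect2_cmd test1-3.nosoftclip.vcf.gz) 2> mutect2.nosoftclip.log
# To look at read-filter summaries, assuming the log lines mention "filtered":
#   grep -i 'filtered' mutect2.nosoftclip.log
```

Keeping one log per run makes it possible to check whether the number of filtered reads changes between parameter settings.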
I tried turning off the use of soft-clipped bases, but the resulting VCF contains more variants, and partly different ones, than the run that uses soft-clipped bases. Is this normal? Why does turning off soft-clipped bases produce more variants, and why are some variants called in only one of the two runs? I am confused. The command is:
gatk Mutect2 --reference /home/agcu/genomes/mt-wgs-revise.fa --intervals MT --mitochondria-mode true --dont-use-soft-clipped-bases --input mt1.rg.sort.bam --output test1-3.vcf.gz
Finally, could you please offer more advice on how to set Mutect2 parameters for somatic calling on amplicon sequencing data?
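One way to see exactly which calls differ between the two runs is to reduce each VCF to a list of CHROM:POS:REF:ALT keys and compare the lists. The sketch below uses only standard tools (gzip, awk, sort, comm); the function names and the file names in the usage line are hypothetical. It assumes each run was written to its own output file, and it treats the same variant written with a different REF/ALT representation, or inside a multi-allelic record, as a different key.

```shell
# Print one key per variant record: CHROM:POS:REF:ALT.
# Accepts plain or gzip-compressed VCFs (gzip -cdf passes plain text through).
vcf_keys() {
    gzip -cdf "$1" \
        | awk -F '\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' \
        | LC_ALL=C sort -u
}

# Report the number of shared calls and list the calls unique to each VCF.
compare_calls() {
    keys_a=$(mktemp) && keys_b=$(mktemp) || return 1
    vcf_keys "$1" > "$keys_a"
    vcf_keys "$2" > "$keys_b"
    echo "shared: $(LC_ALL=C comm -12 "$keys_a" "$keys_b" | wc -l | tr -d ' ')"
    echo "only in $1:"
    LC_ALL=C comm -23 "$keys_a" "$keys_b"
    echo "only in $2:"
    LC_ALL=C comm -13 "$keys_a" "$keys_b"
    rm -f "$keys_a" "$keys_b"
}

# Usage (hypothetical file names, one per run):
#   compare_calls test1-3.softclip.vcf.gz test1-3.nosoftclip.vcf.gz
```

The positions listed as unique to one run can then be inspected in the BAM to see whether they sit near amplicon ends, where soft clipping is most likely to matter.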