Asking for advice on Mutect2 somatic calling with amplicon data
Hi, GATK team!
a) GATK version used: 4.3.0.0
b) Exact command used: Mutect2 --mitochondria-mode
c) Entire program log: the run succeeded, but I am confused by the results obtained with different parameters
I am analysing amplicon sequencing data of human mtDNA. I followed the advice in the 'Mitochondrial short variant discovery' workflow (link: Mitochondrial short variant discovery (SNVs + Indels) – GATK (broadinstitute.org)). However, my data is target-sequenced (that is, amplicon sequencing), and I am wondering if you can offer further advice for such data.
First of all, I pre-trimmed the primers from the FASTQ files using other tools. I did not mark duplicates, since this is amplicon sequencing data. I used bwa mem to align the reads. I did not run BQSR before calling variants with Mutect2. The resulting VCFs have not been filtered further. I have not set '--max-reads-per-alignment-start' to 0 because the depth is very high and my computer runs too slowly.
Therefore, I used the recommended command:
gatk Mutect2 --reference /home/agcu/genomes/mt-wgs-revise.fa --intervals MT --mitochondria-mode true --input mt1.rg.sort.bam --output test1-3.vcf.gz
In addition, I searched the discussion posts and tried to follow the advice in this post (link: Problems in subsampling reads for Mutect2 with --max-reads-per-alignment-start – GATK (broadinstitute.org)):
- We do not recommend running MarkDuplicates with amplicon data
- Use a GATK version newer than 4.2.0.0 and turn off using soft clipped bases with --dont-use-soft-clipped-bases.
- Check how many reads are filtered in the GATK program log
- Turn off the down sampler
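For the third point, a minimal sketch of how the log could be checked, assuming the Mutect2 output was redirected to a file (the name mutect2.log is hypothetical) and that the read-filter summary lines contain the word "filtered":

```shell
# Hypothetical log file, e.g. produced with: gatk Mutect2 ... > mutect2.log 2>&1
LOG=${LOG:-mutect2.log}

# Print every log line that mentions filtering (case-insensitive).
# Assumption: the read-filter summary lines contain the word "filtered".
show_filtered_reads() {
  grep -i "filtered" "$1" || echo "no matching lines in $1"
}

if [ -f "$LOG" ]; then
  show_filtered_reads "$LOG"
else
  echo "log file $LOG not found; run Mutect2 with its output redirected first"
fi
```

For the last point, the downsampler is the '--max-reads-per-alignment-start' setting mentioned above; a value of 0 turns it off.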
I tried turning off the use of soft-clipped bases, but the resulting VCF contains more, and different, variants than when soft-clipped bases are used. Is this normal? Why does turning off soft-clipped bases result in more variants, and why are some variants not present in both call sets? I am confused. The command is: gatk Mutect2 --reference /home/agcu/genomes/mt-wgs-revise.fa --intervals MT --mitochondria-mode true --dont-use-soft-clipped-bases --input mt1.rg.sort.bam --output test1-3.vcf.gz
Finally, could you please offer more advice on how to set Mutect2 parameters for somatic amplicon sequencing data?
-
yangjw Using the --linked-de-bruijn-graph argument might improve accuracy and if you're lucky it will reduce the CPU cost enough to turn off or at least reduce downsampling.
It's also worth trying to use the --downsampling-stride argument. There's no harm in setting it as large as 50 and this might smooth out the downsampling.
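As a rough sketch, the two suggestions above could be added to the command from the question like this. The paths are copied from the question, the output name is a placeholder, and the small run helper is only there so the command can be inspected before it is executed:

```shell
# Paths and sample names are copied from the question; OUT is a placeholder name.
REF=/home/agcu/genomes/mt-wgs-revise.fa
BAM=mt1.rg.sort.bam
OUT=test1-3.stride50.vcf.gz

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Original command plus the linked de Bruijn graph and a downsampling stride of 50.
mutect2_linked_stride() {
  run gatk Mutect2 \
    --reference "$REF" \
    --intervals MT \
    --mitochondria-mode true \
    --linked-de-bruijn-graph \
    --downsampling-stride 50 \
    --input "$BAM" \
    --output "$OUT"
}

mutect2_linked_stride
```

By default this only prints the command. Setting DRY_RUN=0 in the environment runs it for real.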
I wouldn't worry about differences in unfiltered Mutect2 output. You should always run FilterMutectCalls (or better yet, run the entire pipeline using our best practices WDL script) on the raw output of Mutect2.
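A minimal sketch of the filtering step, assuming the raw VCF and reference from the question. The filtered output name is a placeholder, and the run helper just prints the command unless DRY_RUN=0 is set:

```shell
# Reference and raw VCF names are copied from the question; FILTERED is a placeholder name.
REF=/home/agcu/genomes/mt-wgs-revise.fa
RAW=test1-3.vcf.gz
FILTERED=test1-3.filtered.vcf.gz

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Filter the raw Mutect2 calls, keeping mitochondria mode consistent with the calling step.
filter_calls() {
  run gatk FilterMutectCalls \
    --reference "$REF" \
    --variant "$RAW" \
    --mitochondria-mode true \
    --output "$FILTERED"
}

filter_calls
```

FilterMutectCalls reads the .stats file that Mutect2 writes next to the raw VCF, so the two files should stay together.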
Finally, in most cases it is best to keep soft-clipped bases.
-
Hi David, thank you for your reply. I set '--downsampling-stride 50' and got the ideal result. Your advice helped me a lot! Thank you again!