Why no overlaps between shifted variants and non-shifted variants in gatk4-mitochondria-pipeline? (Mutect2)
a) GATK version used: 4.1.2.0 / picard: 2.22.3
b) Exact GATK commands used:
java -Xmx5G -jar gatk-package-4.1.2.0-local.jar Mutect2 \
-R ${Homo_sapiens_assembly38.chrM.shifted_by_8000_bases.fasta} \
-L chrM \
-I ${input_unmapped_bam} \
-O ${output_vcf} \
--annotation StrandBiasBySample \
--mitochondria-mode \
--max-reads-per-alignment-start 75 \
--max-mnp-distance 0
java -Xmx5G -jar gatk-package-4.1.2.0-local.jar Mutect2 \
-R ${Homo_sapiens_assembly38.chrM.fasta} \
-L chrM \
-I ${input_unmapped_bam} \
-O ${output_vcf} \
--annotation StrandBiasBySample \
--mitochondria-mode \
--max-reads-per-alignment-start 75 \
--max-mnp-distance 0
java -jar %s LiftoverVcf \
I=${vcf_from_shifted_mt} \
O=${shifted_back.vcf} \
R=${Homo_sapiens_assembly38.chrM.fasta} \
CHAIN=${GATK_reference_hg38_chrM/ShiftBack.chain}
REJECT=${output.rejected.vcf}
c) The entire error log if applicable: no errors
Hi,
I ran mitochondria pipeline, but the results are quite weird.
Below is variants from reads aligned to Shifted Mitochondria (Liftovervcf was done!). (the result itself is so weird due to a lot of indels)
This is variants from reads aligned to non-shifted mitochondria.
There is no overlap at all between two variantsets..
Could somebody explain this weird results?
Thanks in advance,
Changhan
-
Hi Changhan,
Did you use the mitochondria best practices WDL? If so, then I'm not surprised that there are no overlapping variants in the two VCFs. We only call the control region of the mitochondria with the shifted reference and then only call the rest of the mitochondria with the regular reference. You can see this in the -L arguments of the WDL (it's chrM:576-16024 for the non-control region, everything else is considered the control region).
In the commands you posted through, it looks like you're only using -L chrM, so I'm not sure why there aren't overlapping sites in your vcfs if both were called across the entire mitochondria.
I'm also not sure why you're seeing so many indels. It looks like you haven't run FilterMutectCalls yet (or the other filtering steps in the pipeline). Perhaps the filtering steps would remove false sites in your callset and the results would look more like what you expect.
Thanks,
Megan
Please sign in to leave a comment.
1 comment