Minimum median mapping quality output query
REQUIRED for all errors and issues:
a) GATK version used: 4.2.3.0
b) Exact command used:
```FilterMutectCalls --output mutect.filtered.vcf.gz --variant mutect.vcf.gz --reference /storage1/fs1/bga/Active/gmsroot/gc2560/core/model_data/2887491634/build21f22873ebe0486c8e6f69c15435aa96/all_sequences.fa --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0 --false-discovery-rate 0.05 --initial-threshold 0.1 --mitochondria-mode false --microbial-mode false --max-events-in-region 2 --max-alt-allele-count 1 --unique-alt-read-count 0 --min-median-mapping-quality -1 --min-median-base-quality 20 --max-median-fragment-length-difference 10000 --min-median-read-position 1 --max-n-ratio Infinity --min-reads-per-strand 0 --min-allele-fraction 0.0 --contamination-estimate 0.0 --log-snv-prior -13.815510557964275 --log-indel-prior -16.11809565095832 --log-artifact-prior -2.302585092994046 --normal-p-value-threshold 0.001 --min-slippage-length 8 --pcr-slippage-rate 0.1 --distance-on-haplotype 100 --long-indel-length 5 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.2.3.0"```
c) Entire program log:
Understanding the output of MMQ
Hi,
I am trying to understand the MMQ output in my VCF file. I am looking at some public hotspot regions and wondering why variants there were filtered out.
The GATK documentation says of median mapping quality: "The output is an array containing, for each alt allele, the median mapping quality over all reads that best match that allele."
So for a diploid organism like human, is the output only for the alternative allele, or also for the reference allele?
https://gatk.broadinstitute.org/hc/en-us/articles/360037268011-MappingQuality
If I use vcfpy to read the VCF output and print the median base qualities and median mapping qualities for each mutation:
```
import vcfpy
vcf_reader = vcfpy.Reader.from_path('mutect_filtered.vcf.gz')
for entry in vcf_reader:  # one record per variant site
    print(entry.INFO['MBQ'])
    print(entry.INFO['MMQ'])
```
[30, 30] [60, 60]
I was wondering whether the two-element arrays for median base quality and median mapping quality correspond to the REF and ALT alleles respectively, or to the tumor and normal samples (in the order in which these samples are stored in the VCF output from Mutect).
-
That documentation is for GATK 4.0.0.0. Now MMQ and MBQ include all alleles, reference and alt(s).
-
So my confusion is whether the two values are for the ref and alt alleles or for the samples.
e.g.
[30, 30] [60, 60]
is it ["Sample1 MMQ","Sample2 MMQ"] ? #in this case it is tumor-normal pair
or
is it ["REF MMQ","ALT MMQ"] ?
and same for MBQ
-
It's ref MMQ, alt MMQ, with the median taken over tumor reads only (in multi-sample mode this may include more than one tumor sample), and likewise for MBQ.
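The ordering described above can be sketched in a few lines of Python. This is only an illustration, assuming the VCF header declares MMQ and MBQ with one value per allele, reference first; the helper name `label_per_allele` and the example alleles (REF=C, ALT=T) are hypothetical and not taken from the thread.

```python
def label_per_allele(ref, alts, values):
    """Pair a per-allele array (REF first, then each ALT) with its alleles.

    Assumption: the array holds one value per allele, reference included.
    """
    alleles = [ref] + list(alts)
    if len(values) != len(alleles):
        raise ValueError(
            f"expected {len(alleles)} values (REF + {len(alts)} ALT), "
            f"got {len(values)}"
        )
    return dict(zip(alleles, values))


# Hypothetical record with the MMQ values from the question.
print(label_per_allele("C", ["T"], [60, 60]))
```

With vcfpy, the same helper could be called as `label_per_allele(entry.REF, [alt.value for alt in entry.ALT], entry.INFO['MMQ'])` for substitution-type ALT alleles, which expose a `value` attribute.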
-
I checked my g.vcf files. In the more recent version of the results, MMQ=60,44,60, which contains three numbers (what does that mean?). However, in the older version of the results, MMQ=24,0, with only two numbers. This causes GenomicsDBImport to fail with an error message like "VCF2BinaryException : Mismatch in field length and field length descriptor: the field MMQ has 2 elements; expected 3".
-
You would get three numbers for MMQ if there are two alt alleles.
Why are you generating GVCF files?
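One way to see where the expected length comes from is to count the entries in the ALT column and compare that with the number of MMQ values. The sketch below works on the raw ALT and INFO column strings; the function name `mmq_length_check` and the example records are hypothetical. It assumes one MMQ value per allele including the reference, so every comma-separated ALT entry (symbolic ones as well) adds one expected value.

```python
def mmq_length_check(alt_field, info_field):
    """Return (number of MMQ values, number of alleles) for one record.

    alt_field and info_field are the raw ALT and INFO column strings.
    Returns None if the record has no MMQ entry.
    """
    alts = [] if alt_field == "." else alt_field.split(",")
    n_alleles = 1 + len(alts)  # REF plus each ALT entry
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        if key == "MMQ":
            return len(value.split(",")), n_alleles
    return None


# Two ALT alleles: three MMQ values are expected.
print(mmq_length_check("T,G", "DP=50;MMQ=60,44,60"))
# One ALT allele: two MMQ values are expected.
print(mmq_length_check("T", "DP=12;MMQ=24,0"))
```

A record where the two numbers differ is the kind of case the GenomicsDBImport message above complains about.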