VQSR dataset preparation
Hi,
My information is:
a) GATK version used: gatk-4.2.0.0
b) Exact command used:
gatk -Xmx40g VariantRecalibrator \
-R $genome \
-V cohort.vcf.gz \
-an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR -an DP \
-mode SNP \
--resource:test1,known=false,training=true,truth=true,prior=10 test1.vcf.gz \
--resource:CMJ,known=false,training=true,truth=true,prior=10 CMJ.vcf.gz \
--resource:dbsnp,known=true,training=false,truth=false,prior=2 dbsnp.vcf.gz \
-O cohort_snps.recal \
--tranches-file cohort_snps.tranches \
--rscript-file output.plots.R
gatk -Xmx40g VariantRecalibrator \
-R $genome \
-V cohort.vcf.gz \
-an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR -an DP \
-mode INDEL \
--resource:test1,known=false,training=true,truth=true,prior=10 test1.vcf.gz \
--resource:CMJ,known=false,training=true,truth=true,prior=10 CMJ.vcf.gz \
--resource:dbsnp,known=true,training=false,truth=false,prior=2 dbsnp.vcf.gz \
-O cohort_snps.recal \
--tranches-file cohort_snps.tranches \
--rscript-file output.plots.R
As I have large datasets of confident variants, I do not need to use hard filtering, so I am running VQSR directly.
If not an error, choose a category for your question (REQUIRED):
a) How should I prepare my training datasets for SNPs and INDELs?
- Do I need to create training datasets containing only INDELs for INDEL mode?
- Do I need to create training datasets containing only SNPs for SNP mode?
Same question for the query cohort.vcf.gz: do I need to separate INDELs and SNPs to run each mode separately?
I have tried running SNP mode with SNPs and INDELs together in both the query and the training datasets. I did not get an error, and snp.recalibrated.vcf.gz was produced, but I am wondering whether I used the right method.
Thank you very much in advance for your answer.
Best regards,
Sabrina
-
Hi Sabrina,
Please read these articles regarding how to run VQSR:
- https://gatk.broadinstitute.org/hc/en-us/articles/360035531112--How-to-Filter-variants-either-with-VQSR-or-by-hard-filtering
- https://gatk.broadinstitute.org/hc/en-us/articles/360035531612-Variant-Quality-Score-Recalibration-VQSR-
Let me know if these do not answer your questions.
Best,
Genevieve
-
Sabrina Legoueix, we have released a new article that covers your questions in even more depth: https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
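For anyone landing here with the same setup, below is a minimal dry-run sketch of the two runs from the original post, written so that the INDEL run does not overwrite the SNP outputs. It only prints the commands and does not call GATK. The per-mode file names (cohort_snps.*, cohort_indels.*), the helper name build_vqsr_cmd, and the GENOME default are assumptions made for illustration and are not taken from the thread; the arguments themselves are copied from the commands above. It passes the same unsplit cohort.vcf.gz and resource files to both modes, which is my understanding of how the linked how-to article runs it, so please check the articles before relying on that.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one VariantRecalibrator command per mode, does not run GATK.
# Assumptions (not from the thread): per-mode output names and the GENOME default.
set -euo pipefail

GENOME="${GENOME:-genome.fasta}"   # hypothetical default; the thread uses $genome

build_vqsr_cmd() {
  local mode="$1" tag
  case "$mode" in
    SNP)   tag="snps" ;;
    INDEL) tag="indels" ;;
    *)     echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  # Same input VCF and resources for both modes; only -mode and the outputs differ.
  printf '%s ' \
    gatk -Xmx40g VariantRecalibrator \
    -R "$GENOME" \
    -V cohort.vcf.gz \
    -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR -an DP \
    -mode "$mode" \
    --resource:test1,known=false,training=true,truth=true,prior=10 test1.vcf.gz \
    --resource:CMJ,known=false,training=true,truth=true,prior=10 CMJ.vcf.gz \
    --resource:dbsnp,known=true,training=false,truth=false,prior=2 dbsnp.vcf.gz \
    -O "cohort_${tag}.recal" \
    --tranches-file "cohort_${tag}.tranches" \
    --rscript-file "cohort_${tag}.plots.R"
  echo
}

for mode in SNP INDEL; do
  build_vqsr_cmd "$mode"
done
```

Once the printed commands look right, they can be pasted into a shell or piped to bash. Each mode then has its own .recal and .tranches pair to pass to the matching ApplyVQSR run.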