Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Base Quality Score Recalibration doubt

4 comments

  • Gökalp Çelik

    Hi Paulo Ricardo

    The known-sites file must be a variant file containing representative variants for the species you are working on. If your data come from a human sample, our resource bundle contains known sites for BQSR, VQSR, and other tools that require them. If you are working with a non-model organism, we recommend bootstrapping BQSR: collect high-quality variant sites from many of your samples, use that collection of variants as the known sites for all of your BAM files, and repeat the BQSR and variant calling steps until the base quality scores and variant calls converge. If you instead recalibrate each sample using only the variants called from that same sample, the procedure will not converge properly, and each sample will end up with a different level of recalibration as a result of self-recalibration.

    I hope this helps. 

  • Paulo Ricardo

    Hi Gökalp Çelik

    To detect somatic variants, how do I perform the bootstrapping? Should I first call the variants with HaplotypeCaller or with Mutect2? I called variants with Mutect2 and tried hard-filtering to keep the most reliable ones. However, it didn't work with the parameters below, because the tumor-only calls have no QD and similar annotations. So my question is: should I recalibrate the BAM files with the variants called by HaplotypeCaller, and only then run Mutect2 on the recalibrated BAM files to obtain the somatic variants?

    gatk VariantFiltration \
        -V snps.vcf.gz \
        -filter "QD < 2.0" --filter-name "QD2" \
        -filter "QUAL < 30.0" --filter-name "QUAL30" \
        -filter "SOR > 3.0" --filter-name "SOR3" \
        -filter "FS > 60.0" --filter-name "FS60" \
        -filter "MQ < 40.0" --filter-name "MQ40" \
        -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
        -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
        -O snps_filtered.vcf.gz

  • Gökalp Çelik

    Hi again.

    The zygosity of your variants is not important. What matters is obtaining high-quality germline sites, and you can use HaplotypeCaller for that.

  • Paulo Ricardo

    Thank you very much for your help!

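The workflow discussed in this thread (call germline variants with HaplotypeCaller, hard-filter them, use the passing sites as known sites for BQSR on every BAM, repeat, and only then run Mutect2) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not an official GATK pipeline: all file names, the helper functions, the `DRY_RUN` switch, the fixed number of rounds, and the choice to re-recalibrate the original BAMs in each round are illustrative. Calling all BAMs jointly in a single HaplotypeCaller run is also a simplification; larger cohorts would more likely go through a GVCF-based workflow. The filter thresholds are the SNP thresholds quoted earlier in the thread, so indels are left out here.

```shell
#!/bin/sh
# Sketch of BQSR bootstrapping for a species without a known-sites resource.
# ASSUMPTIONS (not from the thread): every file name, the DRY_RUN switch, the
# helper names, the fixed round count, and re-recalibrating the original BAMs
# in each round. File names must not contain whitespace.

REF="${REF:-reference.fasta}"   # hypothetical reference FASTA
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Jointly call germline variants on the given BAMs, then keep only SNPs that
# pass the hard filters quoted in this thread.
# Usage: call_known_sites OUT.vcf.gz BAM...
call_known_sites() {
  local out="$1" inputs="" bam
  shift
  for bam in "$@"; do inputs="$inputs -I $bam"; done
  # $inputs is intentionally unquoted so it splits into separate arguments.
  run gatk HaplotypeCaller -R "$REF" $inputs -O "raw.$out"
  run gatk SelectVariants -V "raw.$out" -select-type SNP -O "snps.$out"
  run gatk VariantFiltration -V "snps.$out" \
    -filter "QD < 2.0" --filter-name "QD2" \
    -filter "QUAL < 30.0" --filter-name "QUAL30" \
    -filter "SOR > 3.0" --filter-name "SOR3" \
    -filter "FS > 60.0" --filter-name "FS60" \
    -filter "MQ < 40.0" --filter-name "MQ40" \
    -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
    -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
    -O "flagged.$out"
  run gatk SelectVariants -V "flagged.$out" --exclude-filtered -O "$out"
}

# Recalibrate one BAM against a known-sites VCF.
# Usage: recalibrate KNOWN.vcf.gz IN.bam OUT.bam
recalibrate() {
  local known="$1" in_bam="$2" out_bam="$3"
  local table="${out_bam%.bam}.table"
  run gatk BaseRecalibrator -R "$REF" -I "$in_bam" --known-sites "$known" -O "$table"
  run gatk ApplyBQSR -R "$REF" -I "$in_bam" --bqsr-recal-file "$table" -O "$out_bam"
}

# Run N rounds: call on the latest BAMs, then recalibrate the original BAMs
# with the shared known-sites set from that round.
# Usage: bootstrap ROUNDS BAM...
bootstrap() {
  local rounds="$1" r=1 originals current bam known
  shift
  originals="$*"
  current="$*"
  while [ "$r" -le "$rounds" ]; do
    known="known.round${r}.vcf.gz"
    call_known_sites "$known" $current
    current=""
    for bam in $originals; do
      recalibrate "$known" "$bam" "${bam%.bam}.round${r}.bam"
      current="$current ${bam%.bam}.round${r}.bam"
    done
    r=$((r + 1))
  done
}

# Example (dry run): two rounds on two hypothetical BAMs, then somatic calling
# on the recalibrated tumor BAM.
bootstrap 2 tumor.bam other_sample.bam
run gatk Mutect2 -R "$REF" -I tumor.round2.bam -O somatic.vcf.gz
```

With the default `DRY_RUN=1` the script only prints the commands, which makes it easy to review the order of steps before running anything. A fixed round count does not by itself establish convergence; that would still need to be checked, for example by comparing the recalibration tables of successive rounds with a tool such as AnalyzeCovariates.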
