Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Base Quality Score Recalibration (BQSR)

8 comments

  • Nickier

    How do I set the COMPRESSION_LEVEL for ApplyBQSR? The output BAM file is twice the size of the original BAM file, which was written with COMPRESSION_LEVEL=2.

     
  • cali

    gatk ApplyBQSR \
    --java-options "-Xmx6G -Dsamjdk.compression_level=5" \
    -R $ref \
    -I $bam_in \
    --bqsr-recal-file $table \
    -L $contig \
    -O $bam_out

  • Yiguan Wang

    I'm currently working on Drosophila genomes, for which there isn't a list of known variants. How do I perform the bootstrap procedure to generate a set of known variants? Is there a pipeline for that? Thanks in advance!

  • Adrián Segura

    Hi, I have a question. According to the description of this process, the BaseRecalibrator tool requires databases of known polymorphisms to recalibrate base qualities. As explained in this document, any difference from the reference at a site not covered by these resources (dbSNP, gnomAD, ...) is considered an error. Isn't this counterproductive for detecting somatic variants in tumor samples? Shouldn't I then also provide BaseRecalibrator with data from COSMIC or another specialized database of somatic mutations?

  • Joanna

    Hi all, 

    Is BQSR needed in the case of Canis lupus familiaris (dog)?

    Thanks in advance!

    Joanna

  • Sophie Agger

    Joanna, yes. Whether BQSR is needed does not depend on the species.

  • Sophie Agger

    I've had a technical issue with this tool. If your disk is full, it doesn't throw an error but just keeps chugging along. In most cases you'd notice this because the EOF marker is missing, but in theory this could lead to a truncated BAM file where you can't see that it's truncated, plus it's a lot of work to fix manually. Is this a known bug, or is it just something I'll have to live with?

  • Conrad Leonard

    Adrián Segura, I think the assumption is that for most tumours the number of positions affected by somatic variation is negligible compared to the total size of the genome, so they won't affect the bulk statistics much. But I do wonder about tumours with high TMB, especially those with a distinctive mutational signature (e.g. UV for melanoma), where somatic variation is highly correlated with base context. One could imagine that, for some bins in that case, a non-negligible proportion of the 'error' in the bin is real variation, which would lead to base qualities being improperly recalibrated downwards at exactly the sites where you want to call. Geraldine Van der Auwera, is there guidance on this from the GATK team? Maybe we could do some experiments...
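
On the bootstrap question raised earlier in the thread (no known-sites resource for the organism): one common reading of the procedure is to call variants on the unrecalibrated data, keep only the calls you trust most, feed those to BaseRecalibrator as the known sites, and repeat until the results stop changing. The following is a minimal sketch of a single round, not an official pipeline. The helper names (run, bootstrap_round), the output file names, the filter name and the hard-filter thresholds are all assumptions made for illustration, as is the choice to always recalibrate the original BAM rather than a previously recalibrated one. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of ONE bootstrap round for BQSR without a known-sites resource.
# File names, the filter name and the thresholds are illustrative assumptions.

# Print the command when DRY_RUN=1 (the default here); run it otherwise.
# The printout is for review only: shell quoting is not preserved.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: bootstrap_round REF ORIG_BAM CALL_BAM ROUND
#   ORIG_BAM  the unrecalibrated BAM (always the one that gets recalibrated)
#   CALL_BAM  the BAM used for calling (ORIG_BAM in round 1, then the
#             recalibrated BAM from the previous round)
bootstrap_round() {
    local ref="$1" orig_bam="$2" call_bam="$3" p="round$4"

    # 1. Call variants on the current best BAM.
    run gatk HaplotypeCaller -R "$ref" -I "$call_bam" \
        -O "$p.raw.vcf.gz" || return 1

    # 2. Flag low-confidence calls (thresholds are assumptions; tune them).
    run gatk VariantFiltration -R "$ref" -V "$p.raw.vcf.gz" \
        --filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0" \
        --filter-name "low_confidence" \
        -O "$p.flagged.vcf.gz" || return 1

    # 3. Keep only the calls that passed, to serve as provisional known sites.
    run gatk SelectVariants -V "$p.flagged.vcf.gz" --exclude-filtered \
        -O "$p.known.vcf.gz" || return 1

    # 4. Build the recalibration table from the original BAM.
    run gatk BaseRecalibrator -R "$ref" -I "$orig_bam" \
        --known-sites "$p.known.vcf.gz" -O "$p.recal.table" || return 1

    # 5. Apply it, producing the BAM to call from in the next round.
    run gatk ApplyBQSR -R "$ref" -I "$orig_bam" \
        --bqsr-recal-file "$p.recal.table" -O "$p.recal.bam" || return 1
}
```

Calling bootstrap_round ref.fasta sample.bam sample.bam 1 prints the five commands for the first round; a second round would pass round1.recal.bam as CALL_BAM. Set DRY_RUN=0 to execute instead of print. How many rounds are needed, and how strict the filters should be, depends on the dataset.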

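On the truncated-BAM concern raised in the thread: a complete BAM file ends with a fixed 28-byte empty BGZF block (the EOF marker defined in the SAM/BAM specification), so its absence can be tested directly after ApplyBQSR finishes. The sketch below does that with standard tools only; the function name bam_has_eof_marker is an assumption for illustration. It detects a missing terminator, not every possible kind of corruption. Where samtools is installed, samtools quickcheck performs a similar check.

```shell
#!/usr/bin/env bash
# The 28-byte empty BGZF block that terminates a complete BAM file,
# written as lowercase hex (56 characters).
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Return 0 if the file ends with the BGZF EOF marker, 1 otherwise
# (including when the file does not exist).
bam_has_eof_marker() {
    local bam="$1" tail_hex
    [ -f "$bam" ] || return 1
    # Dump the last 28 bytes as hex and strip all whitespace.
    tail_hex=$(tail -c 28 "$bam" | od -An -v -tx1 | tr -d '[:space:]')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

A typical use would be to run bam_has_eof_marker "$bam_out" || echo "possibly truncated: $bam_out" >&2 right after the ApplyBQSR command, and to treat a failure as a reason to check free disk space and rerun.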
