Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Interval list and the “Mate not found” error in WES data

2 comments

  • Chris Kachulis

    Hi LY Wang,

    You are right that the documentation here is a bit unclear and seemingly contradictory. The solution is in one of the comments on the other issue you linked to, here.

    With BQSR, intervals can be used while running BaseRecalibrator but not ApplyBQSR.

    In general though, your understanding is correct that we don't tend to subset to exome regions when running BQSR, but instead use the subsetting for scatter-gather parallelism (which then requires using GatherBQSRReports to combine the resulting reports into one). 
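    As a rough illustration of the scatter-gather pattern described above, here is a minimal Python sketch that only builds the command lines and does not run them. The file names, the shard interval lists, and the helper names are hypothetical; the flags shown (-I, -R, --known-sites, -L, -O) are the usual GATK4 ones, but check them against the tool documentation for your version.

```python
from typing import List


def base_recalibrator_cmd(bam: str, reference: str, known_sites: str,
                          intervals: str, out_table: str) -> List[str]:
    """Build one BaseRecalibrator command restricted to a single shard."""
    return [
        "gatk", "BaseRecalibrator",
        "-I", bam,
        "-R", reference,
        "--known-sites", known_sites,
        "-L", intervals,        # the shard this job is responsible for
        "-O", out_table,
    ]


def gather_reports_cmd(tables: List[str], merged_table: str) -> List[str]:
    """Build the GatherBQSRReports command that merges the per-shard tables."""
    cmd = ["gatk", "GatherBQSRReports"]
    for table in tables:
        cmd += ["-I", table]
    cmd += ["-O", merged_table]
    return cmd


# Hypothetical inputs: two shard interval lists covering the data.
shards = ["shard_0.interval_list", "shard_1.interval_list"]
shard_tables = [f"recal_{i}.table" for i in range(len(shards))]

scatter_cmds = [
    base_recalibrator_cmd("sample.bam", "ref.fasta", "known.vcf.gz", shard, table)
    for shard, table in zip(shards, shard_tables)
]
gather_cmd = gather_reports_cmd(shard_tables, "recal_merged.table")

for cmd in scatter_cmds + [gather_cmd]:
    print(" ".join(cmd))
```

    Each scattered job could then be passed to a job scheduler or to subprocess.run, with the gather step run once all shards have finished.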

    I think the effect of subsetting BaseRecalibrator to the targets is quite minimal, which is why we don't tend to do it. However, if you do subset to the targets, the important point is to subset only for BaseRecalibrator and keep all reads when you run ApplyBQSR.
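    The advice above (intervals for BaseRecalibrator only, none for ApplyBQSR) can be sketched as follows. This is a minimal Python example that only assembles the two command lines; the file names and helper names are hypothetical, and the flags (-L, --bqsr-recal-file) are the usual GATK4 ones as far as I know, so verify them for your version.

```python
from typing import List, Optional


def build_base_recalibrator(bam: str, reference: str, known_sites: str,
                            out_table: str,
                            intervals: Optional[str] = None) -> List[str]:
    """BaseRecalibrator may optionally be restricted to target intervals."""
    cmd = [
        "gatk", "BaseRecalibrator",
        "-I", bam,
        "-R", reference,
        "--known-sites", known_sites,
        "-O", out_table,
    ]
    if intervals is not None:
        cmd += ["-L", intervals]
    return cmd


def build_apply_bqsr(bam: str, reference: str, recal_table: str,
                     out_bam: str) -> List[str]:
    """ApplyBQSR deliberately takes no intervals, so every read is kept
    and mates are not dropped from the output BAM."""
    return [
        "gatk", "ApplyBQSR",
        "-I", bam,
        "-R", reference,
        "--bqsr-recal-file", recal_table,
        "-O", out_bam,
    ]


# Hypothetical file names for a WES sample.
recal_cmd = build_base_recalibrator(
    "sample.bam", "ref.fasta", "known.vcf.gz", "recal.table",
    intervals="exome_targets.interval_list",
)
apply_cmd = build_apply_bqsr("sample.bam", "ref.fasta", "recal.table",
                             "sample.recal.bam")

print(" ".join(recal_cmd))
print(" ".join(apply_cmd))
```

    Leaving the intervals argument out of build_apply_bqsr entirely makes it hard to reintroduce the subsetting that led to the "Mate not found" error in the first place.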

  • LY Wang

    Thank you very much for your suggestion, Chris Kachulis.

    I re-ran ApplyBQSR without specifying intervals and it works! ValidateSamFile no longer reports any errors.

    I don't have further questions for now, but as a newbie in bioinformatics, I'm still hoping that GATK will update the Best Practices for WES analysis one day (the old version seems to be archived).

    Again, thank you for your help and all the hard work of your team!
