Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Fungal mitochondrial variant calling

5 comments

  • Yue Wang

    Another thing I noticed is that mitochondrial genomes have very few variants. This species' mitogenome is around 40 kb.

    I could only find several variants in DRR022915 compared to AF293 with HaplotypeCaller. In this case, is Base Recalibration (BQSR) recommended?

     

  • Megan Shand

    Hi,

    I don't have any experience with fungal mitochondria samples, but I'll try to address what to look out for in choosing these tools.

    For the bootstrapping of BQSR, the key is that you want the number of truly non-variant sites to overwhelm the number of sites that might have variation. For the human genome this is achieved easily by removing common variant sites. Depending on the size of the fungal mitochondrial genome and how polymorphic you expect it to be, you might only need to bootstrap with one of your samples, or you could use more.
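A rough sketch of one bootstrapping round, written as Python that only builds the GATK4 command lines and does not run them. The file names and the `bqsr_bootstrap_commands` helper are hypothetical; in practice the first-pass calls would normally be filtered down to confident sites before being passed to `--known-sites`, and that filtering step is left out here.

```python
from typing import List


def bqsr_bootstrap_commands(ref: str, bam: str, prefix: str) -> List[List[str]]:
    """Build (but do not run) the commands for one BQSR bootstrapping round.

    All file names are hypothetical placeholders derived from `prefix`.
    """
    first_pass_vcf = f"{prefix}.firstpass.vcf.gz"
    recal_table = f"{prefix}.recal.table"
    recal_bam = f"{prefix}.recal.bam"
    return [
        # 1. Call variants on the uncalibrated reads.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", first_pass_vcf],
        # 2. Mask those sites so that mismatches there are not counted as errors.
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", bam,
         "--known-sites", first_pass_vcf, "-O", recal_table],
        # 3. Write a BAM with recalibrated base qualities.
        ["gatk", "ApplyBQSR", "-R", ref, "-I", bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
    ]


if __name__ == "__main__":
    for cmd in bqsr_bootstrap_commands("AF293_mito.fasta", "DRR022915.bam", "DRR022915"):
        print(" ".join(cmd))
```

The round can be repeated on the recalibrated BAM if the call set is still changing between passes.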

    For the variant calling: do you know the ploidy of the fungal mitochondria or is the copy number so high (as it is in humans) that you expect to see various low allele fractions at variant sites? We use Mutect2 for the mitochondria calling because we expect the ploidy is so high that it doesn't make sense to use HaplotypeCaller and we expect low allele fraction variants (such as 5% of the reads having an alternate allele for example).
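To make the allele fraction point concrete, here is a small illustrative sketch. The read counts are made up, and `nearest_diploid_genotype` is a deliberate oversimplification for illustration only (real callers use genotype likelihoods, not a nearest-fraction rule). It only shows that a 5% alternate allele sits far from anything a diploid model expects.

```python
# Allele fractions a diploid model expects at a site (simplified).
DIPLOID_EXPECTED = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that carry the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def nearest_diploid_genotype(af: float) -> str:
    """Diploid genotype whose expected allele fraction is closest to `af`."""
    return min(DIPLOID_EXPECTED, key=lambda gt: abs(DIPLOID_EXPECTED[gt] - af))


if __name__ == "__main__":
    # Hypothetical heteroplasmic site: 100 of 2000 reads carry the alternate allele.
    af = allele_fraction(ref_reads=1900, alt_reads=100)
    print(f"allele fraction = {af:.3f}, nearest diploid genotype = {nearest_diploid_genotype(af)}")
```

Under this toy rule the 5% site looks most like a homozygous-reference genotype, which is why a caller that reports allele fractions directly is a better fit when the copy number is high.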

    If you do choose to use the full mitochondria pipeline, it includes a shifted (or rotated) reference. If the fungal mitochondrial genome is circular and you are concerned about the quality of calls near the artificial breakpoint in the reference, then it might be worth generating shifted reference files. We are working on tools to allow you to do that easily, but they are unfortunately not available yet. If you don't expect the edges of the reference to be highly variable or low coverage, then you might be able to get away with just running the reference as is with the pipeline you've described.
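The idea of a shifted reference can be sketched in a few lines. This is not the pipeline's own implementation, just a minimal illustration with hypothetical function names: rotate the circular sequence so the original breakpoint lands in the interior, and convert positions called on the rotated copy back to original coordinates.

```python
def shift_circular(seq: str, offset: int) -> str:
    """Rotate a circular sequence so that original base `offset + 1` comes first."""
    if not seq:
        raise ValueError("empty sequence")
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def to_original_position(shifted_pos: int, offset: int, length: int) -> int:
    """Map a 1-based position on the shifted sequence back to the original."""
    if not 1 <= shifted_pos <= length:
        raise ValueError("position outside the sequence")
    return (shifted_pos - 1 + offset) % length + 1


if __name__ == "__main__":
    original = "ACGTTGCAAG"  # toy stand-in for a circular mitogenome
    offset = 4               # hypothetical shift; half the genome length is a natural choice
    shifted = shift_circular(original, offset)
    print(shifted)
    # Every base on the shifted copy maps back to the same base on the original.
    for pos in range(1, len(shifted) + 1):
        orig_pos = to_original_position(pos, offset, len(original))
        assert shifted[pos - 1] == original[orig_pos - 1]
```

Calls near the original start and end can then be made on the shifted copy, where those bases sit away from the edges, and translated back.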

    Hopefully that helps a bit! I think you'll have to do a good deal of exploring your particular dataset to be sure that the tools are working for this use case. Best of luck!

  • Megan Shand

    Sorry, I didn't see your last comment before I replied! If these samples don't have many variants, then you can still run BQSR, but without removing many known variant sites (since there aren't that many). Since most of the data should match the reference there will be enough data to train the BQSR model without worrying about real variant sites. Or you could just remove the sites you've already found without worrying about using all of the samples.
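As a back-of-the-envelope check on why so few variant sites leave plenty of training data, the sketch below computes the fraction of the genome that would be masked. The 40 kb length comes from the thread; the count of five sites is a made-up stand-in for "several".

```python
def masked_fraction(n_known_sites: int, genome_length: int) -> float:
    """Fraction of reference positions excluded from BQSR training."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    if not 0 <= n_known_sites <= genome_length:
        raise ValueError("site count must lie between 0 and the genome length")
    return n_known_sites / genome_length


if __name__ == "__main__":
    # Hypothetical: 5 variant sites found on a ~40 kb mitogenome.
    frac = masked_fraction(5, 40_000)
    print(f"masked: {frac:.4%} of positions, remaining: {1 - frac:.4%}")
```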

  • Yue Wang

    Hi Megan,

    Thanks for the detailed reply.

    I will try to run BQSR.

    The fungal mitogenome usually has thousands of copies and some introns, and heteroplasmy can occur, so I think Mutect2 mitochondria mode fits my requirements.

    Another question that has been bothering me is which preprocessing steps are preferred. Given the high copy number of the mitogenome, is it necessary to trim the reads or clean the data as described in this article (https://gatk.broadinstitute.org/hc/en-us/articles/360039568932--How-to-Map-and-clean-up-short-read-sequence-data-efficiently)? Can I just skip it?

    Thanks~

  • Megan Shand

    I think trimming adapters depends more on the sequencing methods than the fact that it is a fungal mitogenome. If you choose to skip that step I'd just be sure to look at your bams in IGV to make sure there isn't excessive adapter sequence from short inserts. 
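Besides looking in IGV, a quick numeric check is possible: when the insert is shorter than the read length, the read runs past the insert into adapter sequence. The sketch below assumes the template lengths (the TLEN field, column 9 of a SAM record) have already been extracted; the values and the function name are hypothetical.

```python
from typing import Iterable


def fraction_short_inserts(template_lengths: Iterable[int], read_length: int) -> float:
    """Fraction of reads whose insert is shorter than the read length.

    TLEN is signed in SAM and set to 0 when unavailable, so zeros are
    skipped and absolute values are used.
    """
    sizes = [abs(t) for t in template_lengths if t != 0]
    if not sizes:
        return 0.0
    short = sum(1 for s in sizes if s < read_length)
    return short / len(sizes)


if __name__ == "__main__":
    # Hypothetical TLEN values for 150 bp reads.
    tlens = [300, -300, 120, -120, 0, 90]
    frac = fraction_short_inserts(tlens, read_length=150)
    print(f"{frac:.0%} of reads likely contain adapter read-through")
```

A high fraction would be a reason not to skip adapter handling, whatever the organism.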
