Get the basics for HaplotypeCaller
Hello,
I am very new to GATK and not sure where to begin for my analysis.
I sequenced targeted areas of 55 different potato varieties (tetraploid, highly heterozygous) using 15 small PCR amplicons. These amplicons are only 180 bp long, so one Illumina read covers almost the entire amplicon.
These amplicons were selected because they contain multiple SNPs which together can discriminate between all or most alleles present in potato.
Some alleles will be shared among many of my samples, some will be more unique. Some samples will have 4 different alleles for a given amplicon, some will have just two (and perhaps in a 1:3 ratio).
I have my data assembled against a reference sequence using NGEN from DNASTAR. I have 55 bam files (one for each sample). I am able to generate SNP tables using DNASTAR's Arraystar, but haplotype calling is not possible.
My question here is: what are good papers or YouTube tutorials for getting familiar with such an analysis, probably using HaplotypeCaller?
Any suggestions would be much appreciated.
-
Hello Maarten Nijenhuis. We have a number of resources available for getting up to speed on using our tools for variant calling. You can find some older recordings of our GATK workshop material, which go over our Best Practices, here: https://support.terra.bio/hc/en-us/articles/360029633732-GATK-workshop-at-BroadE-March-2019-. We also have a set of FAQ tutorials here: https://gatk.broadinstitute.org/hc/en-us/categories/36000230231.
Those links mostly go over the basics of how our calling pipeline works rather than your specific use case or organism. There are two aspects of what you describe that I suspect will cause problems. The first is that you are using amplicon sequencing, which can cause lots of problems in our tools. You can search the forums for advice on this sort of calling, but our general advice is to drop MarkDuplicates from your pipeline and run HaplotypeCaller with the extra argument:
--dont-use-soft-clipped-bases true
Furthermore, you should be able to address the ploidy by adjusting the ploidy setting in HaplotypeCaller with this argument:
--sample-ploidy 4
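As a rough illustration of how the two suggestions above could be combined for a batch of per-sample BAM files, here is a minimal Python sketch that only builds the command lines and does not run them. The file names (`sample_01.bam`, `amplicons.fasta`, the `calls` output directory) and the helper `haplotypecaller_cmd` are hypothetical. The sketch assumes GATK4, where the ploidy option is documented as `--sample-ploidy` (short form `-ploidy`), and it assumes one VCF per sample is wanted.

```python
import shlex
from pathlib import Path


def haplotypecaller_cmd(bam, reference, out_dir, ploidy=4):
    """Build (but do not run) a GATK4 HaplotypeCaller command for one BAM.

    Hypothetical helper: paths and output naming are assumptions.
    """
    bam = Path(bam)
    out_vcf = Path(out_dir) / f"{bam.stem}.vcf.gz"
    return [
        "gatk", "HaplotypeCaller",
        "-R", str(reference),                      # reference FASTA
        "-I", str(bam),                            # one sample's BAM
        "-O", str(out_vcf),                        # per-sample output VCF
        "--sample-ploidy", str(ploidy),            # tetraploid potato -> 4
        "--dont-use-soft-clipped-bases", "true",   # advice for amplicon data
    ]


# Hypothetical input names for the 55 samples described in the question.
bams = [f"sample_{i:02d}.bam" for i in range(1, 56)]
commands = [haplotypecaller_cmd(b, "amplicons.fasta", "calls") for b in bams]

# Print the first command so it can be inspected before running anything.
print(shlex.join(commands[0]))
```

Each entry in `commands` could then be passed to `subprocess.run(cmd, check=True)` once the first one has been checked by hand. MarkDuplicates is deliberately absent, in line with the advice above. GATK also expects indexed BAMs with read groups, plus a `.fai` index and a `.dict` sequence dictionary for the reference.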
-
Thank you James Emery. I will start by studying these links.
Best,
Maarten