Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

StrandBiasBySample error in HaplotypeCaller

3 comments

  • David Benjamin

    The warnings are most likely benign, and the output is just how the GVCF format works. When you see a ref allele and a <NON_REF> with no other alt allele, it is a reference block, where HaplotypeCaller has found no variation but reports how confident it is (via the GQ) about the lack of variation. This is useful when we combine GVCFs for joint calling, because we want to know whether the sample definitely has no variant or whether the depth was simply insufficient, etc. The warnings occur in every reference block because there is no variant, hence no annotation can be done.
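To see how much of a GVCF consists of reference blocks, the records can be split by their ALT column. This is only a rough sketch, not part of GATK: the function name `count_gvcf_records` is made up for illustration, and it assumes uncompressed, tab-separated VCF text on standard input.

```shell
# Count reference-block records versus records with a real ALT allele.
# Assumes VCF text on stdin; column 5 is ALT.
count_gvcf_records() {
  awk -F'\t' '
    /^#/ { next }                         # skip header lines
    $5 == "<NON_REF>" { blocks++; next }  # only the symbolic allele: reference block
    { variants++ }                        # some other ALT allele is present
    END { printf "reference_blocks=%d variant_sites=%d\n", blocks, variants }
  '
}
```

With a bgzipped GVCF (the file name here is a placeholder) this could be called as `gzip -dc sample.g.vcf.gz | count_gvcf_records`. A large number of reference blocks is expected, which is why the warning appears so often.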

    Now, with that all said, do you intend to run in GVCF mode?

  • gubrins

    Thank you very much, I'm new to GATK and I appreciate the help!

    I'm planning to merge all my GVCF files and then run GenotypeGVCFs, so I can get a merged VCF for all my samples.
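A rough sketch of that plan, assuming the merge is done with CombineGVCFs (GenomicsDBImport is another option). The helper name `joint_call_cmds` and all file names are placeholders, and the function only prints the two commands so they can be reviewed before running.

```shell
# Print the two-step joint-calling commands for a set of per-sample GVCFs.
# Usage: joint_call_cmds REF OUT_PREFIX SAMPLE1.g.vcf.gz [SAMPLE2.g.vcf.gz ...]
joint_call_cmds() {
  ref=$1
  prefix=$2
  shift 2
  variants=""
  for g in "$@"; do
    variants="$variants -V $g"   # one -V per input GVCF
  done
  # Step 1: merge the per-sample GVCFs into one cohort GVCF.
  echo "gatk CombineGVCFs -R $ref$variants -O $prefix.g.vcf.gz"
  # Step 2: joint-genotype the cohort GVCF into a final VCF.
  echo "gatk GenotypeGVCFs -R $ref -V $prefix.g.vcf.gz -O $prefix.vcf.gz"
}
```

For example, `joint_call_cmds reference.fasta cohort s1.g.vcf.gz s2.g.vcf.gz` prints one CombineGVCFs command followed by one GenotypeGVCFs command.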

    I have two more questions, if you don't mind. First, could you help me improve the speed of HaplotypeCaller? It is running quite slowly, although I gave the task a large number of cores.

    The second question is a bit different. My study uses target capture sequencing and I'm interested in a specific set of genes, so I don't want to call SNPs across all the chromosomes but only in certain genes, for which I have the names and the FASTA file. I've seen that I can do this with Freebayes, but Freebayes is giving me problems with phasing. This should be the code (mainly the beginning):

    freebayes \
      --fasta-reference /ufrc/rgenomics/share/Probe_Design/Podarcis/POR_100801/ANALYSIS/POR_100801_File1_2_3.fasta \
      --bam-list /ufrc/rgenomics/share/Data_Analysis/TARGETseq/POR_1008/POR_100801/6_FreeBayes/6.1/list_of_bams.txt \
      --targets /ufrc/rgenomics/share/Data_Analysis/TARGETseq/POR_1008/POR_100801/6_FreeBayes/6.2/RG_6702_Probes_noOverlap_nomito_split/pt100_RG_6702_Probes_noOverlap_nomito.bed \
      --max-complex-gap 1 --theta 0.01 --ploidy 2 \
      --min-alternate-fraction 0.2 --min-alternate-count 2 --min-coverage 8 \
      --min-mapping-quality 1 --min-base-quality 20 \
      --report-genotype-likelihood-max --no-complex --no-mnps --no-indels
     
    If you could help me with this, I would really appreciate it!

  • David Benjamin

    If you are using Freebayes just because it lets you specify targets, that can be done just as well with HaplotypeCaller: -L targets.bed
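As a minimal sketch of what that could look like in GVCF mode: the helper name `hc_cmd` and the file names are placeholders rather than anything from this thread, and the function only prints the command so it can be checked before running.

```shell
# Print a HaplotypeCaller command restricted to target intervals, in GVCF mode.
# Usage: hc_cmd REF BAM TARGETS_BED OUT_GVCF
hc_cmd() {
  # -L limits calling to the intervals in the BED file;
  # -ERC GVCF emits reference-confidence blocks for later joint calling.
  echo "gatk HaplotypeCaller -R $1 -I $2 -L $3 -ERC GVCF -O $4"
}

hc_cmd reference.fasta sample1.bam targets.bed sample1.g.vcf.gz
```

Restricting to the targets with -L should also reduce runtime, since regions outside the capture design are skipped.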

    I'm afraid I can't be of any help troubleshooting Freebayes.

    How long is HaplotypeCaller taking on what interval, and what is the average depth of your samples?
