Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

HaplotypeCaller for haploid genome

7 comments

  • Genevieve Brandt

    Hello Pummi Singh, <NON_REF> is a symbolic allele that is expected in GVCF output; it stands for any possible non-reference allele at a site. You can find more information here: GVCF - Genomic Variant Call Format

    I do not see any issues in your error log but it also does not look like the complete stack trace.

  • Pummi Singh

    Thank you Genevieve. Why is the header of the last column of my output.g.vcf "20" and not the sample name? This happens with every sample I run, so I cannot use GenomicsDBImport (it fails with a duplicate column error).
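    In a VCF or GVCF, the sample column names are the fields after FORMAT on the #CHROM header line, so two GVCFs that both name their sample "20" will collide when combined. The sketch below is a minimal, hypothetical helper (the function names are illustrative, not part of GATK) that reads that line and reports sample names shared across files. It assumes the header lines are already in memory; a bgzipped .g.vcf.gz could be read with Python's gzip.open(path, "rt").

```python
def gvcf_sample_names(header_lines):
    """Return the sample column names from VCF/GVCF header lines.

    Fixed columns are CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
    every field after FORMAT (index 9 onward) is a sample name.
    """
    for line in header_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def duplicate_samples(headers_by_file):
    """Map each sample name found in more than one file to the files using it.

    headers_by_file: {file name: list of header lines} (hypothetical input shape).
    """
    owners = {}
    for path, lines in headers_by_file.items():
        for name in gvcf_sample_names(lines):
            owners.setdefault(name, []).append(path)
    # Keep only names that appear in two or more files.
    return {name: paths for name, paths in owners.items() if len(paths) > 1}


if __name__ == "__main__":
    chrom_line = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t20\n"
    headers = {
        "P1.g.vcf": ["##fileformat=VCFv4.2\n", chrom_line],
        "P2.g.vcf": ["##fileformat=VCFv4.2\n", chrom_line],
    }
    print(duplicate_samples(headers))  # {'20': ['P1.g.vcf', 'P2.g.vcf']}
```

    The file names P1.g.vcf and P2.g.vcf are placeholders; any non-empty result means the inputs cannot be merged until the sample names are made unique.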

  • Genevieve Brandt

    Hi Pummi Singh, please post your entire stack trace to look for errors.

    I would also recommend running ValidateSamFile on your input BAM (P1.bam) to check for issues with the file. You may also want to read this document about read groups, because incorrect read groups cause problems in downstream analysis.

  • Pummi Singh

    Hi Genevieve,

    This comment box does not let me post the entire stack trace, and I can only upload JPG files.

    Thanks for suggesting ValidateSamFile. I tried it, and it reports the following errors:

    ## HISTOGRAM java.lang.String
    Error Type Count
    ERROR:MATE_CIGAR_STRING_INVALID_PRESENCE 135117
    ERROR:MATE_NOT_FOUND 1278150

    My BAM file is sorted, indexed, mate-fixed (fixmate), and deduplicated, and its reads are tagged. I am not sure why I get these errors or whether I can safely ignore them. Please advise. Thanks

  • Genevieve Brandt

    The tutorial I linked to above has more information on these errors. They usually need to be fixed before the BAM is used with GATK tools.

    I will need to see the entire HaplotypeCaller log to determine whether a problem exists there. You can also search the log for errors or warnings that might explain why the sample name appears as "20".

    Did you check your read groups? Is SM set to "20" in those?
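    HaplotypeCaller takes the sample name it writes into the GVCF from the SM field of the @RG read-group lines in the BAM header. As a minimal sketch, assuming the header has been dumped to text (for example with `samtools view -H P1.bam`), the hypothetical helper below lists the SM value for each read-group ID so a placeholder such as "20" is easy to spot. One way to correct SM is Picard's AddOrReplaceReadGroups (its RGSM argument).

```python
def read_group_samples(sam_header_text):
    """Return {read-group ID: SM value} parsed from SAM header text.

    Header lines are tab-separated; @RG lines carry TAG:VALUE fields
    such as ID, SM, LB and PL. A missing SM is reported as None.
    """
    samples = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        tags = {}
        for field in line.split("\t")[1:]:
            # Split on the first ':' only, since values may contain colons.
            key, sep, value = field.partition(":")
            if sep:
                tags[key] = value
        samples[tags.get("ID")] = tags.get("SM")
    return samples


if __name__ == "__main__":
    header = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:lane1\tSM:20\tPL:ILLUMINA\n"
    print(read_group_samples(header))  # {'lane1': '20'}
```

    The example header is made up for illustration; run the helper on each BAM's real header before calling HaplotypeCaller.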

  • Pummi Singh

    Yes, the SM was indeed "20" in the read groups and I have now fixed this issue. I can see sample names finally. Thanks a ton.

  • Genevieve Brandt

    Glad you fixed it! Thanks for updating with your solution.
