Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

emit-ref-confidence error

Answered

15 comments

  • Genevieve Brandt (she/her)

    Hi Florence Morel

    Does your BAM file contain multiple samples? HaplotypeCaller is meant to be run on one sample. You can use the --sample-name argument to select the sample from your BAM.

    Hope this helps!

    Genevieve

  • Florence Morel

    Thank you Genevieve for your quick answer!

    Where can I find the sample name and how should I write the command line with the --sample-name argument?

    Could you give me an example?

    Thank you +++

    And best wishes for the new year!

     

     

  • Genevieve Brandt (she/her)

    Hi Florence,

    I just noticed a possible issue in your command: you have -ERC "GVCF", but it should be written as just -ERC GVCF. Try that and see if it solves your issue.

    More information at the HaplotypeCaller documentation page: https://gatk.broadinstitute.org/hc/en-us/articles/360050814612-HaplotypeCaller#--emit-ref-confidence

    Example Command:

    gatk --java-options "-Xmx4g" HaplotypeCaller  \
       -R Homo_sapiens_assembly38.fasta \
       -I input.bam \
       -O output.g.vcf.gz \
       -ERC GVCF

    Happy New Year!

  • Julianne Radford

    Hi Florence Morel

    Did you ever get an answer to this question: "Where can I find the sample name and how should I write the command line with the --sample-name argument?"

    I am having the same issue.

    Thanks, 

    Julie

  • Genevieve Brandt (she/her)

    Hi Julianne Radford,

    I can give you an example for the sample name argument!

    The sample names should be in your read groups (the SM field of each @RG line). You can find more information at our documentation page here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups

    gatk --java-options "-Xmx4g" HaplotypeCaller  \
       -R Homo_sapiens_assembly38.fasta \
       -I input.bam \
       -O output.g.vcf.gz \
       -ERC GVCF \
       --sample-name name_of_sample

    Hope this helps!

    Genevieve
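
To find the value to pass to --sample-name, one option is to list the SM fields in the BAM header. The helper below is a minimal sketch: the function name list_samples and the file name input.bam are made up for illustration, and the demonstration header is invented so the snippet runs without samtools.

```shell
# Print each distinct SM (sample) value from the @RG lines of a SAM/BAM header
# supplied on standard input.
list_samples() {
    grep '^@RG' | tr '\t' '\n' | grep '^SM:' | cut -d: -f2- | sort -u
}

# With samtools installed (input.bam is a placeholder):
#   samtools view -H input.bam | list_samples

# Demonstration on a made-up header: two read groups that belong to one sample.
printf '@HD\tVN:1.6\n@RG\tID:lane1\tSM:sampleA\n@RG\tID:lane2\tSM:sampleA\n' | list_samples
```

If this prints more than one name, the BAM holds several samples and one of those names is what --sample-name expects. If it prints nothing, the BAM has no read groups with an SM field.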

  • Julianne Radford

    Hi Genevieve Brandt (she/her)

    Thank you so much for the feedback.

    First question: do I have to include this bit in my HaplotypeCaller command? I have been leaving it out because it returns an error when I try to use it.

    --java-options "-Xmx4g"

    Second, my bam file only contains one sample, but when I use the instructions from the Read Group page:

    samtools view -H SRR5341585.bam | grep '@RG' 

    to find out the sample name, my shell returns the error:

    E26: Hebrew cannot be used: Not enabled at compile time 

    However, in my most recent effort, I used --java-options "Xmx4g" and passed the name of the BAM file, which SHOULD be the sample name, to the --sample-name argument. Now I am getting a new error:

    A USER ERROR has occurred: 'HaplotypeCaller-R' is not a valid command

    And I guess here the -R is referring to where I am putting in my reference file.

    Any ideas? Thanks once again, 

    Julianne
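
One possible cause of an error like 'HaplotypeCaller-R' is not a valid command, offered as an assumption and not as a confirmed diagnosis of the command above: if there is no space before a line-continuation backslash, the shell joins the two lines into a single word. The sketch below shows this with plain shell arguments, so it runs without GATK; reference.fasta is a placeholder.

```shell
# Wrong: no space before the backslash, so the shell glues the lines together.
set -- HaplotypeCaller\
-R reference.fasta
glued=$1    # the first argument is now the single word HaplotypeCaller-R

# Right: a space before the backslash keeps the tool name and the flag apart.
set -- HaplotypeCaller \
    -R reference.fasta
tool=$1
flag=$2

printf '%s\n' "glued: $glued" "separate: $tool $flag"
```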

  • Genevieve Brandt (she/her)

    Hi Julianne Radford,

    Go ahead and make a new post regarding these issues so we can look into solving them. 

    Thank you,

    Genevieve

  • ngonza27 ngonza27

    Can I ask what exactly is needed in the --sample-name argument?

    I'm struggling because my samples were split across lanes. I ran the alignments on those split samples, and sorted the individual alignments before using samtools to merge them into a sample_merged.bam file.

    samtools merge -@ ${THREADS} ${SAMPLE}/${SAMPLE}_merged.bam ${SAMPLE}/*_sorted.bam

    Now I'm getting the same error message described above:

    Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.

    In this case, should --sample-name be given the SM value from the @RG line? Currently my merged .bam file contains a read group for each sample that was merged. Will HaplotypeCaller only work on one of the samples in that merged file?

     

  • Genevieve Brandt (she/her)

    Hi ngonza27 ngonza27,

    Yes, if you are running HaplotypeCaller with -ERC GVCF, it will only run on one sample at a time. Our recommended joint calling pipeline involves running HaplotypeCaller on each sample individually, then consolidating the GVCFs with GenomicsDBImport or CombineGVCFs, and joint genotyping with GenotypeGVCFs.

    Here is an article regarding joint calling: https://gatk.broadinstitute.org/hc/en-us/articles/360035890431-The-logic-of-joint-calling-for-germline-short-variants

    And our joint calling pipeline for germline SNPs and Indels: https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-

    Hope this helps!

    Genevieve
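
The per-sample workflow described above can be sketched as a dry run that only prints the commands. The sample names, file names, and reference path are placeholders, and the function name joint_calling_plan is made up for this sketch. It uses CombineGVCFs for the consolidation step; GenomicsDBImport is the alternative mentioned above and takes different arguments.

```shell
# Dry-run sketch: print (do not run) the GATK commands for per-sample GVCF
# calling followed by joint genotyping. All names below are placeholders.
REF=reference.fasta
SAMPLES="sampleA sampleB"

joint_calling_plan() {
    combine_inputs=""
    for s in $SAMPLES; do
        # One HaplotypeCaller run per sample, each producing a GVCF.
        printf '%s\n' "gatk HaplotypeCaller -R $REF -I $s.bam -O $s.g.vcf.gz -ERC GVCF"
        combine_inputs="$combine_inputs -V $s.g.vcf.gz"
    done
    # Consolidate the per-sample GVCFs, then joint-genotype the cohort.
    printf '%s\n' "gatk CombineGVCFs -R $REF$combine_inputs -O cohort.g.vcf.gz"
    printf '%s\n' "gatk GenotypeGVCFs -R $REF -V cohort.g.vcf.gz -O cohort.vcf.gz"
}

joint_calling_plan
```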

  • Marco Hoog

    Not sure what to do to get past this error other than finding another tool somewhere else.

    I don't have a @RG line in my bam or any mention of a "sample".

  • Genevieve Brandt (she/her)

    Hi Marco Hoog, thanks for posting here! We can help out. You should be able to get past this issue if you add read groups to your samples, which are necessary for running GATK. There is more information about adding read groups at this troubleshooting document: Errors about read group (RG) information.

  • Juan Pablo Aguilar Cabezas

    Hi, I am having the same issue. I found that I needed to add the @RG read group information, but some of the documents provide incomplete information. For example, I went to this page, which was recommended before (https://gatk.broadinstitute.org/hc/en-us/articles/360035532352), ran the code, and got an error that is not mentioned there: "ERROR: Option 'RGPU' is required."

    I have processed just a few samples so far, and I would like to know which fields must be part of the @RG line. I would like to set everything up correctly from the mapping step, instead of having to fix this later.
    I use bwa-mem, which can add the read group information through its -R option: "-R STR read group header line such as '@RG\tID:foo\tSM:bar' [null]".

    Thank you.

  • Genevieve Brandt (she/her)

    Thanks for the feedback Juan Pablo Aguilar Cabezas! I'll submit a documentation request with our team to update that article with all the required read group fields.

    There is a better description of the required fields in this article here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
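
For a BAM that is already aligned, a Picard AddOrReplaceReadGroups call that supplies RGPU along with the other read group fields might look like the dry run below. This is a sketch in Picard's KEY=value syntax: every value is a placeholder, picard.jar is assumed to be in the current directory, and the function name add_rg_command is made up for illustration.

```shell
# Dry-run sketch: print (do not run) a Picard AddOrReplaceReadGroups command.
# All values are placeholders; replace them with your own run information.
SAMPLE=sampleA

add_rg_command() {
    printf '%s\n' "java -jar picard.jar AddOrReplaceReadGroups I=${SAMPLE}.bam O=${SAMPLE}.rg.bam RGID=${SAMPLE}.lane1 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=${SAMPLE}"
}

add_rg_command
```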

  • Juan Pablo Aguilar Cabezas

    Genevieve Brandt (she/her) Do you know how to add it at the mapping step?
    I am trying AddOrReplaceReadGroups (Picard), but since it has no multi-threading option, it takes almost 2 hours for one sample, which is as long as the mapping, BAM conversion, and MarkDuplicates steps take me.

  • Genevieve Brandt (she/her)

    Yes, you can add the read group fields in the order shown in the read group doc I shared above! The command follows the description from the bwa docs: '@RG\tID:foo\tSM:bar'. Each '\t' inserts a tab, and you replace foo and bar with your own values for ID and SM, adding the other fields in the same way. For example:

    @RG ID:H0164.2  PL:illumina PU:H0164ALXX140820.2    LB:Solexa-272222    PI:0    DT:2014-08-20T00:00:00-0400 SM:NA12878  CN:BI
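
A read group string for bwa mem -R can be assembled in the shell as sketched below. The field values, the FASTQ and reference file names, and the output name are all placeholders. For a sample split across lanes, giving each lane its own ID and PU while keeping the same SM means the merged BAM still counts as a single sample.

```shell
# Placeholder values; replace them with your own run information.
ID=lane1
SM=sampleA
PL=ILLUMINA
LB=lib1
PU=unit1

# Inside double quotes the shell keeps \t as a literal backslash-t;
# bwa mem converts it to a real tab when it writes the header.
RG="@RG\tID:${ID}\tSM:${SM}\tPL:${PL}\tLB:${LB}\tPU:${PU}"

# Dry run: print the mapping command instead of executing it.
printf '%s\n' "bwa mem -R '${RG}' reference.fasta reads_1.fastq reads_2.fastq > ${SM}.${ID}.sam"
```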
