Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

HaplotypeCaller complains (in 2 different ways) about sample-name

12 comments

  • Terry Jones

    BTW, I also tried using --sample-name RG_NAME (where RG_NAME is the name of the single read group in the BAM file). I get the same error (the second one above), telling me that the sample name doesn't exist. But maybe a sample name is actually the SM tag of an RG? I'll try that...

  • Terry Jones

    OK, looks like I have this fixed. If the @RG line contains a SM:name tag, the original error does not occur.

    Maybe the error messages above could be changed to say what sample names are found or to indicate that there were no sample names. I'd think the first error need not even be one and GATK could just assume that the BAM file contains data from one sample.
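
To see which sample names a BAM header actually declares, the SM tags on its @RG lines can be listed directly. The following is a minimal sketch, not part of GATK: `list_sm_tags` is a hypothetical helper name, and it only assumes that the header text arrives on standard input (for example from `samtools view -H`).

```shell
# list_sm_tags (hypothetical helper): read SAM header text on stdin and
# print each distinct SM value found on @RG lines, one per line.
# Prints nothing when no @RG line carries an SM tag.
list_sm_tags() {
    awk -F'\t' '
        /^@RG/ {
            # Fields after the record type are TAG:value pairs.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SM:/) {
                    print substr($i, 4)
                }
            }
        }
    ' | sort -u
}

# Usage (assumes samtools is installed; input.bam is a placeholder name):
#   samtools view -H input.bam | list_sm_tags
```

An empty result suggests the header has no SM tag at all, which matches the situation described above.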

  • Till Dorendorf

    Hey Terry,

    Can you please post your final solution? I seem to have the same problem, but I don't really understand your solution.

    Thanks and kind regards

    Till

  • Matthew Oldach

    I'm also experiencing the same error with a single BAM file which came from bowtie2 using the following:

    $ bowtie2 --threads 10 --rg ID:MTG324 --rg PL:ILLUMINA --rg SM:MTG324 -x {input.ref} -1 {input.r1} -2 {input.r2} | samtools view -Sbh -o {output}

    I use the Snakemake wrapper for HaplotypeCaller and it works.

    However, I'm trying the same on RazerS 3 output (rather than Bowtie 2) and I get the error.

    $ gatk --java-options '' HaplotypeCaller --sample-name MTG324 -R refs/c_elegans.PRJNA13758.WS265.genomic.fa -I alignment/razers3/MTG324.sorted.dedupped.bam -ERC GVCF -O variant_calling/haplotypecaller/MTG324.vcf

    "If the @RG line contains a SM:name tag, the original error does not occur". How do I add the @RG line to RazerS 3 output?

  • Genevieve Brandt (she/her)

    Hi Till Dorendorf and Matthew Oldach, check out our documentation on read groups.

    Terry Jones, do you have other solutions as well?
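
For a BAM file from an aligner that wrote no @RG line, such as the RazerS 3 output described above, one option is to add a read group after alignment with Picard's AddOrReplaceReadGroups, which is bundled with GATK4. The sketch below is an illustration under assumptions, not a confirmed fix from this thread: `add_read_groups` is a hypothetical wrapper name, and the library (`lib1`) and platform unit (`unit1`) values are placeholders to replace with real metadata.

```shell
# add_read_groups INPUT_BAM OUTPUT_BAM SAMPLE   (hypothetical wrapper)
# Assigns all reads in INPUT_BAM to one new read group whose ID and SM
# are both SAMPLE, and writes the result to OUTPUT_BAM.
# The launcher defaults to "gatk"; set GATK to override it.
add_read_groups() {
    in_bam=$1
    out_bam=$2
    sample=$3
    # lib1 and unit1 are placeholder values, not taken from the thread.
    "${GATK:-gatk}" AddOrReplaceReadGroups \
        -I "$in_bam" \
        -O "$out_bam" \
        --RGID "$sample" \
        --RGSM "$sample" \
        --RGPL ILLUMINA \
        --RGLB lib1 \
        --RGPU unit1
}

# Usage (file names are placeholders):
#   add_read_groups MTG324.sorted.dedupped.bam MTG324.rg.bam MTG324
```

The output BAM would then carry an @RG line with SM:MTG324, so `--sample-name MTG324` should have something to match. The new file needs to be indexed again before it is passed to HaplotypeCaller.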

  • Naveen Eugene Louis

    Hello everyone, I think the solution is to give the read group header line this syntax:

    @RG\tID:XXX\tSM:XXX_sample

    Thanks Terry Jones, your solution really helped me :)

  • Genevieve Brandt (she/her)

    Thanks for posting your solution Naveen Eugene Louis to help out other users!

  • Naveen Eugene Louis

    Genevieve Brandt (she/her), Happy to help :)

  • Julianne Radford

    Hi there, 

    Naveen Eugene Louis, I am new to Linux / GATK, and have the same initial error as Terry Jones, and I am just wondering where exactly do you put this syntax: @RG\tID:XXX\tSM:XXX_sample? How do I use this in a way to fix the initial error? Thank you. 

    Julie

  • Naveen Eugene Louis

    Hi Julie,
    From what I understand, this step assigns the read group identifier, which is required to separate different samples (if you have more than one sample).
    However, in my experience, you have to assign a read group ID even if there is only a single sample.

    More info can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups

    As for where to use this syntax, it depends on which aligner you're using.

    I used Novoalign (which is a really good aligner) and passed the syntax at the alignment step (right after indexing the reference genome).

    Depending on the aligner you're using, you can check its command usage and assign the RG tags accordingly.

    Hope this helps :)
    Good luck
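
As an illustration of passing this syntax at the alignment step, the sketch below builds the read group string and shows it used with `bwa mem -R`, which accepts a literal `\t` and converts it to a tab in the SAM header. Both the choice of bwa (it is not mentioned in this thread) and the helper name `rg_line` are assumptions made for the example; other aligners take the read group through different options.

```shell
# rg_line ID SAMPLE   (hypothetical helper)
# Print a read group header string with literal "\t" separators
# (a backslash followed by t), not real tab characters.
rg_line() {
    printf '@RG\\tID:%s\\tSM:%s\n' "$1" "$2"
}

# Usage with bwa mem (an assumption; file names are placeholders):
#   bwa mem -R "$(rg_line XXX XXX_sample)" ref.fa reads_1.fq reads_2.fq > aligned.sam
```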

  • Du Jianbin

    Hello everyone, although this question was asked 2 years ago, I think the answer is still unclear for beginners.

    I'm also experiencing the same error, but with a multi-sample BAM file. After a day of testing, I am sure the command below runs well:

    ......

    -ERC GVCF \

    --sample_name AAAA \

    ......

    AAAA is one of the sample names in the BAM file (it comes from '@RG ID:AAAA SM:AAAA').

  • Genevieve Brandt (she/her)

    Du Jianbin, are you saying that HaplotypeCaller is working for you now?

