HaplotypeCaller complains (in 2 different ways) about sample-name
I'm using GATK v4.1.4.1 and running this:
gatk --java-options -Xmx4g HaplotypeCaller -R ref.fasta -I input.bam -O output.vcf.gz
I get an error:
A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
But my input BAM file only has one reference in it. The reference name is ok, according to the SAM standard regex. There is also only a single read group.
When I repeat the above call and put --sample-name REFNAME on the command line, I get another error:
A USER ERROR has occurred: Argument --sample_name has a bad value: Specified name does not exist in input bam files
But the sample name I give on the command line matches the one in my BAM file. So on the one hand GATK is complaining that my BAM file has multiple samples (I assume that means multiple references) but on the other hand when I give it a sample name it says it doesn't exist.
I have checked my BAM file to make sure that only one reference occurs in (SAM) field 3.
BTW, regarding the second error message above, it would be useful if it 1) said --sample-name instead of --sample_name (I tried to find this in the GATK repo, but could not), 2) printed out the value it was looking for (so that I can see that it's looking for what I gave it on the command line and not something else, which could occur due to a bug in GATK), and 3) told me what "samples" it finds in the BAM file so that I could see what GATK thinks is in there. The second and third of these suggestions would make it easier for a regular user to resolve an issue like this.
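To make suggestions 2) and 3) concrete, here is a hypothetical Python sketch (not GATK's actual code; the function name `check_sample_name` is invented for illustration) of a check that echoes the requested value and lists the samples it found, including the case where no read group carries an SM tag:

```python
from typing import List, Optional


def check_sample_name(requested: str, samples_in_bam: List[str]) -> Optional[str]:
    """Return an error message if `requested` is not a sample in the BAM, else None.

    Hypothetical helper, not GATK code: it only illustrates an error message
    that echoes the requested value and lists the samples actually found.
    """
    if requested in samples_in_bam:
        return None
    if not samples_in_bam:
        found = "no samples (no @RG line carries an SM tag)"
    else:
        found = ", ".join(sorted(samples_in_bam))
    return (f"--sample-name has a bad value: '{requested}' does not exist "
            f"in the input BAM files; samples found: {found}")
```

For example, `check_sample_name("REFNAME", [])` would say that no @RG line carries an SM tag, which is the situation described above.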
Thanks for any help (and for GATK)!
-
BTW, I also tried using --sample-name RG_NAME (where RG_NAME is the name of the single read group in the BAM file). I get the same (second, above) error, telling me that the sample name doesn't exist. But maybe a sample name is actually the SM tag of an RG? I'll try that...
-
OK, looks like I have this fixed. If the @RG line contains a SM:name tag, the original error does not occur.
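GATK takes sample names from the SM field of the @RG header lines, not from the reference name or the read-group ID. A minimal Python sketch, assuming you saved the header text with `samtools view -H input.bam > header.sam` (the helper names `read_groups` and `sample_names` are made up for this example), shows which samples a header actually declares:

```python
from typing import Dict, List


def read_groups(header_text: str) -> List[Dict[str, str]]:
    """Parse each @RG line of a SAM header into a dict of TAG -> value."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        fields = line.split("\t")[1:]  # skip the "@RG" record type itself
        groups.append(dict(f.split(":", 1) for f in fields if ":" in f))
    return groups


def sample_names(header_text: str) -> List[str]:
    """Distinct SM values across all read groups, in first-seen order."""
    seen: List[str] = []
    for rg in read_groups(header_text):
        sm = rg.get("SM")
        if sm is not None and sm not in seen:
            seen.append(sm)
    return seen
```

A header whose only read group is `@RG\tID:lane1` yields an empty list, matching the failure above; adding `SM:...` to that line gives exactly one name to pass to --sample-name.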
Maybe the error messages above could be changed to say what sample names are found or to indicate that there were no sample names. I'd think the first error need not even be one and GATK could just assume that the BAM file contains data from one sample.
-
Hey Terry,
Can you please post your final solution? I seem to have the same problem, but don't really understand your solution.
Thanks and kind regards
Till
-
I'm also experiencing the same error with a single BAM file. My bowtie2 alignments are made using the following:
$ bowtie2 --threads 10 --rg ID:MTG324 --rg PL:ILLUMINA --rg SM:MTG324 -x {input.ref} -1 {input.r1} -2 {input.r2} | samtools view -Sbh -o {output}
I use the Snakemake wrapper for HaplotypeCaller on that output and it works.
However, when I try the same on RazerS 3 output (rather than bowtie2), I get the error:
$ gatk --java-options '' HaplotypeCaller --sample-name MTG324 -R refs/c_elegans.PRJNA13758.WS265.genomic.fa -I alignment/razers3/MTG324.sorted.dedupped.bam -ERC GVCF -O variant_calling/haplotypecaller/MTG324.vcf
You wrote: "If the @RG line contains a SM:name tag, the original error does not occur." How do I add the @RG line to the RazerS 3 output?
-
Hi Till Dorendorf and Matthew Oldach, check out our documentation on read groups.
Terry Jones do you have other solutions as well?
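For a BAM file that has no read groups at all, like the RazerS 3 output above, tools such as `samtools addreplacerg` or GATK's `AddOrReplaceReadGroups` can add one after alignment. As a rough illustration of what those tools change, here is a Python sketch over uncompressed SAM text (the function `add_read_group` is hypothetical and no substitute for those tools on real BAM files): it adds one @RG header line with an SM tag and appends an `RG:Z:` tag to every read.

```python
from typing import Iterable, Iterator


def add_read_group(sam_lines: Iterable[str], rg_id: str, sample: str) -> Iterator[str]:
    """Yield SAM lines with a single @RG header line and an RG:Z tag on every read.

    Hypothetical sketch for uncompressed SAM text only. Existing @RG header
    lines and RG:Z tags are dropped, so every read lands in the one new group.
    """
    rg_line = f"@RG\tID:{rg_id}\tSM:{sample}"
    rg_emitted = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            if not line.startswith("@RG\t"):
                yield line  # keep @HD, @SQ, @PG, @CO lines unchanged
            continue
        if not rg_emitted:
            yield rg_line  # header lines must precede the first alignment
            rg_emitted = True
        fields = line.split("\t")
        mandatory = fields[:11]  # QNAME through QUAL
        optional = [f for f in fields[11:] if not f.startswith("RG:Z:")]
        yield "\t".join(mandatory + optional + [f"RG:Z:{rg_id}"])
    if not rg_emitted:
        yield rg_line  # header-only input with no reads
```

The key point is that the header line alone is not enough: each read also needs an RG tag pointing at that group, which is why the dedicated tools rewrite the whole file.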
-
Hello everyone, I think the solution would be to follow this syntax:
@RG\tID:XXX\tSM:XXX_sample
Thanks Terry Jones, your solution really helped me :)
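The `\t` in that string is typed literally and stands for a tab; aligners that take a read-group string on the command line, for example `bwa mem -R`, expand it into a real @RG header line. A small Python sketch (the helper `rg_argument` is hypothetical) that assembles such a string and rejects empty values or values containing tabs or newlines:

```python
def rg_argument(rg_id: str, sample: str, **extra: str) -> str:
    """Build an @RG string with literal backslash-t separators, as typed on a command line.

    ID identifies the read group; SM is the sample name GATK looks for.
    Extra fields (e.g. PL="ILLUMINA") are appended in the order given.
    """
    fields = {"ID": rg_id, "SM": sample, **extra}
    for tag, value in fields.items():
        if not value or "\t" in value or "\n" in value:
            raise ValueError(f"bad value for {tag}: {value!r}")
    return "\\t".join(["@RG"] + [f"{tag}:{value}" for tag, value in fields.items()])


# Prints: @RG\tID:XXX\tSM:XXX_sample
print(rg_argument("XXX", "XXX_sample"))
```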
-
Thanks for posting your solution Naveen Eugene Louis to help out other users!
-
Genevieve Brandt (she/her), Happy to help :)
-
Hi there,
Naveen Eugene Louis, I am new to Linux / GATK and have the same initial error as Terry Jones. I am wondering where exactly you put this syntax: @RG\tID:XXX\tSM:XXX_sample? How do I use it to fix the initial error? Thank you.
Julie
-
Hi Julie,
From what I understand, this step assigns the read group identifier, which is required to separate different samples (if you have more than one sample).
However, in my experience you have to assign an RG ID even if it's a single sample. More info can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
Regarding where to use this syntax, it depends on which aligner you're using.
I used Novoalign (which is a really good aligner) and used the syntax at the aligning step (right after indexing the reference genome).
Depending on the aligner you're using, you can check its command usage and assign RG tags accordingly.
Hope this helps :)
Good luck
-
Hello everyone, although this question was asked two years ago, I think the answer is still unclear for beginners.
I'm also experiencing the same error, but with a multi-sample BAM file. Anyway, after a day of testing, I am sure the command below runs well:
......
-ERC GVCF \
--sample_name AAAA \
......
AAAA is one of the sample names in the BAM file (it comes from '@RG ID:AAAA SM:AAAA').
-
Du Jianbin are you saying that HaplotypeCaller is working for you now?