emit-ref-confidence error using single sample BAM
Answered
a) GATK version used: 4.1.7.0
b) Exact command used:
gatk HaplotypeCaller -R final.fasta -I ${sampleName}_markdup.bam -O /${sampleName}.g.vcf.gz -ERC GVCF
c) Entire error log:
A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
Hi there,
I posted last week (https://gatk.broadinstitute.org/hc/en-us/community/posts/360077176211-HaplotypeCaller-incompatible-contigs-one-scaffold-only-) about a problem I was having with incompatible contigs while using HaplotypeCaller.
I used samtools to remove the contig from my BAM, and an awk command to remove the contig from the reference genome, as HaplotypeCaller returned the same error when it was included in the reference.
I’m now getting the error I posted above, even though the BAM I’m using is a single-sample BAM (I’ve included the only @PG line from the BAM header below).
@PG ID:bwa PN:bwa VN:0.7.16a-r1181 CL:bwa mem -t 16 final.fasta.gz sampleName_filtered_R1_.fastq.gz sampleName_filtered_R2_.fastq.gz
Can anyone offer any advice?
Thank you!
-
Official comment
GATK requires read groups, so you may just need to add them to your file. Read groups are usually added during alignment, but you can also add them to an existing BAM with AddOrReplaceReadGroups. This document has more information about how to do that: https://gatk.broadinstitute.org/hc/en-us/articles/360035532352-Errors-about-read-group-RG-information
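As a rough illustration, here is a minimal sketch of an AddOrReplaceReadGroups call wrapped in a shell function. The function name add_read_groups and the RGID, RGLB, RGPL, and RGPU values are placeholders invented for this example, not values from this thread; only the sample name (RGSM) should match your actual sample.

```shell
# Launcher for GATK. Override it (e.g. GATK=echo) to preview the command without running it.
# Assumption: a plain "gatk" wrapper is on the PATH.
GATK="${GATK:-gatk}"

# add_read_groups SAMPLE IN_BAM OUT_BAM
# Writes a copy of IN_BAM in which every read is assigned to a single read group.
# The ID, LB, PL, and PU values are placeholders; replace them with values
# that describe your own sequencing run.
add_read_groups() {
  local sample="$1" in_bam="$2" out_bam="$3"
  "$GATK" AddOrReplaceReadGroups \
    -I "$in_bam" \
    -O "$out_bam" \
    --RGID "${sample}.rg1" \
    --RGLB "${sample}.lib1" \
    --RGPL ILLUMINA \
    --RGPU "${sample}.unit1" \
    --RGSM "$sample"
}

# Example call (hypothetical file names):
#   add_read_groups sampleName sampleName_markdup.bam sampleName_markdup_rg.bam
#   samtools index sampleName_markdup_rg.bam
```

The new BAM needs to be indexed before it is passed to HaplotypeCaller with -I.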
Hope this helps!
Best,
Genevieve
-
Hi there, is anyone able to help on this?
-
Hi suzy_bunters, yes we will get to this as soon as we are able. Please see our support policy. I think this problem has been addressed on the forum before, so I would recommend going through other forum posts if you want a solution ASAP.
-
Hi Genevieve, thanks for your reply (and sorry to nag) :D I checked the forum before posting but the other solutions I found don't work for me. I'll be more patient though!
-
Hi suzy_bunters, could you print out the read group lines in the BAM header? You can see this doc for more information: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
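One way to do this is sketched below, assuming samtools is installed. The helper names list_read_groups and list_samples are made up for illustration; they only filter header text read from standard input. As far as I understand it, HaplotypeCaller in GVCF mode expects exactly one distinct SM value, so a header with no @RG lines at all can trigger the same error as a multi-sample BAM.

```shell
# list_read_groups: read a SAM/BAM header on stdin and print only its @RG lines.
# "|| true" keeps the exit status at 0 when there are no @RG lines.
list_read_groups() {
  grep '^@RG' || true
}

# list_samples: read a header on stdin and print each distinct SM (sample) value once.
list_samples() {
  awk -F'\t' '/^@RG/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)
  }' | sort -u
}

# Example calls (hypothetical file name):
#   samtools view -H sampleName_markdup.bam | list_read_groups
#   samtools view -H sampleName_markdup.bam | list_samples
```

If list_samples prints nothing, the BAM has no sample information; if it prints more than one name, the BAM is multi-sample.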
-
Hi Genevieve Brandt (she/her),
There are no @RG lines in the header (the headers either start with @HD, @SQ, or @PG).
The original FASTQ files from which the BAM was generated were trimmed using Trimmomatic. Would that have removed the read group lines?
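For context on this question: FASTQ files do not carry @RG header lines, so trimming cannot remove them. Read groups are normally attached by the aligner, and the bwa mem command recorded in the @PG line above has no -R option, which would explain why the header has no @RG lines. A minimal sketch of building a read-group string for bwa mem -R follows; the function name rg_line and the ID, LB, and PL values are placeholders invented for this example.

```shell
# rg_line SAMPLE: print a read-group string for bwa mem -R.
# The separators are a literal backslash followed by "t"; bwa mem converts them to tabs.
# The ID, LB, and PL values are placeholders, not values taken from this thread.
rg_line() {
  local sample="$1"
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA' \
    "${sample}.rg1" "${sample}" "${sample}.lib1"
}

# Example call (hypothetical file names, mirroring the command in the @PG line):
#   bwa mem -t 16 -R "$(rg_line sampleName)" final.fasta.gz \
#     sampleName_filtered_R1_.fastq.gz sampleName_filtered_R2_.fastq.gz > sampleName.sam
```

Re-aligning is only one option; for a BAM that already exists, adding read groups afterwards avoids repeating the alignment.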
-
I also encountered the same error and my sorted bam files do not have @RG lines.
I was given trimmed fastq files (2 years old) that I aligned with the latest reference genome using BWA and generated .sam, .bam, and sorted.bam files using Samtools. First time using GATK for gVCF and VCF generation.
bwa mem susScr11.fasta SV11_R1.fastq.gz SV11_R2.fastq.gz > SV11.sam
samtools view -S -b SV11.sam > SV11.bam
samtools sort SV11.bam -o SV11.s.bam
samtools index SV11.s.bam
java -jar gatk-package-4.2.0.0-local.jar HaplotypeCaller -R susScr11.fasta -I SV11.s.bam -O SV11.g.vcf.gz -ERC GVCF
A USER ERROR has occurred: Argument emit-ref-confidence has a bad value: Can only be used in single sample mode currently. Use the --sample-name argument to run on a single sample out of a multi-sample BAM file.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
Thank you and it worked. I have one more question. I see way too many SNPs in my final vcf (after the genotype call) in IGV. I see SNPs that are only in one of the samples in either case or control group that I should ignore by removing them. Is there a GATK function to remove such SNPs or do I have to come up with my own code for it? Thank you.
-
Hi Alia Parveen, glad to hear that it worked!
For the next question, make sure you are doing filtering. You can check out our best practices here. You can also search through the forum for other users with a similar question to see how they refined their variants. If you are not able to figure out a solution, make a new post on the forum since it is a different question.
Best,
Genevieve
-
I hate to revive this answered question, but I'm running into this same error now, and I'm finding the instructions for adding read group information to be ambiguous. The documentation about read groups states: "When multiplexing is involved, then each subset of reads originating from a separate library run on that lane will constitute a separate read group," and "In Illumina data, read group IDs are composed using the flowcell name and lane number". My data comes from 70 libraries/samples that were pooled and run on one Illumina flowcell, in a single lane. So which is it? According to the first statement, each library/sample should have a unique ID, but according to the second statement, they should all have the same ID field.
How do HaplotypeCaller and other downstream tools differentiate between samples/libraries after VCF files are eventually merged? Should I give each of my 70 samples a unique ID, LB, or SM field, or some combination of the above? Genevieve Brandt (she/her) are you able to clarify?
-
Spencer Monckton I think for your case, each library/sample should get a unique identifier (ID). Different samples need to be separated, and the different libraries should be separated as well.
Samples will be identified by the SM sample name.
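To make that concrete, here is a minimal sketch of one possible naming scheme for many libraries pooled on a single lane. The function name rg_fields, the flowcell name FC1, and the sample names are all hypothetical. The sketch also uses the sample name in the PU field as a stand-in for the sample barcode, which is an assumption of this example rather than a GATK requirement.

```shell
# rg_fields FLOWCELL LANE SAMPLE
# Prints one tab-separated line: ID, SM, LB, PU.
# ID is made unique per sample by appending the sample name to flowcell.lane.
# SM is the field that identifies the sample in the resulting VCF.
rg_fields() {
  local flowcell="$1" lane="$2" sample="$3"
  printf '%s\t%s\t%s\t%s\n' \
    "${flowcell}.${lane}.${sample}" \
    "${sample}" \
    "${sample}.lib1" \
    "${flowcell}.${lane}.${sample}"
}

# Example: print the fields for three hypothetical samples from one lane.
#   for s in s01 s02 s03; do rg_fields FC1 1 "$s"; done
```

With this scheme every sample gets its own ID and SM even though all of them share the same flowcell and lane.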