StrandBiasBySample error in HaplotypeCaller
Good afternoon,
I'm using GATK 4.1.7.0 and I'm trying to run HaplotypeCaller on a BAM file. I mainly followed this post (https://gatk.broadinstitute.org/hc/en-us/articles/360039568932--How-to-Map-and-clean-up-short-read-sequence-data-efficiently), and from that BAM file I'm trying to create a GVCF file. Here is my command:
java -jar ~/softwares/GATKK/gatk/gatk-package-4.1.7.0-local.jar HaplotypeCaller --reference Pmuralis_1.0.fa --input mergeandaligned.bam --output mergeandaligned.g.vcf.gz -A StrandBiasBySample -ERC GVCF
I was previously running the command without StrandBiasBySample, but I saw here (https://gatkforums.broadinstitute.org/gatk/discussion/6813/several-annotations-not-working-in-gatk-haplotype-caller) that it was recommended. I still have the same problem.
After the run, the log file contains these warnings:
14:54:38.207 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:38.658 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:38.659 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:39.609 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:39.610 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:41.794 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:42.038 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:42.038 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
14:54:43.355 INFO ProgressMeter - Podmur_chrom0001:897923 0.2 3250 19476.6
Later, when I explore the output file, I see that for most of the records the ALT field is missing and shows only <NON_REF>.
I have not run the next step yet because I want to solve this first. Thank you very much!
-
The warnings are most likely benign, and the output is just how the GVCF format works. When a record has the REF allele and <NON_REF> with no other ALT allele, it is a reference block: HaplotypeCaller found no variation there, but it reports how confident it is (via GQ) about the lack of variation. This is useful when we combine GVCFs for joint calling, because we want to know whether the sample definitely has no variant or whether the depth was simply insufficient. The warnings occur at every reference block because there is no variant, so the annotation cannot be calculated.
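One quick way to see this in your own file is to count how many records are reference blocks and how many carry a called alternate allele. The sketch below is a hypothetical helper (the name count_gvcf_records is made up, not a GATK tool); it assumes an uncompressed VCF on stdin with ALT in column 5, and treats a record as a reference block when ALT is exactly <NON_REF>.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): summarize a GVCF read from stdin.
# A reference block has ALT == "<NON_REF>" only; a variant record lists at
# least one real allele before <NON_REF> (e.g. "T,<NON_REF>").
count_gvcf_records() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    $5 == "<NON_REF>" { ref++; next }  # reference block: no called ALT allele
    { var++ }                          # everything else has a called ALT allele
    END { printf "reference_blocks=%d variant_sites=%d\n", ref, var }
  '
}
```

Usage would be something like `zcat mergeandaligned.g.vcf.gz | count_gvcf_records`. A reference_blocks count much larger than variant_sites is normal for a GVCF.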
Now, with that all said, do you intend to run in GVCF mode?
-
Thank you very much! I'm new to GATK and I appreciate the help!
I'm planning to merge all my GVCF files and then run GenotypeGVCFs, so I can get a merged VCF for all my samples.
I have two more questions, if you don't mind. Could you help me improve the speed of HaplotypeCaller? It is running quite slowly, even though I gave the task a large number of cores.
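For reference, the combine-then-genotype plan could look roughly like the sketch below. It assumes one GVCF per sample from HaplotypeCaller -ERC GVCF and uses CombineGVCFs; GenomicsDBImport is the other GATK option for combining GVCFs and is generally suggested for larger cohorts. The function names run and joint_genotype and the sample file names are hypothetical, and setting DRY_RUN=1 only prints the gatk commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the combine-then-genotype step, assuming one GVCF per sample
# from HaplotypeCaller -ERC GVCF. Function and file names are hypothetical.
# Set DRY_RUN=1 to print the gatk commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# joint_genotype REFERENCE OUTPUT_VCF GVCF...
joint_genotype() {
  local ref=$1 out=$2
  shift 2
  local combined="${out%.vcf.gz}.g.vcf.gz"   # e.g. cohort.vcf.gz -> cohort.g.vcf.gz
  local args=()
  local g
  for g in "$@"; do
    args+=(--variant "$g")   # one --variant per per-sample GVCF
  done
  # 1) Combine the per-sample GVCFs into one multi-sample GVCF.
  run gatk CombineGVCFs -R "$ref" "${args[@]}" -O "$combined"
  # 2) Jointly genotype the combined GVCF into a regular VCF.
  run gatk GenotypeGVCFs -R "$ref" -V "$combined" -O "$out"
}
```

For example, `DRY_RUN=1 joint_genotype Pmuralis_1.0.fa cohort.vcf.gz sample1.g.vcf.gz sample2.g.vcf.gz` shows the two commands; dropping DRY_RUN=1 runs them.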
The second question is a bit different. My study uses target capture sequencing and I'm interested in a specific set of genes, so I don't want to do the SNP calling across all chromosomes, only in those genes, for which I have the names and a FASTA file. I've seen that I can do this with Freebayes, but Freebayes is giving me problems with the phasing. This is the command (mainly the beginning):
freebayes --fasta-reference /ufrc/rgenomics/share/Probe_Design/Podarcis/POR_100801/ANALYSIS/POR_100801_File1_2_3.fasta --bam-list /ufrc/rgenomics/share/Data_Analysis/TARGETseq/POR_1008/POR_100801/6_FreeBayes/6.1/list_of_bams.txt --targets /ufrc/rgenomics/share/Data_Analysis/TARGETseq/POR_1008/POR_100801/6_FreeBayes/6.2/RG_6702_Probes_noOverlap_nomito_split/pt100_RG_6702_Probes_noOverlap_nomito.bed --max-complex-gap 1 --theta 0.01 --ploidy 2 --min-alternate-fraction 0.2 --min-alternate-count 2 --min-coverage 8 --min-mapping-quality 1 --min-base-quality 20 --report-genotype-likelihood-max --no-complex --no-mnps --no-indels
If you could help me with this, I would really appreciate it!!
-
If you are using Freebayes just because it lets you specify targets, that can be done just as well with HaplotypeCaller: -L targets.bed
I'm afraid I can't be of any help troubleshooting Freebayes.
How long is HaplotypeCaller taking, and over what interval? What is the average depth of your samples?
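As a sketch of the -L suggestion for a capture design: instead of one freebayes run over a BAM list, HaplotypeCaller is run once per sample in GVCF mode, each run restricted to the target BED (assuming the BED contig names match the reference). The helper hc_commands and all file names are hypothetical; it only prints the commands, so they can be checked before anything runs.

```shell
#!/usr/bin/env bash
# Sketch: print one HaplotypeCaller command per sample, each in GVCF mode
# and restricted to the capture targets with -L. REFERENCE, TARGETS_BED and
# BAM_LIST are placeholders; BAM_LIST holds one BAM path per line (the same
# idea as the freebayes --bam-list file). Nothing is executed here.
hc_commands() {
  local ref=$1 bed=$2 list=$3 bam
  while IFS= read -r bam || [ -n "$bam" ]; do
    [ -z "$bam" ] && continue   # skip blank lines
    # %q shell-quotes each path so the printed line is safe to execute.
    printf 'gatk HaplotypeCaller -R %q -I %q -L %q -ERC GVCF -O %q\n' \
      "$ref" "$bam" "$bed" "$(basename "$bam" .bam).g.vcf.gz"
  done < "$list"
}
```

Piping the output to `bash` runs the samples one after another. Since each sample is an independent job, the lines could also be run several at a time (for example with GNU parallel, if installed), which is one common way to make use of more cores.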
-
Hi gubrins, how did you solve the problem? I am facing the same one.
I get this error when I run HaplotypeCaller:
01:21:39.234 INFO ProgressMeter - X:145897897 85.1 9269200 108863.8
01:21:41.384 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
01:21:43.068 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
01:21:43.069 WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
01:21:44.663 INFO HaplotypeCaller - 312144 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
18850 read(s) filtered by: NotSecondaryAlignmentReadFilter
2251313 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
2582307 total reads filtered
01:21:44.663 INFO ProgressMeter - X:148821894 85.2 9279396 108867.8
01:21:44.663 INFO ProgressMeter - Traversal complete. Processed 9279396 total regions in 85.2 minutes.
01:21:44.745 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 2.3725821710000004
01:21:44.745 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 451.61692500500004
01:21:44.745 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 1779.57 sec
01:21:44.745 INFO HaplotypeCaller - Shutting down engine
[November 14, 2021 1:21:44 AM CET] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 85.24 minutes.
Runtime.totalMemory()=11671175168
-
Hi again Benjamin,
I am still getting that error, and in later stages I don't have quality values for the non-variant sites. Do you think the warning is causing this?
-
There is no error message, just a warning. HaplotypeCaller ran just fine. Like David explained earlier, at non-variant sites, the StrandBiasBySample annotation cannot be calculated. It's not an issue, you should still have the StrandBiasBySample annotation at sites where it can be calculated.
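To confirm that the annotation is still present where it can be calculated, one option is to print the per-sample SB FORMAT value (the field written by StrandBiasBySample) at each variant record. This hypothetical helper, sb_at_variants, assumes a single-sample VCF or GVCF as plain text on stdin, with the sample in column 10.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GATK tool): for each variant record in a
# single-sample (G)VCF read from stdin, print CHROM, POS and the sample's
# SB value written by StrandBiasBySample, or "." if SB is absent.
# Reference blocks (ALT == "<NON_REF>") are skipped: SB is not computed there.
sb_at_variants() {
  awk -F'\t' -v OFS='\t' '
    /^#/ || $5 == "<NON_REF>" { next }
    {
      n = split($9, keys, ":")       # FORMAT keys, e.g. GT:AD:DP:GQ:PL:SB
      split($10, vals, ":")          # matching values for the first sample
      sb = "."
      for (i = 1; i <= n; i++) if (keys[i] == "SB" && vals[i] != "") sb = vals[i]
      print $1, $2, sb
    }
  '
}
```

For example, `zcat mergeandaligned.g.vcf.gz | sb_at_variants | head` lists the first few variant sites with their SB values.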
Best,
Genevieve
-
Abrish, could you make a new post for this issue, since it is separate from the above topic?
-
Dear Genevieve Brandt (she/her) ,
Sure, I will do that.