StrandBiasBySample SB in vcf
AnsweredHello,
I am using gatk 4.1.9.0
I called gcvf for each sample using
gatk HaplotypeCaller -ERC GVCF -A StrandBiasBySample
SB information is present in gvcf, but when I called genotype using GenotypeGVCFs,
the SB information is gone from the vcf.
I tried to fix it by adding -A StrandBiasBySample
Then I get this error
WARN StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
And still no SB in the output format.
I wonder how to fix it.
Thanks
-
Official comment
The reason you are not seeing the StrandBiasBySample (SB) annotation in your final VCF after GenotypeGVCFs is that SB is an intermediate annotation for calculating FisherStrand (FS) and StrandOddsRatio (SOR). It is not an annotation anyone will see after GenotypeGVCFs. You may be able to use FS and SOR for your use case.
If you think that the SB annotation is very necessary in the final VCF, you can file an issue ticket at our github, and we can look into it. I'll make a ticket with our documentation team to add a caveat to the SB tool docs.
Thanks for the question!
Genevieve
Comment actions -
Hello,
This annotation will only work when called by HaplotypeCaller, it cannot be called by VariantAnnotator.
Please give more information about the annotation not appearing after GenotypeGVCFs, following these guidelines:
https://gatk.broadinstitute.org/hc/en-us/articles/360053424571-How-to-Write-a-Post
Best,
Genevieve
-
Hi,
I am aware that it does not work through VariantAnnotator, and I did not use it. Thanks for the reminder anyway.
So I called gvcfs on each sample using the following command:
gatk HaplotypeCaller -R $ref -I $input_bam -O ${input_bam}.g.vcf -ERC GVCF -A StrandBiasBySample
Then I load all the gvcfs into database using GenomicsDBImport.
And called vcfs using the following command:
gatk GenotypeGVCFs -R $ref -V gendb://my_database -O output.vcf.gz -A StrandBiasBySample
And that is when I got the error message:
StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
Thanks for the help!!
-
Hi - A few things that would help figure out what is going wrong:
Could you share the complete stack trace containing the error message from GenotypeGVCFs?
Could you also run SelectVariants on your GenomicsDBImport database to see if the annotation exists in your GenomicsDB?
Best,
Genevieve
-
Hi Pichu,
I encounter the same problem as you described. I failed to obtain the "SB" annotation in the VCF file. I am wondering have you solved your problem, and could you share your solutions?
Thanks!
-
Hi Yiguan Wang & Pichu,
Here is a post on the forum where this has been discussed: https://gatk.broadinstitute.org/hc/en-us/articles/360057439091-StrandBiasBySample
Is there a chance you are looking at reference blocks when you do not see SB?
If not, please share some examples so we can look into this! Also, is the annotation in the header, or just not in the samples?
Thanks,
Genevieve
-
Hi Genevieve,
I can confirm the SB annotation exists in the g.vcf file. But it disappears in the vcf file after using "GenotypeGVCFs".
Here is the example of vcf file:
https://drive.google.com/file/d/1IMPF_HoVH5L7nI3JvFoiLJInZ9IxlmRa/view?usp=sharing
Thanks
-
Hello Yiguan Wang,
Could you supply your GenotypeGVCFs command line and the entire stack trace? Did you run GenotypeGVCFs with -A StrandBiasBySample or without?
Could you also share an example of a variant before and after GenotypeGVCFs?
Thank you,
Genevieve
-
Hi Genevieve,
Whether I use " -A StrandBiasBySample" or not seems no difference on my data.
gatk GenotypeGVCFs -R dmel-all-chromosome-r5.44.fasta \
-V gatk_parent_offspring.g.vcf.gz \
-L 2R:2076823-2276823 \
-A StrandBiasBySample \
-O test_SB.vcf.gzFor example, for a variant: "2R:2272910"
Before I use the "GenotypeGVCFs", the variant in the GVCF is:
2R 2272910 . G A,<NON_REF> . . BaseQRankSum=-2.460e-01;DP=261;ExcessHet=3.01;MQRankSum=-2.950e-01;RAW_MQandDP=112393,261;ReadPosRankSum=-1.180e-01 GT:AD:DP:GQ:PL:SB ./.:9,10,0:19:99:196,0,169,223,199,422:9,0,10,0 ./.:6,5,0:11:99:114,0,120,132,135,267:5,1,4,1 ./.:6,13,0:19:95:276,0,95,294,134,428:6,0,12,1 ./.:4,10,0:14:60:211,0,60,223,90,313:4,0,10,0 ./.:4,7,0:11:69:145,0,69,157,90,247:2,2,6,1 ./.:9,14,0:23:99:403,0,160,430,202,631:9,0,9,5 ./.:6,21,0:27:74:455,0,74,473,137,611:4,2,16,5 ./.:6,11,0:17:99:232,0,101,250,134,385:5,1,10,1 ./.:5,17,0:22:64:367,0,64,382,115,497:5,0,15,2 ./.:6,15,0:21:88:452,0,88,470,133,604:6,0,12,3 ./.:7,10,0:17:99:298,0,130,319,160,479:5,2,10,0 ./.:6,8,0:14:99:162,0,114,180,138,318:6,0,7,1 ./.:9,10,0:19:99:313,0,171,340,201,541:8,1,8,2 ./.:7,13,0:20:99:390,0,120,411,159,570:6,1,11,2
"SB" annotation exists in the GVCF file.
But after I run "GenotypeGVCFs" using the above command, the variant in the VCF file becomes:
2R 2272910 . G A 4023.42 . AC=14;AF=0.500;AN=28;BaseQRankSum=-2.460e-01;DP=261;ExcessHet=36.8993;FS=2.489;InbreedingCoeff=-1.0000;MLEAC=14;MLEAF=0.500;MQ=20.75;MQRankSum=-2.950e-01;QD=15.84;ReadPosRankSum=-1.180e-01;SOR=0.462 GT:AD:DP:GQ:PL 0/1:9,10:19:99:196,0,169 0/1:6,5:11:99:114,0,120 0/1:6,13:19:95:276,0,95 0/1:4,10:14:60:211,0,60 0/1:4,7:11:69:145,0,69 0/1:9,14:23:99:403,0,160 0/1:6,21:27:74:455,0,74 0/1:6,11:17:99:232,0,101 0/1:5,17:22:64:367,0,64 0/1:6,15:21:88:452,0,88 0/1:7,10:17:99:298,0,130 0/1:6,8:14:99:162,0,114 0/1:9,10:19:99:313,0,171 0/1:7,13:20:99:390,0,120
The "SB" annotation disappeared.
The log file:
Using GATK jar /data/home/xxxx/miniconda3/envs/snakemake/share/gatk4-4.2.0.0-0/gatk-package-4.2.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data/home/xxxx/miniconda3/envs/snakemake/share/gatk4-4.2.0.0-0/gatk-pa
ckage-4.2.0.0-local.jar GenotypeGVCFs -R /data/home/xxxx/myData/PeterData/reference_genome/dmel_r5.44/dmel-all-chromosome-r5.44.fasta -V gatk_parent_offspring.g.vcf.gz -L 2R:2076823-2276823 -A StrandBiasBySample -O test_SB.vcf.gz
19:55:56.319 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data/home/xxxx/miniconda3/envs/snakemake/share/gatk4-4.2.0.0-0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Apr 27, 2021 7:55:56 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
19:55:56.406 INFO GenotypeGVCFs - ------------------------------------------------------------
19:55:56.407 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
19:55:56.407 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
19:55:56.407 INFO GenotypeGVCFs - Executing as xxxx@xxxxxxx.uk on Linux v3.10.0-1160.11.1.el7.x86_64 amd64
19:55:56.407 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
19:55:56.407 INFO GenotypeGVCFs - Start Date/Time: 27 April 2021 19:55:56 BST
19:55:56.407 INFO GenotypeGVCFs - ------------------------------------------------------------
19:55:56.407 INFO GenotypeGVCFs - ------------------------------------------------------------
19:55:56.407 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
19:55:56.407 INFO GenotypeGVCFs - Picard Version: 2.25.0
19:55:56.407 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:55:56.407 INFO GenotypeGVCFs - Deflater: IntelDeflater
19:55:56.408 INFO GenotypeGVCFs - Inflater: IntelInflater
19:55:56.408 INFO GenotypeGVCFs - GCS max retries/reopens: 20
19:55:56.408 INFO GenotypeGVCFs - Requester pays: disabled
19:55:56.408 INFO GenotypeGVCFs - Initializing engine
19:55:56.673 INFO FeatureManager - Using codec VCFCodec to read file file:///data/home/xxxx/myData/PeterData/all_14samples_r5.44/gatk_parent_offspring.g.vcf.gz
19:55:56.709 INFO IntervalArgumentCollection - Processing 200001 bp from intervals
19:55:56.711 INFO GenotypeGVCFs - Done initializing engine
19:55:56.734 INFO ProgressMeter - Starting traversal
19:55:56.734 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
...
19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
19:56:00.702 INFO ProgressMeter - 2R:2273179 0.1 120273 1819102.6
19:56:00.702 INFO ProgressMeter - Traversal complete. Processed 120273 total variants in 0.1 minutes.
19:56:00.706 INFO GenotypeGVCFs - Shutting down engine
[27 April 2021 19:56:00 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.07 minutes.
Runtime.totalMemory()=3176136704Thanks,
Yiguan
-
Yiguan Wang Thank you! I will look into this further.
Please sign in to leave a comment.
10 comments