Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

StrandBiasBySample SB in vcf

Answered
0

10 comments

  • Official comment
    Avatar
    Genevieve Brandt (she/her)

    The reason you are not seeing the StrandBiasBySample (SB) annotation in your final VCF after GenotypeGVCFs is that SB is an intermediate annotation for calculating FisherStrand (FS) and StrandOddsRatio (SOR). It is not an annotation anyone will see after GenotypeGVCFs. You may be able to use FS and SOR for your use case.

    If you think that the SB annotation is very necessary in the final VCF, you can file an issue ticket at our github, and we can look into it. I'll make a ticket with our documentation team to add a caveat to the SB tool docs.

    Thanks for the question!

    Genevieve

    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hello,

    This annotation will only work when called by HaplotypeCaller, it cannot be called by VariantAnnotator.

    Please give more information about the annotation not appearing after GenotypeGVCFs, following these guidelines:

    https://gatk.broadinstitute.org/hc/en-us/articles/360053424571-How-to-Write-a-Post

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    羅Pichu

    Hi,

    I am aware that it does not work through VariantAnnotator, and I did not use it. Thanks for the reminder anyway.

    So I called gvcfs on each sample using the following command:

    gatk HaplotypeCaller -R $ref -I $input_bam -O ${input_bam}.g.vcf -ERC GVCF -A StrandBiasBySample

    Then I load all the gvcfs into database using GenomicsDBImport.

    And called vcfs using the following command:

    gatk GenotypeGVCFs -R $ref -V gendb://my_database -O output.vcf.gz -A StrandBiasBySample

    And that is when I got the error message:

    StrandBiasBySample - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null

     

    Thanks for the help!!

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi - A few things that would help figure out what is going wrong:

    Could you share the complete stack trace containing the error message from GenotypeGVCFs?

    Could you also run SelectVariants on your GenomicsDBImport database to see if the annotation exists in your GenomicsDB?

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Yiguan Wang

    Hi Pichu,

    I encounter the same problem as you described. I failed to obtain the "SB" annotation in the VCF file. I am wondering have you solved your problem, and could you share your solutions?

    Thanks!

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Yiguan Wang & Pichu, 

    Here is a post on the forum where this has been discussed: https://gatk.broadinstitute.org/hc/en-us/articles/360057439091-StrandBiasBySample

    Is there a chance you are looking at reference blocks when you do not see SB? 

    If not, please share some examples so we can look into this! Also, is the annotation in the header, or just not in the samples?

    Thanks,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Yiguan Wang

    Hi Genevieve,

    I can confirm the SB annotation exists in the g.vcf file. But it disappears in the vcf file after using "GenotypeGVCFs". 

    Here is the example of vcf file:

    https://drive.google.com/file/d/1IMPF_HoVH5L7nI3JvFoiLJInZ9IxlmRa/view?usp=sharing

     

    Thanks

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hello Yiguan Wang,

    Could you supply your GenotypeGVCFs command line and the entire stack trace? Did you run GenotypeGVCFs with -A StrandBiasBySample or without?

    Could you also share an example of a variant before and after GenotypeGVCFs?

    Thank you,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Yiguan Wang

    Hi Genevieve,

    Whether I use  " -A StrandBiasBySample" or not seems no difference on my data.  

    gatk GenotypeGVCFs -R dmel-all-chromosome-r5.44.fasta \
    -V gatk_parent_offspring.g.vcf.gz \
    -L 2R:2076823-2276823 \
    -A StrandBiasBySample \
    -O test_SB.vcf.gz

     

    For example, for a variant: "2R:2272910"

    Before I use the "GenotypeGVCFs", the variant in the GVCF is:

    2R      2272910 .       G       A,<NON_REF>     .       .       BaseQRankSum=-2.460e-01;DP=261;ExcessHet=3.01;MQRankSum=-2.950e-01;RAW_MQandDP=112393,261;ReadPosRankSum=-1.180e-01     GT:AD:DP:GQ:PL:SB       ./.:9,10,0:19:99:196,0,169,223,199,422:9,0,10,0     ./.:6,5,0:11:99:114,0,120,132,135,267:5,1,4,1   ./.:6,13,0:19:95:276,0,95,294,134,428:6,0,12,1  ./.:4,10,0:14:60:211,0,60,223,90,313:4,0,10,0   ./.:4,7,0:11:69:145,0,69,157,90,247:2,2,6,1     ./.:9,14,0:23:99:403,0,160,430,202,631:9,0,9,5      ./.:6,21,0:27:74:455,0,74,473,137,611:4,2,16,5  ./.:6,11,0:17:99:232,0,101,250,134,385:5,1,10,1 ./.:5,17,0:22:64:367,0,64,382,115,497:5,0,15,2  ./.:6,15,0:21:88:452,0,88,470,133,604:6,0,12,3  ./.:7,10,0:17:99:298,0,130,319,160,479:5,2,10,0     ./.:6,8,0:14:99:162,0,114,180,138,318:6,0,7,1   ./.:9,10,0:19:99:313,0,171,340,201,541:8,1,8,2  ./.:7,13,0:20:99:390,0,120,411,159,570:6,1,11,2

    "SB" annotation exists in the GVCF file.

    But after I run "GenotypeGVCFs" using the above command, the variant in the VCF file becomes:

    2R      2272910 .       G       A       4023.42 .       AC=14;AF=0.500;AN=28;BaseQRankSum=-2.460e-01;DP=261;ExcessHet=36.8993;FS=2.489;InbreedingCoeff=-1.0000;MLEAC=14;MLEAF=0.500;MQ=20.75;MQRankSum=-2.950e-01;QD=15.84;ReadPosRankSum=-1.180e-01;SOR=0.462      GT:AD:DP:GQ:PL  0/1:9,10:19:99:196,0,169        0/1:6,5:11:99:114,0,120 0/1:6,13:19:95:276,0,95 0/1:4,10:14:60:211,0,60 0/1:4,7:11:69:145,0,69  0/1:9,14:23:99:403,0,160        0/1:6,21:27:74:455,0,74 0/1:6,11:17:99:232,0,101   0/1:5,17:22:64:367,0,64  0/1:6,15:21:88:452,0,88 0/1:7,10:17:99:298,0,130        0/1:6,8:14:99:162,0,114 0/1:9,10:19:99:313,0,171        0/1:7,13:20:99:390,0,120

     

    The "SB" annotation disappeared. 

    The log file:

    Using GATK jar /data/home/xxxx/miniconda3/envs/snakemake/share/gatk4-4.2.0.0-0/gatk-package-4.2.0.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /data/home/xxxx/miniconda3/envs/snakemake/share/gatk4-4.2.0.0-0/gatk-pa
    ckage-4.2.0.0-local.jar GenotypeGVCFs -R /data/home/xxxx/myData/PeterData/reference_genome/dmel_r5.44/dmel-all-chromosome-r5.44.fasta -V gatk_parent_offspring.g.vcf.gz -L 2R:2076823-2276823 -A StrandBiasBySample -O test_SB.vcf.gz
    19:55:56.319 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/data/home/xxxx/miniconda3/envs/snakemake/share/gatk4-4.2.0.0-0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 27, 2021 7:55:56 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    19:55:56.406 INFO GenotypeGVCFs - ------------------------------------------------------------
    19:55:56.407 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.2.0.0
    19:55:56.407 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    19:55:56.407 INFO GenotypeGVCFs - Executing as xxxx@xxxxxxx.uk on Linux v3.10.0-1160.11.1.el7.x86_64 amd64
    19:55:56.407 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_152-release-1056-b12
    19:55:56.407 INFO GenotypeGVCFs - Start Date/Time: 27 April 2021 19:55:56 BST
    19:55:56.407 INFO GenotypeGVCFs - ------------------------------------------------------------
    19:55:56.407 INFO GenotypeGVCFs - ------------------------------------------------------------
    19:55:56.407 INFO GenotypeGVCFs - HTSJDK Version: 2.24.0
    19:55:56.407 INFO GenotypeGVCFs - Picard Version: 2.25.0
    19:55:56.407 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
    19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    19:55:56.407 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    19:55:56.407 INFO GenotypeGVCFs - Deflater: IntelDeflater
    19:55:56.408 INFO GenotypeGVCFs - Inflater: IntelInflater
    19:55:56.408 INFO GenotypeGVCFs - GCS max retries/reopens: 20
    19:55:56.408 INFO GenotypeGVCFs - Requester pays: disabled
    19:55:56.408 INFO GenotypeGVCFs - Initializing engine
    19:55:56.673 INFO FeatureManager - Using codec VCFCodec to read file file:///data/home/xxxx/myData/PeterData/all_14samples_r5.44/gatk_parent_offspring.g.vcf.gz
    19:55:56.709 INFO IntervalArgumentCollection - Processing 200001 bp from intervals
    19:55:56.711 INFO GenotypeGVCFs - Done initializing engine
    19:55:56.734 INFO ProgressMeter - Starting traversal
    19:55:56.734 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null
    19:55:56.942 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2077013 and possibly subsequent; alleleLikelihoodMap is null

    ...

    19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
    19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
    19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
    19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
    19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
    19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
    19:56:00.689 WARN StrandBiasBySample - Annotation will not be calculated at position 2R:2272910 and possibly subsequent; alleleLikelihoodMap is null
    19:56:00.702 INFO ProgressMeter - 2R:2273179 0.1 120273 1819102.6
    19:56:00.702 INFO ProgressMeter - Traversal complete. Processed 120273 total variants in 0.1 minutes.
    19:56:00.706 INFO GenotypeGVCFs - Shutting down engine
    [27 April 2021 19:56:00 BST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.07 minutes.
    Runtime.totalMemory()=3176136704

     

    Thanks,

    Yiguan

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Yiguan Wang Thank you! I will look into this further.

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk