Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

7 comments

  • Genevieve Brandt (she/her)

    Hi Priyadarshini Thirunavukkarasu,

    That seems like a long time. The runtime depends on how many variants you are scoring and on whether you are using the 1D or 2D model. Could you provide some more information about your use case?

    One other thing to check: is the process still running, or has it hung on your machine?
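One rough way to tell a hung job from a slow one is to see how long ago its log file was last written. The sketch below assumes the job's terminal output is redirected to a file; the function names and the 30-minute threshold are hypothetical choices for illustration, not part of GATK.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds have passed since `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def looks_stalled(path, max_quiet_seconds=1800, now=None):
    """Heuristic: treat the job as stalled if its log has been quiet too long.

    A quiet log does not prove the process is hung; it only tells you
    that it is worth checking CPU usage or attaching a debugger.
    """
    return seconds_since_last_write(path, now) > max_quiet_seconds
```

If the log has been quiet for hours while the job is still listed as running, the job is more likely hung than merely slow.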

    Best,

    Genevieve

  • Priyadarshini Thirunavukkarasu

    Hello

    I am using the 2D model. I set a time limit of 6 days, so the process was terminated after 6 days.

    Total SNPs: 412511

    Total Indels: 11413

    Priya
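For anyone who wants to report the same kind of totals, here is a minimal sketch of counting SNP and indel records in a VCF with the Python standard library. The classification is deliberately simplified (it is not GATK's own variant-type logic), and the function name is hypothetical.

```python
import gzip


def count_variant_types(vcf_path):
    """Count records in a VCF as 'snp', 'indel', or 'other'.

    Simplified rules (an assumption of this sketch):
      - snp:   REF and every ALT allele are a single base
      - indel: at least one ALT allele differs in length from REF
      - other: symbolic or breakend alleles, MNPs, records without ALT
    """
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    counts = {"snp": 0, "indel": 0, "other": 0}
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            ref = fields[3]
            # Drop placeholder alleles that carry no sequence change.
            alts = [a for a in fields[4].split(",") if a not in (".", "*", "<NON_REF>")]
            if not alts or any(a.startswith("<") or "[" in a or "]" in a for a in alts):
                counts["other"] += 1
            elif len(ref) == 1 and all(len(a) == 1 for a in alts):
                counts["snp"] += 1
            elif any(len(a) != len(ref) for a in alts):
                counts["indel"] += 1
            else:
                counts["other"] += 1
    return counts
```

Calling `count_variant_types("/variants/1.vcf.gz")` would return a dictionary of the three counts for the VCF used in this thread.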

  • Genevieve Brandt (she/her)

    I see. Yes, the process should not take that long.

    Could you provide the command you used and the program log until it was terminated?

  • Priyadarshini Thirunavukkarasu

    This is the command used:

    gatk CNNScoreVariants \
    -I /variants/1.bamout.bam \
    -V /variants/1.vcf.gz \
    -R /data/reference/gch38.fa \
    -O /variants/filtered/1_scored.vcf \
    --tensor-type read_tensor \
    --transfer-batch-size 8 \
    --inference-batch-size 8
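The command above runs the 2D model, which is the variant that reads the BAM. As a rough sketch of how the 1D and 2D invocations differ, the helper below builds the argument list for either case. It uses only the flags that appear in the command above, and it assumes (based on the tool documentation, not on anything stated in this thread) that the 1D model is what runs when `-I` and `--tensor-type` are left out. The function name is hypothetical.

```python
def cnn_score_command(vcf, reference, output, bam=None, batch_size=None):
    """Build a CNNScoreVariants argument list.

    With `bam` set, the 2D (read tensor) model is requested.
    Without it, no BAM and no --tensor-type are passed, which is
    assumed here to select the reference-only 1D model.
    """
    cmd = ["gatk", "CNNScoreVariants", "-V", vcf, "-R", reference, "-O", output]
    if bam is not None:
        cmd += ["-I", bam, "--tensor-type", "read_tensor"]
    if batch_size is not None:
        cmd += [
            "--transfer-batch-size", str(batch_size),
            "--inference-batch-size", str(batch_size),
        ]
    return cmd


# The 2D command from this thread, rebuilt from its parts.
cmd_2d = cnn_score_command(
    "/variants/1.vcf.gz",
    "/data/reference/gch38.fa",
    "/variants/filtered/1_scored.vcf",
    bam="/variants/1.bamout.bam",
    batch_size=8,
)
```

Running the 1D form first on the same VCF can be a quick way to check whether the tool starts scoring at all before committing to the slower 2D run.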
  • Genevieve Brandt (she/her)

    Priyadarshini Thirunavukkarasu, could you also provide the program log? This is the output GATK prints to the terminal while the command runs.

  • Priyadarshini Thirunavukkarasu

    09:05:18.765 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scicore/soft/apps/GATK/4.0.8.1-foss-2018b-Python-3.6.6/gatk-package-4.0.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    09:05:18.834 INFO CNNScoreVariants - ------------------------------------------------------------
    09:05:18.835 INFO CNNScoreVariants - The Genome Analysis Toolkit (GATK) v4.0.8.1
    09:05:18.835 INFO CNNScoreVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
    09:05:18.835 INFO CNNScoreVariants - Executing as thirun0000@shi122.cluster.bc2.ch on Linux v3.10.0-1160.el7.x86_64 amd64
    09:05:18.835 INFO CNNScoreVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_212-b03
    09:05:18.835 INFO CNNScoreVariants - Start Date/Time: October 26, 2021 9:05:18 AM CEST
    09:05:18.835 INFO CNNScoreVariants - ------------------------------------------------------------
    09:05:18.835 INFO CNNScoreVariants - ------------------------------------------------------------
    09:05:18.836 INFO CNNScoreVariants - HTSJDK Version: 2.16.0
    09:05:18.836 INFO CNNScoreVariants - Picard Version: 2.18.7
    09:05:18.836 INFO CNNScoreVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    09:05:18.836 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    09:05:18.836 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    09:05:18.836 INFO CNNScoreVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    09:05:18.836 INFO CNNScoreVariants - Deflater: IntelDeflater
    09:05:18.836 INFO CNNScoreVariants - Inflater: IntelInflater
    09:05:18.836 INFO CNNScoreVariants - GCS max retries/reopens: 20
    09:05:18.836 INFO CNNScoreVariants - Using google-cloud-java fork https://github.com/broadinstitute/google-cloud-java/releases/tag/0.20.5-alpha-GCS-RETRY-FIX
    09:05:18.836 WARN CNNScoreVariants -

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Warning: CNNScoreVariants is an EXPERIMENTAL tool and should not be used for production

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


    09:05:18.836 INFO CNNScoreVariants - Initializing engine
    09:05:19.278 INFO FeatureManager - Using codec VCFCodec to read file file:///scicore/home/cichon/GROUP/memory_optimization/variants/1.vcf.gz
    09:05:19.445 INFO CNNScoreVariants - Done initializing engine
    slurmstepd: error: *** JOB 64437 ON shi122 CANCELLED AT 2021-10-26T15:05:34 DUE TO TIME LIMIT ***
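In the log above, the last GATK line is stamped 09:05:19.445 and the SLURM cancellation is stamped 2021-10-26T15:05:34, with nothing printed in between. A small sketch for measuring that silent gap from a saved log follows. The function name is hypothetical, and because GATK log lines carry a time but no date, the sketch has to assume the last log line falls within 24 hours before the cancellation; a stall spanning several days cannot be recovered from these timestamps alone.

```python
import re
from datetime import datetime, timedelta

# GATK log lines start with HH:MM:SS.mmm followed by a level.
GATK_LINE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}\.\d{3}) (?:INFO|WARN|ERROR)\b")
# SLURM reports the cancellation with a full ISO-style timestamp.
SLURM_CANCEL = re.compile(r"CANCELLED AT (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def silent_gap(log_text):
    """Return the time between the last GATK log line and the SLURM cancel.

    Returns None if either timestamp is missing. Assumes the last log
    line was written less than 24 hours before the cancellation.
    """
    last_time = None
    cancelled_at = None
    for line in log_text.splitlines():
        match = GATK_LINE.match(line)
        if match:
            last_time = datetime.strptime(match.group(1), "%H:%M:%S.%f").time()
        match = SLURM_CANCEL.search(line)
        if match:
            cancelled_at = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    if last_time is None or cancelled_at is None:
        return None
    last_seen = datetime.combine(cancelled_at.date(), last_time)
    if last_seen > cancelled_at:
        # The last log line must belong to the previous calendar day.
        last_seen -= timedelta(days=1)
    return cancelled_at - last_seen
```

A gap that covers almost the entire run, starting right after "Done initializing engine", suggests the tool never began scoring variants rather than scoring them slowly.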
  • Genevieve Brandt (she/her)

    Thanks for sharing! I think you may have an issue with your Python environment. We've seen this before on the forum, where the job stops making progress before it really even starts.

    See this forum post and solution for more information:

    https://gatk.broadinstitute.org/hc/en-us/community/posts/4405273097627-CNNScoreVariants-not-working
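As a quick first check of the Python environment, the sketch below reports which modules cannot be found by the interpreter that is active when GATK is launched. The module list is an assumption based on the GATK conda environment and may not match what your GATK release requires; the function name is hypothetical. Run it with the same `python` that is on your `PATH` inside the job.

```python
import importlib.util

# Assumed module names; adjust to what your GATK release actually needs.
ASSUMED_MODULES = ["numpy", "keras", "tensorflow", "vqsr_cnn", "gatktool"]


def missing_modules(names):
    """Return the subset of `names` that the current interpreter cannot find."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially installed packages.
            found = False
        if not found:
            missing.append(name)
    return missing


# Example: print(missing_modules(ASSUMED_MODULES))
```

An empty list does not prove the environment is healthy, since a module can be present and still fail at import time, but a non-empty list points directly at what to install or which environment to activate.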
