Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Missing file callset.json when creating a PON

Answered
5 comments

  • Genevieve Brandt (she/her)

    Hi Lait,

    The program log you are sharing here does not look complete to me. If this is where it ends, the process likely got killed prematurely for some reason, potentially by your machine. This would explain why you are not getting all of the output files. Try to give the job more memory or more storage space to get it to complete.

    We have a GenomicsDBImport performance guide here: GenomicsDBImport usage and performance guidelines.

    Hope this solves your issue!

    Best,

    Genevieve
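
As a rough illustration of what "more memory or more storage space" can mean for this step, the sketch below wraps GenomicsDBImport in a small function that sets the Java heap and a scratch directory explicitly. The function name, the input file names, and all sizes are placeholders rather than values from this thread; only the GATK options themselves (--java-options, --genomicsdb-workspace-path, --tmp-dir, -L) are taken from the tool's documented interface.

```shell
# Sketch only: file names, paths, and sizes are placeholders, not values from this thread.
# GATK can be overridden (for example GATK=echo) to preview the command without running it.
import_normals() {
    heap_gb=$1      # Java heap in GB; keep it below the memory requested from the scheduler
    tmp_dir=$2      # scratch directory on a disk with plenty of free space
    workspace=$3    # GenomicsDB workspace to create; must be empty or not yet exist
    intervals=$4    # capture-kit interval file (all intervals in one command)

    mkdir -p "$tmp_dir" || return 1
    "${GATK:-gatk}" --java-options "-Xmx${heap_gb}g" GenomicsDBImport \
        -V normal1.vcf.gz \
        -V normal2.vcf.gz \
        --genomicsdb-workspace-path "$workspace" \
        --tmp-dir "$tmp_dir" \
        -L "$intervals"
}
```

A call such as `import_normals 16 "$PWD/gatk_tmp" pon_db targets.bed` would then run the import with a 16 GB heap; prefixing it with `GATK=echo` only prints the command for review.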

  • Lait

    Thank you for your reply.

    Yes, you are right, the process was aborted.
    I gave it more resources and ran the command on each chromosome separately (as you can see in the -L option, I am using the Agilent V7 exome capture kit, so I split its interval file into one file per chromosome and ran the command 24 times in parallel).

    My question is: how can I reassemble the output, which is spread across 24 different workspaces, so that I can use it in the next step (gatk CreateSomaticPanelOfNormals)?

  • Genevieve Brandt (she/her)

    Hi Lait,

    There isn't a good method to combine GenomicsDB workspaces before CreateSomaticPanelOfNormals, which can only accept one GenomicsDB workspace. In our production pipelines, VCFs with different intervals are merged after GenotypeGVCFs, which is not a step you will do when creating your PON.

    A better option would be either to add samples incrementally to your GenomicsDB or to decrease the batch size. I would recommend keeping all your intervals in the same command.

    Best,

    Genevieve
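
The two alternatives mentioned above, and the single-workspace PON step that follows them, might look roughly like the sketch below. All function names, file names, and the batch size are hypothetical; the options shown (--sample-name-map, --batch-size, --genomicsdb-update-workspace-path, and the gendb:// input prefix) come from the documented GenomicsDBImport and CreateSomaticPanelOfNormals interfaces, but the right values depend on your data and cluster.

```shell
# Sketch only: every file name and number here is a placeholder.
# GATK can be overridden (for example GATK=echo) to preview commands without running them.

# Option 1: one workspace, all intervals in one command, samples read in smaller batches.
# The sample map is a text file with one "sample_name<TAB>path/to/file.vcf.gz" entry per line.
create_workspace_in_batches() {
    batch_size=$1; sample_map=$2; workspace=$3; intervals=$4
    "${GATK:-gatk}" GenomicsDBImport \
        --sample-name-map "$sample_map" \
        --batch-size "$batch_size" \
        --genomicsdb-workspace-path "$workspace" \
        -L "$intervals"
}

# Option 2: add a further sample to an existing workspace.
# Intervals are taken from the existing workspace, so -L is not repeated here.
# The GATK documentation recommends backing up the workspace before updating it.
add_samples() {
    workspace=$1; new_vcf=$2
    "${GATK:-gatk}" GenomicsDBImport \
        -V "$new_vcf" \
        --genomicsdb-update-workspace-path "$workspace"
}

# Next step: build the panel of normals from the single workspace.
create_pon() {
    workspace=$1; reference=$2; output=$3
    "${GATK:-gatk}" CreateSomaticPanelOfNormals \
        -R "$reference" \
        -V "gendb://$workspace" \
        -O "$output"
}
```

For example, `create_workspace_in_batches 50 normals.sample_map pon_db targets.bed` followed by `create_pon pon_db reference.fasta pon.vcf.gz` keeps everything in one workspace, which is the form CreateSomaticPanelOfNormals expects.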

  • Isadora Machado Ghilardi

    I'm also missing callset.json from my database. I submitted this job to the HPC:

    #!/bin/bash -l
    #PBS -N
    #PBS -l nodes=1:ppn=32,mem=128gb
    #PBS -l walltime=0048:00:00

    conda activate gatk_env
    gatk GenomicsDBImport -V 17 samples.g.vcf.gz --genomicsdb-workspace-path -my_cohort --intervals hg38.bed

    (hg38.bed is the file used by the Ion Torrent.) Should I do something different? I am doing whole-exome sequencing.

  • Gökalp Çelik

    Hi Isadora Machado Ghilardi,

    You may need to set up a temporary folder that is accessible to GATK. Please refer to the document below.

    https://gatk.broadinstitute.org/hc/en-us/articles/18965297287067-How-to-setup-and-use-temporary-folder-for-GATK-local-execution

    If you still observe issues, please also include your logs so that we can properly diagnose the problem.

    Regards. 
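
A minimal sketch of the two checks implied here: preparing a writable temporary folder before the run, and confirming afterwards that the workspace contains its JSON and header files. Both function names are hypothetical, and the file list (callset.json, vidmap.json, vcfheader.vcf) is an assumption about what a completed import normally leaves at the top level of the workspace.

```shell
# Sketch only: function names and paths are hypothetical.

# Create the temporary folder and confirm the current user can write to it.
prepare_tmp_dir() {
    tmp_dir=$1
    mkdir -p "$tmp_dir" && [ -w "$tmp_dir" ]
}

# Print the name of each expected top-level file that is absent from the workspace.
# Assumption: a finished import leaves callset.json, vidmap.json, and vcfheader.vcf there.
missing_workspace_files() {
    workspace=$1
    for f in callset.json vidmap.json vcfheader.vcf; do
        [ -f "$workspace/$f" ] || echo "$f"
    done
}
```

The temporary folder would be passed to GATK with `--tmp-dir`, and redirecting the job's output to a file (for example with `2>&1 | tee import.log`) keeps a complete log to share. If `missing_workspace_files my_cohort` prints anything after the job ends, the import most likely did not finish.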


Post is closed for comments.
