Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Missing file callset.json when creating a PON

  • Genevieve Brandt (she/her)

    Hi Lait,

    The program log you are sharing here does not look complete to me. If this is where it ends, the process likely got killed prematurely for some reason, potentially by your machine. This would explain why you are not getting all of the output files. Try to give the job more memory or more storage space to get it to complete.

    We have a GenomicsDBImport performance guide here: GenomicsDBImport usage and performance guidelines.

    Hope this solves your issue!

    Best,

    Genevieve

  • Lait

    Thank you for your reply.

    Yes, you are right, the process was aborted.
    I gave the job more resources and ran the import on each chromosome separately. As you can see in the -L option, I am using the Agilent V7 exome capture kit, so I split its interval file into one file per chromosome and ran the command 24 times in parallel.

    My question is: how can I reassemble the output, which is now spread across 24 different workspaces, so that I can use it in the next step (gatk CreateSomaticPanelOfNormals)?

  • Genevieve Brandt (she/her)

    Hi Lait,

    There isn't a good method to combine GenomicsDB workspaces before CreateSomaticPanelOfNormals, which can accept only one GenomicsDB workspace. In our production pipelines, VCFs with different intervals are merged after GenotypeGVCFs, which is not a step you will run when creating your PON.

    A better option would be either to add samples incrementally to your GenomicsDB workspace or to decrease the batch size. I would recommend keeping all your intervals in the same command.

    Best,

    Genevieve
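
For readers who want to script the advice in this thread, here is a minimal Python sketch, not an official GATK recipe. It builds the argument lists for a single-workspace GenomicsDBImport run, an incremental update of that workspace, and the following CreateSomaticPanelOfNormals call, and it checks whether a workspace looks complete. The helper names and the paths in the usage note are hypothetical. Only callset.json is named in this thread; the other entries in EXPECTED_FILES are my assumption about what a finished import leaves behind. Omitting -L in update mode and passing --merge-input-intervals for exome targets are likewise assumptions that should be checked against the documentation for your GATK version.

```python
from pathlib import Path

# Only callset.json comes from this thread; the other two names are an
# assumption about what a finished import writes to the workspace root.
EXPECTED_FILES = ("callset.json", "vidmap.json", "vcfheader.vcf")


def missing_workspace_files(workspace):
    """Return the expected top-level files that are absent from the workspace."""
    root = Path(workspace)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def import_command(workspace, vcfs, intervals=None, batch_size=50, update=False):
    """Build a GenomicsDBImport argument list that keeps all intervals together."""
    path_flag = ("--genomicsdb-update-workspace-path" if update
                 else "--genomicsdb-workspace-path")
    cmd = ["gatk", "GenomicsDBImport", path_flag, str(workspace),
           "--batch-size", str(batch_size)]
    if not update:
        # Assumption: an update reuses the intervals already stored in the
        # workspace, so -L is passed only when the workspace is created.
        cmd += ["-L", str(intervals), "--merge-input-intervals"]
    for vcf in vcfs:
        cmd += ["-V", str(vcf)]
    return cmd


def pon_command(reference, workspace, output):
    """Build the CreateSomaticPanelOfNormals call on the single workspace."""
    return ["gatk", "CreateSomaticPanelOfNormals", "-R", str(reference),
            "-V", f"gendb://{workspace}", "-O", str(output)]
```

A possible use: run import_command("pon_db", first_vcfs, "targets.interval_list", batch_size=10) through subprocess.run(cmd, check=True), add later samples with import_command("pon_db", more_vcfs, update=True), and call pon_command only once missing_workspace_files("pon_db") returns an empty list. A smaller batch size trades speed for lower memory use, which fits the aborted-job symptom described above.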

