Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

How to transform an interval list with 297,694 lines into fewer than 30,000 lines to run HaplotypeCaller in Terra?


  • Bhanu Gandham

    Hi sahuno,

    You are running into this because the workflow scatters over each item in its `interval.list` input, so each line of the file you provided becomes its own shard. You can try one of two things to resolve this:

    1. Provide the entire BED file as a single item in an `interval.list` input, instead of providing the BED file itself as the input; or
    2. Your approach 4 should work. Try this, which splits the BED intervals into 30 interval files in the `scattered_calling_intervals` output directory:

      ```shell
      gatk SplitIntervals \
        -R Homo_sapiens_assembly38.fasta \
        -L Exome-Agilent_V6_UTR.bed \
        --scatter-count 30 \
        -O scattered_calling_intervals
      ```
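To feed the option 2 output to the workflow the same way, the list input needs one line per shard, each naming one interval file in the bucket. A minimal sketch, assuming SplitIntervals wrote its shards as `*.interval_list` files into `scattered_calling_intervals` and that you will upload that directory to your workspace bucket; the function name `make_scatter_list` and the `gs://YOUR-BUCKET/...` prefix are hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Sketch: build a "list of interval files" for a scattered Terra workflow.
# Assumptions (hypothetical): SplitIntervals wrote its shards as
# *.interval_list files into shard_dir, and they will be uploaded to
# bucket_prefix under the same file names.
set -euo pipefail

make_scatter_list() {
  local shard_dir="$1"      # local directory produced by SplitIntervals -O
  local bucket_prefix="$2"  # e.g. gs://YOUR-BUCKET/scattered_calling_intervals
  local out_list="$3"       # list file to pass as the workflow's interval list input

  : > "$out_list"           # truncate first so reruns don't append duplicate lines
  local f
  for f in "$shard_dir"/*.interval_list; do
    # If the glob matched nothing, "$f" is the literal pattern; stop with an error.
    [ -e "$f" ] || { echo "no .interval_list files in $shard_dir" >&2; return 1; }
    # One gs:// path per line; strip a trailing slash from the prefix.
    printf '%s/%s\n' "${bucket_prefix%/}" "$(basename "$f")" >> "$out_list"
  done
  # One line per shard means one scatter job per shard.
  echo "$(wc -l < "$out_list" | tr -d ' ') shards listed in $out_list" >&2
}
```

For example, run `make_scatter_list scattered_calling_intervals gs://YOUR-BUCKET/scattered_calling_intervals scattered_calling_intervals_list`, then upload with `gsutil cp -r scattered_calling_intervals gs://YOUR-BUCKET/` and `gsutil cp scattered_calling_intervals_list gs://YOUR-BUCKET/`. With `--scatter-count 30` the list should have roughly 30 lines, so the workflow launches roughly 30 shards instead of one per BED line.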
  • sahuno

    Thank you, Bhanu Gandham, for the help.

    Just to clarify step 1: do you mean that I should run this in my Cloud Shell:

    `$ echo "gs://fc-1f2cec15-4c83-4842-8fde-c79a1131b2dd/Exome-Agilent_V6_UTR.bed" >> Exome-Agilent_V6_UTR_scattered_calling_intervals_list`

    and then specify `gs://fc-1f2cec15-4c83-4842-8fde-c79a1131b2dd/Exome-Agilent_V6_UTR_scattered_calling_intervals_list` as my input for `interval_list` in the GATK HaplotypeCaller workflow?

    I'm also testing the alternative method (2) you suggested.

    Thanks once again!

  • Bhanu Gandham

    Hi sahuno,

    Yes, that is what I mean.
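A minimal sketch of step 1 as confirmed above, using the bucket path and file names from this thread. It writes the one-line list with `>` rather than `>>`, so rerunning it cannot add a second copy of the path (a duplicated line would likely give two shards over the same BED file). The function name `write_single_item_list` is hypothetical, and the `gsutil cp` upload is left commented out so the sketch runs without cloud credentials:

```shell
#!/usr/bin/env bash
# Sketch of approach 1: a one-line list whose only item is the whole BED file,
# so the workflow scatters over a single item instead of every BED line.
set -euo pipefail

write_single_item_list() {
  local bed_uri="$1"    # gs:// path of the BED file already in the workspace bucket
  local list_file="$2"  # local list file to create
  # '>' (not '>>') overwrites, so rerunning never adds a duplicate line.
  printf '%s\n' "$bed_uri" > "$list_file"
}

BUCKET="gs://fc-1f2cec15-4c83-4842-8fde-c79a1131b2dd"
LIST="Exome-Agilent_V6_UTR_scattered_calling_intervals_list"

write_single_item_list "$BUCKET/Exome-Agilent_V6_UTR.bed" "$LIST"

# Upload the list, then use "$BUCKET/$LIST" as the workflow's interval list input:
# gsutil cp "$LIST" "$BUCKET/$LIST"
```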

  • sahuno

    Great, thanks!

