Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Error using output of ScatterIntervalsByNs by SplitIntervals

4 comments

  • Genevieve Brandt (she/her)

    Hi Kristian Unger, it seems that you are supplying an interval, "@HD VN:1.6 SO:coordinate", that cannot be used. Please check all your intervals and make sure they are valid.

  • Tang Huatao

    Hi, I'm wondering whether this issue was ever solved? I recently ran into a similar problem with the CollectReadCounts tool and can't solve it. I used BedToIntervalList to convert hg38.exon.bed to hg38.exon.interval_list. If I use hg38.exon.interval_list as the input to CollectReadCounts, it works. However, if I first run PreprocessIntervals to convert hg38.exon.interval_list to targets.preprocessed.interval.list and use that file as the input to CollectReadCounts, it fails.

    Here is my code:

    ## Paths used below (defined before first use)
    GATK=~/wes_cancer/biosoft/gatk-4.1.4.1/gatk
    ref=~/wes_cancer/data/Homo_sapiens_assembly38.fasta
    interval=~/wes_cancer/data/targets.preprocessed.interval.list
    ## ${dict} is the reference sequence dictionary; its definition was not included in the post

    ## Preprocess Intervals
    $GATK  PreprocessIntervals \
    -L ~/wes_cancer/data/hg38.exon.interval_list \
    --sequence-dictionary ${dict} \
    --reference ${ref}  \
    --bin-length 0 \
    --padding 250 \
    --interval-merging-rule OVERLAPPING_ONLY \
    --output ~/wes_cancer/data/targets.preprocessed.interval.list

    ## One sample ID per line in ./config
    cat config | while read id
    do
      i=./5.gatk/${id}_bqsr.bam
      echo ${i}
      ## step1 : CollectReadCounts
      time $GATK  --java-options "-Xmx7G -Djava.io.tmpdir=./"  CollectReadCounts \
      -I ${i} \
      -L ${interval} \
      -R ${ref} \
      --format HDF5  \
      --interval-merging-rule OVERLAPPING_ONLY \
      --output ./8.cnv/gatk/counts/${id}.clean_counts.hdf5
    done

    Thank you!
  • David Roazen

    Hi Tang Huatao,

    I believe that the problem here is your file extension, ".interval.list". There are two kinds of interval list file formats supported by GATK:

    - A simple list of intervals, one per line. This kind of file can have a .intervals or .list extension.
    - A "Picard-style" interval list with a header, starting with a line like "@HD VN:1.6 SO:coordinate". This kind of file has a ".interval_list" extension.
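
    As a rough illustration of the two formats just listed, the snippet below writes one tiny file of each kind. The file names, the coordinates, and the `demo_dir` location are made up for this sketch and do not come from the thread; only the header line mirrors the one quoted above.

    ```shell
    # Illustrative files only; names and coordinates are assumptions for this sketch.
    demo_dir=${TMPDIR:-/tmp}/interval_format_demo
    mkdir -p "$demo_dir"

    # 1) GATK-style list: one interval per line, no header (.list or .intervals)
    printf 'chr1:1000-2000\n'  >  "$demo_dir/example.list"
    printf 'chr2:500-1500\n'   >> "$demo_dir/example.list"

    # 2) Picard-style list: SAM-style header lines, then tab-separated
    #    contig, start, end, strand, name (.interval_list)
    printf '@HD\tVN:1.6\tSO:coordinate\n'    >  "$demo_dir/example.interval_list"
    printf '@SQ\tSN:chr1\tLN:248956422\n'    >> "$demo_dir/example.interval_list"
    printf 'chr1\t1000\t2000\t+\ttarget_1\n' >> "$demo_dir/example.interval_list"
    ```

    Running `head -n 1` on each file shows the difference at a glance: the first starts with an interval, the second with the "@HD" header.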

    You have the second kind of interval list file (the kind with a header), but your ".list" extension is causing GATK to treat it as the first kind of file, and so it throws an error when it sees the header. Renaming "targets.preprocessed.interval.list" to "targets.preprocessed.interval_list" should solve the problem.
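
    The rename described above can be sketched as a small shell helper. This is only an illustration: `fix_interval_list_ext` is a hypothetical function, not part of GATK, and it assumes that a first line beginning with "@" means the file is a Picard-style list.

    ```shell
    # Hypothetical helper (not part of GATK): if a file starts with a SAM-style
    # header line such as "@HD ...", make sure its name ends in ".interval_list".
    # Prints the path that should then be passed to -L.
    fix_interval_list_ext() {
      file=$1
      # Header-less GATK-style lists (.list / .intervals) are left untouched.
      if ! head -n 1 "$file" | grep -q '^@'; then
        printf '%s\n' "$file"
        return 0
      fi
      case $file in
        *.interval_list) fixed=$file ;;                                # already correct
        *.interval.list) fixed=${file%.interval.list}.interval_list ;; # the case in this thread
        *.list)          fixed=${file%.list}.interval_list ;;
        *.intervals)     fixed=${file%.intervals}.interval_list ;;
        *)               fixed=$file.interval_list ;;                  # any other name: append
      esac
      # Rename only when the name actually changes.
      [ "$fixed" = "$file" ] || mv -- "$file" "$fixed"
      printf '%s\n' "$fixed"
    }
    ```

    For the file in this thread, `interval=$(fix_interval_list_ext ~/wes_cancer/data/targets.preprocessed.interval.list)` would rename it to targets.preprocessed.interval_list and store the new path for use with `-L`. A plain `mv` does the same job for a single file.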

    Regards,
    David

  • Tang Huatao

    Hi David,

    It works, thank you very much!

    Regards,

    Tang Huatao