Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Intervals and interval lists

15 comments

  • WVNicholson

    I can't find anything convincing on how to create a valid Picard interval file, although the information above suggests a recipe: create the header with "samtools view -H" and then add the required intervals by hand or otherwise. That may be a dirty hack that could cause problems in the long run, though. One of the online discussion forums has a thread about this issue and points to a Broad Institute GATK page that no longer exists ("Preparing the essential GATK input files").

     

    William
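The recipe described in this comment can be sketched in Python. This is only an illustration, not an official GATK or Picard method: the function name `build_interval_list` and its arguments are hypothetical. It assumes the Picard interval_list layout of a SAM-style header followed by five tab-separated columns (contig, start, end, strand, name) with 1-based inclusive coordinates, and it assumes every interval is on the `+` strand.

```python
def build_interval_list(sam_header, intervals):
    """Return Picard-style interval_list text (hypothetical helper).

    sam_header: header text, e.g. the output of `samtools view -H`.
    intervals:  iterable of (contig, start, end, name), 1-based inclusive.
    """
    header_lines = [ln for ln in sam_header.splitlines() if ln.startswith("@")]

    # Collect contig lengths from the @SQ lines so intervals can be validated.
    contig_lengths = {}
    for ln in header_lines:
        if ln.startswith("@SQ"):
            tags = dict(field.split(":", 1) for field in ln.split("\t")[1:])
            contig_lengths[tags["SN"]] = int(tags["LN"])

    rows = []
    for contig, start, end, name in intervals:
        if contig not in contig_lengths:
            raise ValueError(f"contig {contig!r} is not in the header")
        if not 1 <= start <= end <= contig_lengths[contig]:
            raise ValueError(f"bad coordinates for {name}: {start}-{end}")
        # Strand is assumed to be "+" for every interval in this sketch.
        rows.append("\t".join([contig, str(start), str(end), "+", name]))

    return "\n".join(header_lines + rows) + "\n"
```

Validating each interval against the header is the main safeguard against the "dirty hack" concern, since a hand-edited file can easily name a contig that the header does not declare. Picard also provides a BedToIntervalList tool that builds an interval list from a BED file and a sequence dictionary, which may be preferable to editing by hand.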

  • registered_user

    Took me a while to figure this out, but the GATK list format is actually:

    <chr>:<start>-<stop>
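As a rough illustration of that format, the sketch below checks a single `<chr>:<start>-<stop>` string. The function name `parse_interval` is hypothetical, and the sketch deliberately handles only the full three-part form; it assumes 1-based coordinates with start not greater than stop.

```python
import re

# Greedy ".+" means the LAST colon separates the contig from the coordinates,
# so contig names that themselves contain colons still parse.
_INTERVAL_RE = re.compile(r"^(.+):(\d+)-(\d+)$")


def parse_interval(text):
    """Parse '<chr>:<start>-<stop>' into (contig, start, stop)."""
    match = _INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not in <chr>:<start>-<stop> form: {text!r}")
    contig, start, stop = match.group(1), int(match.group(2)), int(match.group(3))
    if start < 1 or start > stop:
        raise ValueError(f"invalid coordinates in {text!r}")
    return contig, start, stop
```

Running a check like this over every line of a hand-written list file can catch whitespace-separated entries such as `chr1 100 200` before they reach a GATK tool.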

  • Enrico Cocchi

    How do we download these blacklists that you state you made available?

  • Patrícia H. Brito

    Hi,

    How can I access these WGS interval lists?

    "We make our WGS interval lists available, and the good news is that, as long as you're using the same genome reference build as us, you can use them with your own data even if it comes from somewhere else -- assuming you agree with our decisions about which regions to blacklist!"

  • Aldhair Médico

    Dear GATK developers,
    I'm trying to run Mutect2 on WES cancer data.
    However, since the Resource Bundle only supports hg19, it seems I cannot proceed.

    I've been looking for some hg38 interval_list file and I found: ''hg38_v0_HybSelOligos_whole_exome_illumina_coding_v1_whole_exome_illumina_coding_v1.Homo_sapiens_assembly38.targets.interval_list''

    However, when I run the GenomicsDBImport I get the error (no matter if I use my own hg38 reference and .dict or the ones from your Resource Bundle):
    ''A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn't present in the Fasta sequence dictionary''

    So, my questions are: 
    1. Is there a release date for this hg38-based exome interval file? Will it be soon?
    2. Or is the file I used OK and the error coming from somewhere else?
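The error quoted above says that a contig named in the intervals is absent from the reference's sequence dictionary, which is what happens when, for example, one file names the contig `chr1` and the other names it `1`. A minimal sketch for checking this offline follows. The function names are hypothetical, and the sketch assumes a `.dict` file with `@SQ` lines carrying `SN:` tags and a Picard-style interval list whose first tab-separated column is the contig name.

```python
def dict_contigs(dict_text):
    """Return the set of contig names declared in a sequence dictionary."""
    names = set()
    for line in dict_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def missing_contigs(interval_list_text, dict_text):
    """List contigs used by the intervals but absent from the dictionary."""
    known = dict_contigs(dict_text)
    missing = []
    for line in interval_list_text.splitlines():
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and the interval list's own header
        contig = line.split("\t")[0]
        if contig not in known and contig not in missing:
            missing.append(contig)
    return missing
```

An empty result means the naming is consistent and the problem lies elsewhere; a non-empty result points to a mismatch between the interval file and the reference actually passed to the tool.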

  • pollyshawn

    How do I specify Chr01 and Chr02?

  • Hee-Bum Yang

    This article was very helpful when I ran GenomicsDBImport.

    However, as registered_user said, the format that actually works when I run GenomicsDBImport is "<chr>:<start>-<stop>", not "<chr> <start> <stop>".

    Could you update the article to reflect this?

    It is not easy for a beginner to find the solution.

     

  • Carolina Paez

    This article is pretty informative.

    If I want to specify a range of chromosomes, should I use:

     -L <chr1>-<chr5>

    Any guidance will be appreciated.

     

  • Neev Liberman

    How do I include sex chromosomes? Also, can you specify a range as stated above, like:

    -L <chr1>-<chr23>

  • Neev Liberman and Carolina Paez: I believe that you cannot use this syntax. You can either use multiple -L arguments:

    -L chr1 -L chr2 -L chr3 -L chr4 -L chr5

    or use an interval list or BED file listing the chromosomes you are after.
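The repeated `-L` arguments in the first option can be generated rather than typed. The helper below is a hypothetical sketch: the name `contig_args`, the `chr` prefix, and the idea of passing sex chromosomes through an `extra` argument are all assumptions for illustration, and the contig names must match those in your own reference.

```python
def contig_args(first, last, prefix="chr", extra=()):
    """Build ['-L', 'chr1', '-L', 'chr2', ...] for a run of numbered contigs.

    extra: additional contig suffixes to append, e.g. ("X", "Y").
    """
    names = [f"{prefix}{i}" for i in range(first, last + 1)]
    names += [f"{prefix}{suffix}" for suffix in extra]
    args = []
    for name in names:
        args += ["-L", name]
    return args
```

For example, `" ".join(contig_args(1, 5))` reproduces the argument string shown in the comment above, and `contig_args(1, 22, extra=("X", "Y"))` would also cover the human sex chromosomes under the assumed naming.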

  • Carolina Paez

    Great to know! Thank you, Dror Kessler (‫דרור קסלר‬‎) for your help.

     

     

  • J. Legebeke

    So, what do I put down for the -L argument if I just want to look across the whole genome and not just a specific region?

  • Felipe Batalini

    Great explanation, thank you! Since many of the library prep kits are well established and somewhat standard, does Broad have a repository of the most commonly used interval lists? I see that whole_exome_illumina_coding_v1 is used in some workflows (e.g. Exome-Analysis-Pipeline, a featured workspace), but what if my sequencing was done using a v6 kit? Where can I find that information? Shouldn't there be a repository? Thank you so much!

  • rq m

    I want to confirm the format of the BED file. The article says <chr>:<start>-<stop>, but as far as I know, a BED file looks like this:

    chr1    1049    1500    exon00002       .       -       USA     exon    0       ID=exon00002;Ontology_term="GO:0046703";Ontology_term="GO:0046704"
    chr1    1299    1300    exon00001       .       +       Canada  exon    .       ID=exon00001;score=1;zeroLengthInsertion=True
    chr1    2999    3902    exon00003       .       ?       Canada  exon    2       ID=exon00003;score=4;Name=foo
    chr1    4999    5500    exon00004       .       .       .       exon    .       ID=exon00004;Gap=M8 D3 M6 I1 M6
    chr1    6999    9000    exon00005       10      +       .       exon    1       ID=exon00005;Dbxref="NCBI_gi:10727410"
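The two notations describe the same regions in different coordinate systems: BED uses 0-based starts with an exclusive end, while the `<chr>:<start>-<stop>` form is 1-based and inclusive. The sketch below converts one BED row to the other form. The function name `bed_to_gatk` is hypothetical, and only the first three columns are read.

```python
def bed_to_gatk(bed_line):
    """Convert one BED row to a '<chr>:<start>-<stop>' string.

    BED:  0-based start, end exclusive.
    GATK-style string: 1-based start, end inclusive.
    """
    fields = bed_line.split(None, 3)  # split on any whitespace, keep 3 columns
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns: {bed_line!r}")
    contig, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end <= start:
        raise ValueError(f"empty or invalid BED interval: {bed_line!r}")
    # Only the start shifts by one; the exclusive end equals the inclusive stop.
    return f"{contig}:{start + 1}-{end}"
```

Applied to the first row above, this gives `chr1:1050-1500`; the second row, which spans a single base in BED terms, becomes `chr1:1300-1300`.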

  • Stuart Aidan Quinn

    Derek Caetano-Anolles, thanks for this helpful article! I have the same question as Enrico Cocchi and Aldhair Médico: which list are you referring to for the WGS blacklist (I am running Mutect2 on an hg38 cancer dataset)? In gatk-best-practices/somatic-hg38 I found:

    1. CNV_and_centromere_blacklist.hg38liftover.list
    2. CNV.hg38liftover.bypos.v1.CR1_event_added.mod.seg
    3. final_centromere_hg38.seg

    I believe #1 is the most comprehensive, based on its name and the number of non-header lines. Please correct me if I'm wrong, and thanks again!
