Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Empty intersection error when applying intervals

Answered
3 comments

  • Genevieve Brandt (she/her)

    Hi Ran Wei,

    In your command, you have two interval options specified: -L chrY:2781480-26673214 and -L hg19_to_hg38_pure_info.intervals. It looks like there are two intervals in the .intervals file: chrY:12735794-12735923 and chrY:19732457-19732586. So, in total, you have three interval ranges. You also have the interval set rule (-isr) set to INTERSECTION, which, according to the HaplotypeCaller documentation, will "Take the intersection of intervals (the subset that overlaps all intervals specified)". No region is shared by all three of these intervals, which is why you are getting this error: there are no intervals left for the tool to run on.

    You can get around this error by using the default interval set rule UNION, which uses a combined list from the intervals specified. Or, you can specify intervals that do intersect.

    Hope this helps with your analysis, and let me know if you have further questions.

    Best,

    Genevieve
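
To make the reasoning in this reply concrete, here is a minimal Python sketch that treats each of the three ranges as its own set and looks for a region shared by all of them. It is an illustration only, not GATK's implementation: the names `Interval`, `parse_interval`, and `common_overlap` are made up for this example, and GATK's own handling of ranges loaded from a single interval file may differ.

```python
from typing import NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    """Hypothetical helper type: a 1-based, inclusive genomic range."""
    contig: str
    start: int
    end: int


def parse_interval(text: str) -> Interval:
    """Parse a 'contig:start-end' string such as 'chrY:2781480-26673214'."""
    contig, _, span = text.rpartition(":")
    start, _, end = span.partition("-")
    return Interval(contig, int(start), int(end))


def common_overlap(intervals: Sequence[Interval]) -> Optional[Interval]:
    """Return the region shared by every interval, or None if there is none."""
    if not intervals or len({iv.contig for iv in intervals}) != 1:
        return None
    start = max(iv.start for iv in intervals)
    end = min(iv.end for iv in intervals)
    return Interval(intervals[0].contig, start, end) if start <= end else None


# The three ranges named in the reply above.
ranges = [parse_interval(s) for s in (
    "chrY:2781480-26673214",
    "chrY:12735794-12735923",
    "chrY:19732457-19732586",
)]

print(common_overlap(ranges))      # None
print(common_overlap(ranges[:2]))  # Interval(contig='chrY', start=12735794, end=12735923)
```

Under this model, dropping either of the two short ranges leaves a non-empty overlap, which is one way to read "specify intervals that do intersect".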

  • Ran Wei

    Thank you, Genevieve. However, when I change -isr to UNION, it seems that GATK processes the whole genome, which defeats my purpose in setting the intervals: saving running time. Another observation is that when I set -ip to 100 instead of 0 in the GATK command I attached above, the run completes without errors. I am not sure what caused this.

    Another problem: I changed -ip to 100, thinking that I could at least use the intervals to filter out variants that fall outside them during downstream analysis. With that setting, one reported variant is chr9:133278859_A/ACGCAG, i.e. an A to ACGCAG change at chromosome 9, position 133278859. However, the read view in the genome browser shows that it is actually a T->C SNP at position 133278860. One possible explanation is that I specified the interval chr9 133278658 133278859, and with -ip set to 100, GATK treats the A at 133278859 as the end of the reference but continues to append the mutated sequence to it, producing an "insertion variant" of CGCAG. Is my guess correct, and is there any way to avoid such problems?

    The command corresponding to this interval is:

    • Using GATK jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
    • Running:
    • java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx1951m -jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar HaplotypeCaller -R /home/dnanexus/in/genome_fastagz/hs38DH.fa --dbsnp /home/dnanexus/in/dbsnp_vcfgz/Homo_sapiens_assembly38.dbsnp138.vcf.gz -I /home/dnanexus/in/mappings_sorted_bam/OtC8152_TAACTCTGATGC_L001_L002_R1_001.bam -ERC GVCF -O results/chr9:60518559-138334717.vcf.gz -isr INTERSECTION -L chr9:60518559-138334717 --native-pair-hmm-threads 1 -L hg19_to_hg38.intervals -ip 100 --max-alternate-alleles 3 -contamination 0 --QUIET

    The problem is a little complicated. If certain details seem unclear, please let me know. Thanks!
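
The padding arithmetic in this comment can be checked with a short Python sketch. It assumes that -ip adds the given number of bases to each side of every interval, and it ignores clipping at the contig end because the contig length is not given in the thread. The helpers `pad_interval` and `contains` are hypothetical names for this example, not GATK functions.

```python
from typing import Tuple


def pad_interval(start: int, end: int, padding: int) -> Tuple[int, int]:
    """Extend a 1-based inclusive range by `padding` bases on each side.

    Assumption: the start is clipped at position 1; the end is not clipped
    because the contig length is unknown here.
    """
    return max(1, start - padding), end + padding


def contains(rng: Tuple[int, int], pos: int) -> bool:
    """True if `pos` lies inside the 1-based inclusive range `rng`."""
    return rng[0] <= pos <= rng[1]


original = (133278658, 133278859)  # chr9 interval quoted in the comment
padded = pad_interval(*original, padding=100)

print(padded)                          # (133278558, 133278959)
print(contains(original, 133278860))   # False
print(contains(padded, 133278860))     # True
```

Under that assumption, position 133278860 lies just outside the listed interval but inside the padded one. The arithmetic alone does not explain how the variant was represented; it only shows where the padded boundary would fall.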

  • Genevieve Brandt (she/her)

    Hi Ran Wei,

    It looks like you have two different issues here, so I'll go over them separately:

    1. The UNION option should only use reads covering the intervals given. If this is not the case and you are seeing reads from other parts of the genome, please let us know and share more details, because this might be a bug. I suggested the UNION option because of the error message you were getting: "Argument -L, --interval-set-rule has a bad value: [chrY:2781480-26673214, hg19_to_hg38_pure_info_chr9-2.intervals],INTERSECTION. The specified intervals had an empty intersection." You cannot use INTERSECTION with intervals that do not overlap, because HaplotypeCaller would then have no data to run on.
    2. I am not quite sure what is going on with the variant you shared. Please work through this troubleshooting article to examine what is happening at that position when the variant is called: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant

    Let me know if you have further questions with either of these points.

    Best,

    Genevieve
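
As a rough illustration of point 1, the Python sketch below merges the three chrY ranges mentioned earlier in the thread into a union and counts the bases covered. It is a simplified model, not GATK code: `merge_union` is a hypothetical helper, and merging adjacent ranges is a choice made for this example.

```python
from typing import List, Tuple


def merge_union(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge 1-based inclusive ranges on one contig into a sorted,
    non-overlapping list (overlapping or adjacent ranges are joined)."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# chrY ranges from the thread: the -L range plus the two from the interval file.
chr_y = [(2781480, 26673214), (12735794, 12735923), (19732457, 19732586)]
union = merge_union(chr_y)

print(union)                              # [(2781480, 26673214)]
print(sum(e - s + 1 for s, e in union))   # 23891735
```

In this model the union collapses to the single large -L range, so a UNION run would cover that whole range rather than only the two short regions from the interval file, but still not the whole genome.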

