Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Empty intersection error when applying intervals

Answered
0

8 comments

  • Avatar
    Genevieve Brandt (she/her)

    Hi Ran Wei,

    In your command, you have two interval options specified: -L chrY:2781480-26673214 and -L hg19_to_hg38_pure_info.intervals. It looks like there are two intervals in the .intervals file: chrY:12735794-12735923, and chrY:19732457-19732586. So, in total, you have three interval ranges. You also have the interval set rule (-isr) set to INTERSECTION, which will "Take the intersection of intervals (the subset that overlaps all intervals specified)" - from the HaplotypeCaller documentation. These three intervals have no overlapping regions so that is why you are getting this error, you have no intervals for the tool to run on. 

    You can get around this error by using the default interval set rule UNION, which uses a combined list from the intervals specified. Or, you can specify intervals that do intersect.

    Hope this helps with your analysis, and let me know if you have further questions.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Ran Wei

    Thank you Genevieve. However, when I change -isr to UNION, it seems that the GATK will map the variants to the whole genome, which does not fit my goal of setting the intervals - save running time. Another observation is, when I set -ip to 100 instead of 0 in the GATK command I attached above, the running becomes smooth without throwing out errors. Not sure what caused this problem. 

     

    Another problem is, when I change -ip to 100 thinking at least I could use the intervals to screen out those variants that are out of them in the data analysis procedure, there is one variant chr9:133278859_A/ACGCAG, i.e. an A to ACGCAG mutation at chromosome 9, position 133278859. However, the reads view of browser at this position reveals it is actually a T->C SNP at position 133278860. One possible explanation is that I assign interval chr9 133278658 133278859, and since there is -ip set to 100, GATK regards the A at 133278859 as the end of reference genome but continue to add the mutated part to it, making it an "insertion variant" of CGCAG. Is my estimation correct, and are there any ways to avoid such problems?

    The command corresponding to this interval is:

    • Using GATK jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
       
    • Running:
       
    • java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx1951m -jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar HaplotypeCaller -R /home/dnanexus/in/genome_fastagz/hs38DH.fa --dbsnp /home/dnanexus/in/dbsnp_vcfgz/Homo_sapiens_assembly38.dbsnp138.vcf.gz -I /home/dnanexus/in/mappings_sorted_bam/OtC8152_TAACTCTGATGC_L001_L002_R1_001.bam -ERC GVCF -O results/chr9:60518559-138334717.vcf.gz -isr INTERSECTION -L chr9:60518559-138334717 --native-pair-hmm-threads 1 -L hg19_to_hg38.intervals -ip 100 --max-alternate-alleles 3 -contamination 0 --QUIET
       
    • Using GATK jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar

    The problem is a little complicated. If certain details seem unclear, please let me know. Thanks!

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi Ran Wei,

    It looks like you have two different issues here, so I'll go over them separately:

    1. The UNION option should only use reads covering the intervals given. If this is not the case and you are seeing reads from other parts of the genome, please let us know along with more details because this might be a bug. I suggested the UNION option because of the error message you were getting: "Argument -L, --interval-set-rule has a bad value: [chrY:2781480-26673214, hg19_to_hg38_pure_info_chr9-2.intervals],INTERSECTION. The specified intervals had an empty intersection." You cannot run intervals that do not overlap with INTERSECTION, because you will then have no data for HaplotypeCaller.
    2. I am not quite sure what is going on with the variant you shared. Please see this troubleshooting article to look over what is happening at that position to call the variant: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant

    Let me know if you have further questions with either of these points.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    Erin Kinghorn

    Hi, I have also been having issues with the same error:

    ****************************************************************************************************

    Argument -L, --interval-set-rule has a bad value: [/cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome
    _Capture_V5_hg38.bed, chrX:10001-2781479],INTERSECTION. The specified intervals had an empty intersection

    ****************************************************************************************************

    The part of the log related to the run is:

    • Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
    • Running:
            java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:+UseSerialGC -Xss456k -Xms2g -Xmx44g -jar /gatk/gatk-package-4.1.3.0-local.jar HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I MND_G154.1MBU.md.recal.cram --emit-ref-confidence GVCF --dbsnp dbsnp_146.hg38.vcf.gz --L /cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome_Capture_V5_hg38.bed --L chrX:10001-2781479 --interval-set-rule INTERSECTION --genotyping-mode DISCOVERY -A Coverage -A FisherStrand -A StrandOddsRatio -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -stand-call-conf 30 --sample-ploidy 2 -O MND_G154.1MBU.X_PAR1.g.vcf.gz

    I have also attempted to make use of a padded .BED file as mentioned above but got the same error:

     

    *****************************************************************************************************

    Argument -L, --interval-set-rule has a bad value: [/cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome
    _Capture_V5_hg38_padded.bed, chrX],INTERSECTION. The specified intervals had an empty intersection

    • Running:
            java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsam
      jdk.compression_level=2 -XX:+UseSerialGC -Xss456k -Xms2g -Xmx44g -jar /gatk/gatk-package-4.1.3.0-local.jar HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I MND_G154.1MBU.md.recal.cram --emit-ref-confidence GVCF --dbsnp dbsnp_146.hg38.vcf.gz --L /cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome_Capture_V5_hg38_padded.bed --L chrX -XL chrX:10001-2781479 -XL chrX:155701383-156030895 --interval-set-rule INTERSECTION --genotyping-mode DISCOVERY -A Coverage -A FisherStrand -A StrandOddsRatio -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -stand-call-conf 30 --sample-ploidy 1 -O MND_G154.1MBU.X_nonPAR.g.vcf.gz

    ******************************************************************************************************

    I suspect that this is a similar issue to what is above since I assume that Ran Wei also made use of a list over from hg19 to hg38. I would greatly appreciate if any further insight to this issue could be shared. Thank you

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Erin Kinghorn

    Can you load your bed file to IGV and check if you have any intersection present with the region that you want to include from chrX?

    0
    Comment actions Permalink
  • Avatar
    Erin Kinghorn

    Hi @Gökalp Çelik  I can confirm that there are reads present in the regions I need to include

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    I am sure reads are present but how about the bed file? This error is thrown when there is no intersection between any interval parameters therefore it is possible that your bed file is missing those intervals that belong to chrX:10001-2781479. Can you check that?

    0
    Comment actions Permalink
  • Avatar
    Erin Kinghorn

    Hi. Thank you for this help I have corrected the intersections in the BED file that appear in errors where the intersection has specified and that seems to have fixed that issue. However, I have also received errors same as above but where the interval is not specified for the chromosome. I.e. instead of the error reading chr2:intersection-intersection, it only reads chr2.

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk