Empty intersection error when applying intervals
AnsweredHi, I am a first-time user of GATK. I am using the GATK4 haplotypecaller (parallel per-intervalbyns) version 2.0.1 embedded in the DNAnexus platform for variant calling. Several short intervals are applied to this workflow so that only variants falling in this predetermined ranges will be called. However, this operation has caused a user error like:
-
***********************************************************************
-
-
A USER ERROR has occurred: Argument -L, --interval-set-rule has a bad value: [chrY:2781480-26673214, hg19_to_hg38_pure_info_chr9-2.intervals],INTERSECTION. The specified intervals had an empty intersection
-
-
***********************************************************************
Actually there are multiple error reports like this on serval chromosomes, but I only take chrY as an example here. The intervals I apply on chrY are:
chrY 12735794 12735923
chrY 19732457 19732586, which are written in a regular bed file (hg19_to_hg38_pure_info.bed) as input.
The part of log that is related to this run is:
-
Using GATK jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
-
Running:
-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx1951m -jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar HaplotypeCaller -R /home/dnanexus/in/genome_fastagz/hs38DH.fa -I /home/dnanexus/in/mappings_sorted_bam/OtC8152_TAACTCTGATGC_L001_L002_R1_001.bam -ERC GVCF -O results/chrY:2781480-26673214.vcf.gz -isr INTERSECTION -L chrY:2781480-26673214 --native-pair-hmm-threads 1 -L hg19_to_hg38_pure_info.intervals -ip 0 --max-alternate-alleles 3 -contamination 0 --QUIET
-
Using GATK jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
This error was thrown when I change the interval paddling from the DNAnexus default value 100 to 0 to avoid unwanted variants. Paddling of 100 as default setting receives no errors.
The input files are the BAM file generated by BWA-MEM FASTQ read mapper version 1.5.0, using DNAnexus default genome "GRCh38 With Alt Contigs - Hs38DH". No intervals are applied in this step. For GATK4 haplotype caller, I also use the DNAnexus GRCh38 default genome.
Can anyone let me know what the possible causes of this problem? Thanks.
-
Hi Ran Wei,
In your command, you have two interval options specified: -L chrY:2781480-26673214 and -L hg19_to_hg38_pure_info.intervals. It looks like there are two intervals in the .intervals file: chrY:12735794-12735923, and chrY:19732457-19732586. So, in total, you have three interval ranges. You also have the interval set rule (-isr) set to INTERSECTION, which will "Take the intersection of intervals (the subset that overlaps all intervals specified)" - from the HaplotypeCaller documentation. These three intervals have no overlapping regions so that is why you are getting this error, you have no intervals for the tool to run on.
You can get around this error by using the default interval set rule UNION, which uses a combined list from the intervals specified. Or, you can specify intervals that do intersect.
Hope this helps with your analysis, and let me know if you have further questions.
Best,
Genevieve
-
Thank you Genevieve. However, when I change -isr to UNION, it seems that the GATK will map the variants to the whole genome, which does not fit my goal of setting the intervals - save running time. Another observation is, when I set -ip to 100 instead of 0 in the GATK command I attached above, the running becomes smooth without throwing out errors. Not sure what caused this problem.
Another problem is, when I change -ip to 100 thinking at least I could use the intervals to screen out those variants that are out of them in the data analysis procedure, there is one variant chr9:133278859_A/ACGCAG, i.e. an A to ACGCAG mutation at chromosome 9, position 133278859. However, the reads view of browser at this position reveals it is actually a T->C SNP at position 133278860. One possible explanation is that I assign interval chr9 133278658 133278859, and since there is -ip set to 100, GATK regards the A at 133278859 as the end of reference genome but continue to add the mutated part to it, making it an "insertion variant" of CGCAG. Is my estimation correct, and are there any ways to avoid such problems?
The command corresponding to this interval is:
-
Using GATK jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
-
Running:
-
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx1951m -jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar HaplotypeCaller -R /home/dnanexus/in/genome_fastagz/hs38DH.fa --dbsnp /home/dnanexus/in/dbsnp_vcfgz/Homo_sapiens_assembly38.dbsnp138.vcf.gz -I /home/dnanexus/in/mappings_sorted_bam/OtC8152_TAACTCTGATGC_L001_L002_R1_001.bam -ERC GVCF -O results/chr9:60518559-138334717.vcf.gz -isr INTERSECTION -L chr9:60518559-138334717 --native-pair-hmm-threads 1 -L hg19_to_hg38.intervals -ip 100 --max-alternate-alleles 3 -contamination 0 --QUIET
-
Using GATK jar /gatk-4.1.8.0/gatk-package-4.1.8.0-local.jar
The problem is a little complicated. If certain details seem unclear, please let me know. Thanks!
-
-
Hi Ran Wei,
It looks like you have two different issues here, so I'll go over them separately:
- The UNION option should only use reads covering the intervals given. If this is not the case and you are seeing reads from other parts of the genome, please let us know along with more details because this might be a bug. I suggested the UNION option because of the error message you were getting: "Argument -L, --interval-set-rule has a bad value: [chrY:2781480-26673214, hg19_to_hg38_pure_info_chr9-2.intervals],INTERSECTION. The specified intervals had an empty intersection." You cannot run intervals that do not overlap with INTERSECTION, because you will then have no data for HaplotypeCaller.
- I am not quite sure what is going on with the variant you shared. Please see this troubleshooting article to look over what is happening at that position to call the variant: https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
Let me know if you have further questions with either of these points.
Best,
Genevieve
-
Hi, I have also been having issues with the same error:
****************************************************************************************************
Argument -L, --interval-set-rule has a bad value: [/cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome
_Capture_V5_hg38.bed, chrX:10001-2781479],INTERSECTION. The specified intervals had an empty intersection****************************************************************************************************
The part of the log related to the run is:
- Using GATK jar /gatk/gatk-package-4.1.3.0-local.jar
- Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -XX:+UseSerialGC -Xss456k -Xms2g -Xmx44g -jar /gatk/gatk-package-4.1.3.0-local.jar HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I MND_G154.1MBU.md.recal.cram --emit-ref-confidence GVCF --dbsnp dbsnp_146.hg38.vcf.gz --L /cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome_Capture_V5_hg38.bed --L chrX:10001-2781479 --interval-set-rule INTERSECTION --genotyping-mode DISCOVERY -A Coverage -A FisherStrand -A StrandOddsRatio -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -stand-call-conf 30 --sample-ploidy 2 -O MND_G154.1MBU.X_PAR1.g.vcf.gz
I have also attempted to make use of a padded .BED file as mentioned above but got the same error:
*****************************************************************************************************
Argument -L, --interval-set-rule has a bad value: [/cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome
_Capture_V5_hg38_padded.bed, chrX],INTERSECTION. The specified intervals had an empty intersection- Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsam
jdk.compression_level=2 -XX:+UseSerialGC -Xss456k -Xms2g -Xmx44g -jar /gatk/gatk-package-4.1.3.0-local.jar HaplotypeCaller -R Homo_sapiens_assembly38.fasta -I MND_G154.1MBU.md.recal.cram --emit-ref-confidence GVCF --dbsnp dbsnp_146.hg38.vcf.gz --L /cbio/projects/003/melissa/ALS_genes_panel_SAMRC_WES/MGI_Exome_Capture_V5_hg38_padded.bed --L chrX -XL chrX:10001-2781479 -XL chrX:155701383-156030895 --interval-set-rule INTERSECTION --genotyping-mode DISCOVERY -A Coverage -A FisherStrand -A StrandOddsRatio -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -stand-call-conf 30 --sample-ploidy 1 -O MND_G154.1MBU.X_nonPAR.g.vcf.gz
******************************************************************************************************
I suspect that this is a similar issue to what is above since I assume that Ran Wei also made use of a list over from hg19 to hg38. I would greatly appreciate if any further insight to this issue could be shared. Thank you
-
Can you load your bed file to IGV and check if you have any intersection present with the region that you want to include from chrX?
-
Hi @Gökalp Çelik I can confirm that there are reads present in the regions I need to include
-
I am sure reads are present but how about the bed file? This error is thrown when there is no intersection between any interval parameters therefore it is possible that your bed file is missing those intervals that belong to chrX:10001-2781479. Can you check that?
-
Hi. Thank you for this help I have corrected the intersections in the BED file that appear in errors where the intersection has specified and that seems to have fixed that issue. However, I have also received errors same as above but where the interval is not specified for the chromosome. I.e. instead of the error reading chr2:intersection-intersection, it only reads chr2.
Please sign in to leave a comment.
8 comments