Query regarding Intervals and interval lists for Target Enrichment Sequencing
https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
Intervals and interval lists:
As the article linked above states, the interval list should correspond to the capture targets used for the library prep:
Targeted sequencing (exomes, gene panels etc.)
For exomes and similarly targeted data types, the interval list should correspond to the capture targets used for the library prep, and is typically provided by the prep kit manufacturer (with versions for each ref genome build of course).
In my case, target enrichment sequencing was performed with the Agilent SureSelect Target Enrichment System, using a SureSelectXT Custom 3-5.9Mb design.
I received a Region.bed file and a Covered.bed file. Which of these files should I use as the interval list?
[design ID]_Regions.bed - This BED file contains a single track of the target regions of interest that SureDesign used to select the probes. You can use this track to see the exact regions that the program was attempting to cover when selecting the probes.
head -n 3 Region.bed
chr13 48069202 48084157 chr13:48069203-48084157
chr13 48110220 48120755 chr13:48110221-48120755
chr13 48123958 48166976 chr13:48123959-48166976
wc -l Regions.bed
77
[design ID]_Covered.bed - This BED file contains a single track of the genomic regions that are covered by one or more probes in the design. The fourth column of the file contains annotation information. You can use this file for assessing coverage metrics.
head -n 4 Covered.bed
chr13 48069307 48069427 chr13:48069203-48084157
chr13 48069475 48069595 chr13:48069203-48084157
chr13 48070408 48070528 chr13:48069203-48084157
chr13 48070800 48070920 chr13:48069203-48084157
wc -l Covered.bed
4494
Since I have target enrichment sequencing data, I would like to run HaplotypeCaller with an interval list.
Command:
--intervals / -L
One or more genomic intervals over which to operate. Is it possible to use a BED file (Covered.bed or Region.bed) as the interval list for -L?
gatk --java-options -Xmx50g HaplotypeCaller -R genome.fa -I SetNm.bam -O raw.g.vcf.gz -ERC GVCF --minimum-mapping-quality 20 --min-base-quality-score 20 -L Covered.bed -ip 200
(Alternatively, -L Region.bed in place of -L Covered.bed.)
Should I use one of these BED files as it is, or should I create another BED file (chr"\t"start"\t"end) to use as the interval list (-L)? In other words, should I keep only the 1st, 2nd, and 3rd columns, or should I keep the 4th column as well?
I would be grateful for any help with this query.
Thank you so much in advance.
-
Hi Abrish,
I am going to move your post into our Community Discussions -> General Discussion topic, as the Non-Human topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Genevieve
-
Hi Abrish,
Yes, BED files are suitable for interval lists. Here is a description of the BED file format: https://genome.ucsc.edu/FAQ/FAQformat.html#format1. As long as your BED files meet those requirements, they can be used as interval lists.
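To illustrate the format requirements, here is a minimal shell sketch that checks the points most likely to matter for -L: at least three whitespace-separated columns (chrom, start, end), integer coordinates, and a start smaller than the end. check_bed is a hypothetical helper name, not part of GATK, and this is not a full BED validator. The format allows optional extra columns such as the name in column 4, so the check leaves them alone.

```shell
# Sketch: sanity-check a BED file before passing it to -L.
# check_bed is a hypothetical helper, not a GATK tool.
check_bed() {
    awk '
        # Skip blank lines and header-style lines.
        NF == 0 { next }
        /^(browser|track|#)/ { next }
        # Require chrom, start, end with integer coordinates and start < end.
        (NF < 3) || ($2 !~ /^[0-9]+$/) || ($3 !~ /^[0-9]+$/) || (($2 + 0) >= ($3 + 0)) {
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
        }
        # Return a non-zero status if any line was flagged.
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, check_bed Covered.bed prints any suspicious lines and returns a non-zero status if it finds one.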
From what you showed, it looks like the region file has larger intervals and the covered file has more specific intervals. Either file would probably work and shouldn't have any performance issues, but if you really want to limit your analysis, you can use the covered file for more specific analysis.
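One rough way to see the difference between the two files is to add up the bases each one spans. The sketch below assumes the intervals within a file do not overlap (overlapping intervals would be counted twice); bed_total_bases is a hypothetical helper name.

```shell
# Sketch: total number of bases spanned by the intervals in a BED file.
# bed_total_bases is a hypothetical helper, not a GATK tool.
bed_total_bases() {
    # BED coordinates are 0-based and half-open, so an interval spans
    # (end - start) bases. This is also why the name column reads
    # chr13:48069203-... where the BED start column reads 48069202.
    awk '!/^(browser|track|#)/ && NF >= 3 { total += $3 - $2 }
         END { printf "%d\n", total }' "$1"
}
```

Running bed_total_bases Region.bed and bed_total_bases Covered.bed side by side shows how much territory each file would ask the caller to process, before any -ip padding is added.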
Let me know if you have any further questions.
Best,
Genevieve
-
Hi Genevieve Brandt (she/her),
Thank you so much.
I used the -L Covered.bed -ip 100 parameters for the GATK HaplotypeCaller program. Should I use the same -L Covered.bed -ip 100 parameters at every step after HaplotypeCaller, that is, for the CombineGVCFs, GenotypeGVCFs, SelectVariants, and VariantFiltration programs as well?
gatk --java-options -Xmx50g HaplotypeCaller -R genome.fa -I SetNm.bam -O raw.g.vcf.gz -ERC GVCF --minimum-mapping-quality 20 --min-base-quality-score 20 -L Covered.bed -ip 100
I would be grateful if you could advise on this.
Thank you so much in advance.
-
Yes, use that same intervals file (Covered.bed) for all the following steps to keep your results consistent.
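As a sketch of what that could look like, the script below passes the same -L Covered.bed -ip 100 to each downstream tool. All file names other than genome.fa and Covered.bed are hypothetical placeholders, as are the SNP-only selection and the QD filter, which only stand in for whatever selection and filters you actually apply. By default the commands are printed rather than run; set DRY_RUN=0 to execute them.

```shell
# Sketch only: sample and output file names are hypothetical placeholders.
# Same intervals and padding as used for HaplotypeCaller.
INTERVAL_ARGS="-L Covered.bed -ip 100"

# Print each command by default; run it for real when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

joint_call() {
    # $INTERVAL_ARGS is intentionally unquoted so it splits into separate arguments.
    run gatk CombineGVCFs -R genome.fa -V sample1.g.vcf.gz -V sample2.g.vcf.gz \
        -O cohort.g.vcf.gz $INTERVAL_ARGS
    run gatk GenotypeGVCFs -R genome.fa -V cohort.g.vcf.gz \
        -O cohort.vcf.gz $INTERVAL_ARGS
    run gatk SelectVariants -R genome.fa -V cohort.vcf.gz \
        --select-type-to-include SNP -O cohort.snps.vcf.gz $INTERVAL_ARGS
    # Placeholder filter; substitute your own filter names and expressions.
    run gatk VariantFiltration -R genome.fa -V cohort.snps.vcf.gz \
        --filter-name "QD2" --filter-expression "QD < 2.0" \
        -O cohort.snps.filtered.vcf.gz $INTERVAL_ARGS
}

joint_call
```

Keeping the interval arguments in one variable means a later change, such as a different padding, is applied to every step at once.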
-
Dear Genevieve Brandt (she/her),
Thank you so much.