How to create a vcf file with variants of pooled lines?
Hello,
How can I create a VCF file that contains the variants of two groups of lines (instead of variants per line)? A group is a pool of 5 lines: lines 1-5 belong to group 1, and lines 6-10 belong to group 2. I have sequencing data for ten lines. I called variants per line and got one VCF file per line. The next step would be to use these line-level VCF files to pool the lines into two groups. What would the code look like?
I would really appreciate it if someone could help me with that.
-
Hi E Ra
You seem to be looking for a very specific way of using your VCF files that our tools neither support nor provide by default. If you want joint genotyping of your samples, we recommend the GenotypeGVCFs tool, which genotypes GVCF files produced by HaplotypeCaller and combined with CombineGVCFs and/or GenomicsDBImport.
I hope this helps.
-
Hi Gökalp
After some research, I think that the fastq files of the lines of the same group need to be concatenated before the variant calling step.
Thank you!
-
Ah, I see. You want to pool samples within the same BAM file but call them into a VCF file that has a separate sample name entry per pool.
It is quite easy. You can assign different read groups to different pools within the BAM file during the mapping stage. Once that is set, you will have a BAM file containing multiple samples (pools), so GATK tools will treat them as separate samples and will generate VCFs that include the different pools for each variant site.
I hope this helps.
-
Hi Gökalp,
I would like to try that. What would the code look like?
Thank you
-
Hi
We cannot directly provide running code for your request; however, here is what the workflow would look like:
1- Map the reads of each pool to the reference genome. Assign read groups with unique IDs and sample names. You may use the RevertSam and MergeBamAlignment tools to do this, or you may assign them directly during the mapping stage. Most mappers allow this.
2- Merge all aligned pools into a single BAM file using samtools merge or gatk PrintReads.
3- Run HaplotypeCaller to call variants with the proper ploidy parameter.
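The three steps above could be sketched roughly as follows. This is only a dry run that prints the shell commands rather than executing them. It assumes bwa mem as the mapper, that the FASTQ files of each group have already been concatenated into one pair per pool, that the lines are diploid (so a pool of 5 lines gets a ploidy of 10), and that the data is Illumina. All file names and the PU values are placeholders, not something taken from your setup.

```python
import shlex

# Placeholder names: adjust to your own files. Assumes one FASTQ pair per
# pool and a reference already indexed for bwa and GATK (.fai and .dict).
REFERENCE = "reference.fasta"
POOLS = {
    "Pool1": ("pool1_R1.fastq.gz", "pool1_R2.fastq.gz"),
    "Pool2": ("pool2_R1.fastq.gz", "pool2_R2.fastq.gz"),
}
# Assumption: 5 diploid lines per pool, so 10 chromosome copies per pool.
POOL_PLOIDY = 5 * 2


def read_group(index, sample):
    """Build the @RG string for bwa mem -R (bwa turns the literal \\t into tabs)."""
    fields = [
        "@RG",
        f"ID:RG{index}",
        f"SM:{sample}",
        f"LB:LibraryName{index}",
        "PL:ILLUMINA",  # assumption: Illumina data
        f"PU:Unit{index}",  # placeholder platform unit
    ]
    return "\\t".join(fields)


def build_commands():
    """Return the shell commands for steps 1-3 without running them."""
    commands = []
    bams = []
    for index, (sample, (fq1, fq2)) in enumerate(POOLS.items(), start=1):
        bam = f"{sample}.sorted.bam"
        bams.append(bam)
        # Step 1: map each pool with its own read group, then coordinate-sort.
        commands.append(
            f"bwa mem -R {shlex.quote(read_group(index, sample))} "
            f"{REFERENCE} {fq1} {fq2} | samtools sort -o {bam} -"
        )
    # Step 2: merge the pools into one BAM and index it.
    commands.append("samtools merge merged.bam " + " ".join(bams))
    commands.append("samtools index merged.bam")
    # Step 3: call variants; each distinct SM value becomes a sample column.
    commands.append(
        f"gatk HaplotypeCaller -R {REFERENCE} -I merged.bam "
        f"-O pools.vcf.gz --sample-ploidy {POOL_PLOIDY}"
    )
    return commands


if __name__ == "__main__":
    print("\n".join(build_commands()))
```

Review the printed commands before running them in a shell, and set the ploidy to whatever matches your pools.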
Your read groups should be set similar to the ones below in the final BAM file.
@RG ID:RG1 SM:Pool1 LB:LibraryName1 PL:PLATFORMID PU:CENTERID
@RG ID:RG2 SM:Pool2 LB:LibraryName2 PL:PLATFORMID PU:CENTERID
Once this BAM file is fed to HaplotypeCaller, it will produce a VCF file with multiple samples, named by the SM fields in the header.
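Before calling variants, it may be worth confirming that the merged file really carries one sample name per pool. A minimal sketch, assuming you have the header as text (for example from `samtools view -H merged.bam`, where `merged.bam` is a placeholder name); the helper `samples_in_header` is hypothetical and not part of any tool:

```python
def samples_in_header(header_text):
    """Return the distinct SM values of the @RG lines, in order of appearance."""
    samples = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if field.startswith("SM:") and field[3:] not in samples:
                samples.append(field[3:])
    return samples


# Example header text, tab-separated as in a real BAM header.
example = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:RG1\tSM:Pool1\tLB:LibraryName1\tPL:PLATFORMID\tPU:CENTERID\n"
    "@RG\tID:RG2\tSM:Pool2\tLB:LibraryName2\tPL:PLATFORMID\tPU:CENTERID\n"
)
print(samples_in_header(example))
```

If the list does not contain exactly your pool names, the read groups were not assigned as intended during mapping.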
I hope this helps.