I am currently working with an RNA-seq dataset from a non-model plant for variant discovery. I started following the pipeline on the GATK website, but I got stuck at the BQSR step. I have the assembled genome in FASTA format, its index, and the BAM files, but I do not have a VCF file for the --known-sites argument of the BaseRecalibrator command. Could you tell me how I can generate this type of VCF file from the genome that I have?
Do I need to run HaplotypeCaller on all my samples to get the VCF files? And should I merge all the VCF files and use the combined file for BQSR, or use each sample's VCF file separately?