Identify biallelic loci in GATK 4,8
Is my code correct for identifying biallelic loci in a diploid genome?
#!/bin/bash
gatk HaplotypeCaller \
-R /home/rnsarma/Desktop/Mungbean/mungbean_genome/Vigna_radiata.Vradiata_ver6.dna.toplevel.fa \
-I C1.sorted.bam -I C2.sorted.bam -I C3.sorted.bam -I C4.sorted.bam -I V100.sorted.bam -I V101.sorted.bam -I V102.sorted.bam -I V103.sorted.bam \
-I V10.sorted.bam -I V11.sorted.bam -I V12.sorted.bam -I V13.sorted.bam -I V14.sorted.bam -I V15.sorted.bam -I V16.sorted.bam -I V17.sorted.bam \
-I V18.sorted.bam -I V19.sorted.bam -I V1.sorted.bam -I V20.sorted.bam -I V21.sorted.bam -I V22.sorted.bam -I V23.sorted.bam -I V24.sorted.bam \
-I V25.sorted.bam -I V26.sorted.bam -I V27.sorted.bam -I V28.sorted.bam -I V29.sorted.bam -I V2.sorted.bam -I V30.sorted.bam -I V31.sorted.bam \
-I V32.sorted.bam -I V33.sorted.bam -I V34.sorted.bam -I V35.sorted.bam -I V36.sorted.bam -I V37.sorted.bam -I V38.sorted.bam -I V39.sorted.bam \
-I V3.sorted.bam -I V40.sorted.bam -I V41.sorted.bam -I V42.sorted.bam -I V43.sorted.bam -I V44.sorted.bam -I V45.sorted.bam -I V46.sorted.bam \
-I V47.sorted.bam -I V48.sorted.bam -I V49.sorted.bam -I V4.sorted.bam -I V50.sorted.bam -I V51.sorted.bam -I V52.sorted.bam -I V53.sorted.bam \
-I V54.sorted.bam -I V55.sorted.bam -I V56.sorted.bam -I V57.sorted.bam -I V58.sorted.bam -I V59.sorted.bam -I V5.sorted.bam -I V60.sorted.bam \
-I V61.sorted.bam -I V62.sorted.bam -I V63.sorted.bam -I V64.sorted.bam -I V65.sorted.bam -I V66.sorted.bam -I V67.sorted.bam -I V68.sorted.bam \
-I V69.sorted.bam -I V6.sorted.bam -I V70.sorted.bam -I V71.sorted.bam -I V72.sorted.bam -I V73.sorted.bam -I V74.sorted.bam -I V75.sorted.bam \
-I V76.sorted.bam -I V77.sorted.bam -I V78.sorted.bam -I V79.sorted.bam -I V7.sorted.bam -I V80.sorted.bam -I V81.sorted.bam -I V82.sorted.bam \
-I V83.sorted.bam -I V84.sorted.bam -I V85.sorted.bam -I V86.sorted.bam -I V87.sorted.bam -I V88.sorted.bam -I V89.sorted.bam -I V8.sorted.bam \
-I V90.sorted.bam -I V91.sorted.bam -I V92.sorted.bam -I V93.sorted.bam -I V94.sorted.bam -I V95.sorted.bam -I V96.sorted.bam -I V97.sorted.bam \
-I V98.sorted.bam -I V99.sorted.bam -I V9.sorted.bam \
-O ../raw_variants_mungbean_non_slurm1.vcf \
-G StandardAnnotation \
--output-mode EMIT_ALL_CONFIDENT_SITES \
--tmp-dir /var/tmp \
--max-alleles 2 \
--min-alleles 2
-
Hi RN Sarma
Our recommendation has not changed. Rather than changing the default behavior of HaplotypeCaller, we recommend running HaplotypeCaller with its default settings and then filtering and selecting variants from the raw VCF afterwards.
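As a rough illustration of selecting from the raw VCF after calling, the sketch below uses SelectVariants to keep only biallelic SNPs. The file names (reference.fa, raw_variants.vcf, biallelic_snps.vcf) and the helper names run and select_biallelic are placeholders chosen for this example, not taken from the original command. The script prints the command by default so it can be checked before running; set DRY_RUN=0 to execute it.

```shell
#!/bin/bash
# Sketch only: placeholder file names, adjust to your own data.
REF="${REF:-reference.fa}"
RAW_VCF="${RAW_VCF:-raw_variants.vcf}"
OUT_VCF="${OUT_VCF:-biallelic_snps.vcf}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = actually run it

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Keep only SNP records that have exactly one alternate allele.
select_biallelic() {
    run gatk SelectVariants \
        -R "$REF" \
        -V "$RAW_VCF" \
        --select-type-to-include SNP \
        --restrict-alleles-to BIALLELIC \
        -O "$OUT_VCF"
}
```

Calling select_biallelic after sourcing the script prints (or runs) the SelectVariants command. If indels are also of interest, the --select-type-to-include line can be dropped so that only the biallelic restriction applies.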
Also, the current command is prone to resource errors: it opens too many files at once and needs a very large heap to keep all the BAM files available while tracking variants. We suggest calling variants for each sample individually with the -ERC GVCF (--emit-ref-confidence GVCF) parameter, then combining the GVCF files and genotyping them together, so that you can work within more reasonable compute resources.
Also, instead of opening multiple topics for the same question, please continue in the same topic. It keeps all the ideas together and makes following up on issues easier on our end and yours.
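The per-sample workflow described above could look roughly like the following sketch. It assumes BAM files named like C1.sorted.bam, uses CombineGVCFs for the combining step (GenomicsDBImport is another option for many samples), and writes to a placeholder directory gvcf; the reference name, output names, and the helper functions run, gvcf_path, call_sample, and joint_genotype are all made up for this example. Commands are printed by default; set DRY_RUN=0 to execute them.

```shell
#!/bin/bash
# Sketch only: placeholder names; assumes paths without spaces.
REF="${REF:-reference.fa}"
OUT_DIR="${OUT_DIR:-gvcf}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = actually run them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Map a BAM name such as V1.sorted.bam to its per-sample GVCF path.
gvcf_path() {
    echo "$OUT_DIR/$(basename "$1" .sorted.bam).g.vcf.gz"
}

# Step 1: call one sample in GVCF mode.
call_sample() {
    run mkdir -p "$OUT_DIR"
    run gatk HaplotypeCaller -R "$REF" -I "$1" -O "$(gvcf_path "$1")" -ERC GVCF
}

# Steps 2 and 3: combine the per-sample GVCFs, then genotype them jointly.
joint_genotype() {
    variants=""
    for bam in "$@"; do
        variants="$variants --variant $(gvcf_path "$bam")"
    done
    # $variants is intentionally unquoted so it splits into separate arguments.
    run gatk CombineGVCFs -R "$REF" $variants -O "$OUT_DIR/cohort.g.vcf.gz"
    run gatk GenotypeGVCFs -R "$REF" -V "$OUT_DIR/cohort.g.vcf.gz" -O "$OUT_DIR/cohort.vcf.gz"
}
```

A typical use would be to loop call_sample over every BAM (for example, for bam in *.sorted.bam; do call_sample "$bam"; done) and then call joint_genotype with the same list of BAM names. Because each HaplotypeCaller run only opens one BAM, the per-sample step can also be spread across separate jobs.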
I am closing all other topics related to this subject from before. Please continue from here.
Regards.