HaplotypeCaller mem/cpu/threads/node specs
Hi, I'm running HaplotypeCaller to call raw variants for each chromosome in whole-genome sequencing data for 23 individuals (total genome size ~1.2G, coverage ~10x per individual), but it's taking very long (more than 2 days). I'm using multi-threading with -nct 4 and SLURM to submit the array (total of 60 tasks).
My question is how -nct 4 interacts with the other job submission specs, so that I can maximize processing speed.
My core/mem limits are:
Batch 32/246g
Bigmem 32/768g
Below are the job submission specs and an example of a GATK command as I'm running it, but I believe it's not optimal because each task takes more than 2 days to compute.
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH -p batch
#SBATCH --job-name="firstrun"
#SBATCH --mem=40g
#SBATCH -t 72:00:00
#SBATCH -o /users/mfariasv/data/mfariasv/aligned_newZFV2/rawvar/stdout/haplocaller_ZFV2_%A_%a.stdout
#SBATCH --array=1-64%6
eval $(sed "${SLURM_ARRAY_TASK_ID}q;d" /users/mfariasv/data/mfariasv/haplocaller.sh)
where haplocaller.sh contains one GATK command per line, for example:
java -jar /users/mfariasv/data/mfariasv/install/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar -T HaplotypeCaller -R newzf20/GCF_008822105.2_bTaeGut2.pat.W.v2_genomic.fa -I RSFV1A_match.bam
...
-I RSFV1Z_match.bam -L NC_044998.1 --genotyping_mode DISCOVERY --output_mode EMIT_ALL_SITES -stand_call_conf 30 -mbq 20 -hets 0.006 -nct 4 -o raw_variantsZF_NC_044998.1.vcf
-
Hi madzayasodara, is this the same issue that we are working on at the other thread? https://gatk.broadinstitute.org/hc/en-us/community/posts/360071561332-HaplotypeCaller-too-many-alternative-alleles-found-
-
Not the same but related.
It seems that, yes, if I use -nct 4 and request -c 4 (--cpus-per-task=4) in my SLURM submission, the program will run one thread per core and performance will scale accordingly.
Thank you
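For anyone reading later, here is a minimal sketch of one way to keep -nct and the Java heap in step with the SLURM request, so the two cannot drift apart when the #SBATCH lines change. The function name hc_resources and the 10% memory headroom are my own assumptions for illustration, not GATK recommendations. SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (in MB) are the variables SLURM sets inside a job when --cpus-per-task and --mem are given; the fallback values mirror the submission script in the question.

```shell
#!/bin/bash
# hc_resources CPUS MEM_MB  (hypothetical helper, not part of GATK or SLURM)
# Prints a Java heap flag and an -nct value sized to the job's allocation.
hc_resources() {
    local cpus="$1" mem_mb="$2"
    # Assumption: leave about 10% of the allocation free for JVM overhead.
    local heap_mb=$(( mem_mb * 9 / 10 ))
    echo "-Xmx${heap_mb}m -nct ${cpus}"
}

# Inside a job, read the allocation from SLURM; outside SLURM, fall back to
# the values requested in the question (4 CPUs, 40g = 40960 MB).
hc_resources "${SLURM_CPUS_PER_TASK:-4}" "${SLURM_MEM_PER_NODE:-40960}"
```

With 4 CPUs and 40960 MB this prints "-Xmx36864m -nct 4". The -Xmx part would go right after "java" and the -nct part would replace the hard-coded "-nct 4" in each line of haplocaller.sh.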