GATK Command for variant discovery
Is the script correct for GTK version 4.5 for SNP discovery?
#!/bin/bash
#@(#)GATK_4.50-slurm_bioconda3_rns.sh 2025-01-06 R.N.Sarma
# Define paths
REFERENCE_GENOME="/home/rnsarma/Desktop/Tea_Resequencing_199_genotypes/TV1_genome/TV1_gdrive/TV1.fa"
INPUT_DIR="/home/rnsarma/Desktop/Tea_Resequencing_199_genotypes/TV1_genome/mapping_ajt"
OUTPUT_DIR="/home/rnsarma/Desktop/Tea_Resequencing_199_genotypes/TV1_genome/GATK_4.50-slurm_bioconda3_rns"
TMP_DIR="/var/tmp"
# Step 1: Setup and Environment Preparation
echo "Preparing environment and checking requirements..."
mkdir -p "$OUTPUT_DIR" || { echo "Error: Failed to create output directory."; exit 1; }
# Activate Conda environment for GATK
source /usr/local/bin/bioconda3.rc
conda activate gbs || { echo "Error: Failed to activate Conda environment."; exit 1; }
# Check for required tools
command -v gatk >/dev/null 2>&1 || { echo "Error: GATK is not installed or not in PATH."; exit 1; }
command -v samtools >/dev/null 2>&1 || { echo "Error: Samtools is not installed or not in PATH."; exit 1; }
# Step 2: Check and Prepare the Reference Genome
echo "Checking reference genome files..."
if [[ ! -f "${REFERENCE_GENOME}.fai" ]]; then
echo "Indexing reference genome with samtools..."
samtools faidx "$REFERENCE_GENOME" || { echo "Error: Failed to index reference genome."; exit 1; }
fi
if [[ ! -f "${REFERENCE_GENOME%.fa}.dict" ]]; then
echo "Creating reference genome dictionary with GATK..."
gatk CreateSequenceDictionary -R "$REFERENCE_GENOME" -O "${REFERENCE_GENOME%.fa}.dict" || { echo "Error: Failed to create reference dictionary."; exit 1; }
fi
# Step 3: Generate GVCFs for Each Sample
echo "Generating GVCFs for each BAM file in '$INPUT_DIR'..."
for BAM_FILE in "$INPUT_DIR"/*.sorted.bam; do
if [[ ! -f "$BAM_FILE" ]]; then
echo "Error: No sorted BAM files found in directory '$INPUT_DIR'."
exit 1
fi
SAMPLE_NAME=$(basename "$BAM_FILE" .sorted.bam)
GVCF_FILE="$OUTPUT_DIR/${SAMPLE_NAME}.g.vcf.gz"
echo "Processing sample: $SAMPLE_NAME"
gatk HaplotypeCaller \
-R "$REFERENCE_GENOME" \
-I "$BAM_FILE" \
-O "$GVCF_FILE" \
-ERC GVCF \
-G StandardAnnotation \
--tmp-dir "$TMP_DIR" || { echo "Error: HaplotypeCaller failed for $SAMPLE_NAME."; exit 1; }
done
# Step 4: Combine GVCFs into a Single GVCF
echo "Combining GVCFs into a single file..."
GVCF_LIST="$OUTPUT_DIR/gvcf_list.txt"
ls "$OUTPUT_DIR"/*.g.vcf.gz > "$GVCF_LIST" || { echo "Error: Failed to list GVCF files."; exit 1; }
COMBINED_GVCF="$OUTPUT_DIR/combined.g.vcf.gz"
gatk CombineGVCFs \
-R "$REFERENCE_GENOME" \
--variant "$GVCF_LIST" \
-O "$COMBINED_GVCF" || { echo "Error: Failed to combine GVCFs."; exit 1; }
# Step 5: Genotype the Combined GVCF
echo "Genotyping the combined GVCF..."
FINAL_VCF="$OUTPUT_DIR/final_combined.vcf.gz"
gatk GenotypeGVCFs \
-R "$REFERENCE_GENOME" \
-V "$COMBINED_GVCF" \
-O "$FINAL_VCF" || { echo "Error: GenotypeGVCFs failed."; exit 1; }
echo "Pipeline completed successfully. Final combined VCF: $FINAL_VCF"
-
Hi RN Sarma
Unfortunately we cannot debug scripts for users. We can only help with issues related to GATK here.
Regards.
Please sign in to leave a comment.
1 comment