Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


HaplotypeCaller bam file path not recognized


7 comments

  • Bhanu Gandham

    Hi Phillip Morin - NOAA Federal

    Please post the values of all the arguments used, so we can see what GATK is getting as input. It is possible that one or more arguments may be getting an empty value, which is causing this error.
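A minimal sketch of the kind of check this suggests: confirm that each variable is non-empty before the gatk call. The helper name `require_value` is hypothetical and not part of GATK; it relies only on POSIX shell features.

```shell
# Hypothetical helper (not part of GATK): print NAME=value for the variable
# named in $1, or return 1 with an error message if it is unset or empty.
require_value() {
    eval "_rv_val=\${$1:-}"   # look up the variable whose name is in $1
    if [ -z "$_rv_val" ]; then
        echo "ERROR: $1 is empty" >&2
        return 1
    fi
    echo "$1=$_rv_val"
}
```

In a submission script this could run just before the gatk call, for example `require_value CHR || exit 1`. POSIX shells also offer the expansion `${CHR:?CHR is empty}`, which aborts a non-interactive script when the variable is unset or empty.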

     

  • Phillip Morin - NOAA Federal

    I thought I had posted all arguments, but I guess I left out ${LOG}. Here are the submission commands, followed by the full script:

    # SLURM submission command lines:

    SCRIPT=~/scripts/HaplotypeCaller/Bbai_haplotypeCaller_100320.sh
    BAM=~/Ref_genomes/Bbai/z0076728_remap_aligned_sorted_unique_q30.bam
    NAME=Bbai_76728
    sbatch -a 1-3 ${SCRIPT} ${BAM} ${NAME}

    # Script "Bbai_haplotypeCaller_100320.sh"

    module load bio/gatk/4.1.5.0

    REFERENCE=~/Ref_genomes/Bbai/z0076728_aligned_sorted_unique_q30_target_consensus_renamed.fasta

    BAM=${1}
    NAME=${2}
    NUM=$(printf %02d ${SGE_TASK_ID})
    CHR=$(head -n ${NUM} ${REFERENCE}.scaffolds.list | tail -n 1)

    LOG=${NAME}_${NUM}_HaplotypeCaller.log
    date > ${LOG}

    gatk --java-options "-Xmx26g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
    HaplotypeCaller \
    -R ${REFERENCE} \
    -ERC BP_RESOLUTION \
    -mbq 20 \
    --output-mode EMIT_ALL_ACTIVE_SITES \
    -L ${CHR} \
    -I ${BAM} \
    -O ${NAME}_${NUM}.g.vcf.gz &>> ${LOG}

  • Bhanu Gandham

    Hi Phillip Morin - NOAA Federal

    Sorry I wasn't clearer. I meant the actual values passed to the arguments, not the variables.

    For example, with CHR=$(head -n ${NUM} ${REFERENCE}.scaffolds.list | tail -n 1), what exact value is being assigned to CHR?
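One detail the thread does not settle: the job is submitted with sbatch, but the script reads ${SGE_TASK_ID}. SLURM exports the array index as SLURM_ARRAY_TASK_ID, while SGE_TASK_ID is set by Grid Engine. If ${SGE_TASK_ID} expands to nothing, `printf %02d` prints 00 and `head -n 00` selects no line, which would leave CHR empty. Whether that happened here is an assumption, not something confirmed in the thread. The sketch below shows one way to resolve and check the value; the function names `task_id` and `nth_line` are hypothetical.

```shell
# Hypothetical helper: print the array index from whichever scheduler
# variable is set (SLURM first, then Grid Engine); fail if neither is set.
task_id() {
    _ti_id=${SLURM_ARRAY_TASK_ID:-${SGE_TASK_ID:-}}
    if [ -z "$_ti_id" ]; then
        echo "ERROR: no array task ID is set" >&2
        return 1
    fi
    echo "$_ti_id"
}

# Hypothetical helper: print line $2 of file $1, using the same
# head | tail pipeline as the script in this thread.
nth_line() {
    head -n "$2" "$1" | tail -n 1
}
```

With these, the script could set `CHR=$(nth_line "${REFERENCE}.scaffolds.list" "$(task_id)")` and print it with `echo "CHR=[$CHR]"` to the log, so an empty value shows up as `CHR=[]`.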

  • Phillip Morin - NOAA Federal

    OK. Here it is with the values in place of the variables.

    gatk --java-options "-Xmx26g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" \
    HaplotypeCaller \
    -R /home/pmorin/Ref_genomes/Bbai/z0076728_aligned_sorted_unique_q30_target_consensus_renamed.fasta \
    -ERC BP_RESOLUTION \
    -mbq 20 \
    --output-mode EMIT_ALL_ACTIVE_SITES \
    -L scaffold1 \
    -I /home/pmorin/Ref_genomes/Bbai/z0076728_remap_aligned_sorted_unique_q30.bam \
    -O Bbai_76728_01.g.vcf.gz &>> Bbai_76728_01_HaplotypeCaller.log

  • Bhanu Gandham

    Hi Phillip Morin - NOAA Federal

    Can you please validate your BAM file with the ValidateSamFile tool? I want to make sure your BAM file is not malformed. Here is more information: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile

    Also, can you please share the header of your BAM file using this command:

    samtools view -H <bamfile>

     

  • Phillip Morin - NOAA Federal

    The ValidateSamFile output doesn't show any errors, and only one warning:

    WARNING 2020-03-18 14:45:42 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.

    [Wed Mar 18 14:50:11 PDT 2020] picard.sam.ValidateSamFile done. Elapsed time: 4.49 minutes.

    Runtime.totalMemory()=1192034304
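The warning above indicates that the NM tag check was skipped because no reference was supplied. A hedged sketch of a re-run that passes the reference follows. It uses the long Picard option names --INPUT, --REFERENCE_SEQUENCE and --MODE; the wrapper function and the DRY_RUN switch are hypothetical conveniences added here, not part of GATK.

```shell
# Hypothetical wrapper: run ValidateSamFile with a reference so that NM
# validation can be performed. $1 = BAM path, $2 = reference FASTA path.
# With DRY_RUN=1 the command is printed instead of executed.
validate_with_reference() {
    set -- gatk ValidateSamFile --INPUT "$1" --REFERENCE_SEQUENCE "$2" --MODE SUMMARY
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 validate_with_reference "$BAM" "$REFERENCE"` prints the command that would be run, which can be inspected before submitting it.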

     

    The bam header has 41852 lines, so here are the first 5 and last 3:

    @HD VN:1.6 SO:coordinate
    @SQ SN:scaffold1|size10692286 LN:10692699
    @SQ SN:scaffold2|size9952079 LN:9951381
    @SQ SN:scaffold3|size9951533 LN:9947381
    @SQ SN:scaffold4|size8158917 LN:8159481

    ...

    @SQ SN:scaffold46938|size1000 LN:886
    @PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 20 z0076278_aligned_sorted_unique_q30_target_consensus.fasta /groups/hologenomics/mccarthy/data/Beaked_whales/Berardius_bairdii_GRAY/numapping/unmapped_reads1_round2_fixed1.fq.gz /groups/hologenomics/mccarthy/data/Beaked_whales/Berardius_bairdii_GRAY/numapping/unmapped_reads2_round2_fixed2.fq.gz
    @PG ID:samtools PN:samtools PP:bwa VN:1.10 CL:samtools view -H z0076728_remap_aligned_sorted_unique_q30.bam
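The header excerpt above lists sequence names such as scaffold1|size10692286, while the command earlier in the thread passes -L scaffold1 together with a reference whose file name ends in _renamed. The thread does not say whether the BAM and the renamed reference share contig names, so the following is only a sketch of how that could be checked. The function names `sq_names` and `header_has_contig` are hypothetical; the code assumes a standard tab-separated SAM header on standard input.

```shell
# Hypothetical helper: read a SAM header on stdin and print the SN value
# of every @SQ line, one name per line.
sq_names() {
    awk -F '\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Hypothetical helper: succeed only if the exact contig name in $1
# appears among the @SQ names of the header on stdin.
header_has_contig() {
    sq_names | grep -Fxq -- "$1"
}
```

A possible use is `samtools view -H "$BAM" | header_has_contig "$CHR" || echo "contig $CHR not in BAM header"`, run before the HaplotypeCaller call.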

  • Bhanu Gandham

    Hi Phillip Morin - NOAA Federal

    I can't seem to find anything obviously wrong. Let me try to recreate this issue on my end. Please share your BAM and reference files with me; you can find information on sharing this data here: https://gatk.zendesk.com/hc/en-us/articles/360035889671

