Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Why Haplotypecaller gives me empty vcf (only the header) for all of my samples?

0

18 comments

  • Avatar
    Gökalp Çelik

    Hi Faezeh Darbaniyan

    Can you provide us more details about your workflow? What is your aligner for these bam files? 

    Do you get any error messages within the HaplotypeCaller run? If so can you post that here?

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    Thanks for your reply!

    I am using bwa/0.7.16 for aligning:

    bwa mem -t 4 -R "@RG\tID:${1}\tPL:ILLUMINA\tSM:sample1" \

            ${gencode_path2}/hg38.fa \

            ${raw_data}/sample1_R1.fastq.gz \

            ${raw_data}/sample1_R2.fastq.gz > \

            ${aligned_reads}/sample1.paired.sam

    Then use gatk4/4.4.0.0 to mark duplicates and sort:

    gatk MarkDuplicatesSpark -I ${aligned_reads}/sample1.paired.sam \

             -O ${aligned_reads}/sample1_sorted_dedup_reads.bam

    Next, do base quality recalibration:

    gatk BaseRecalibrator \

             -I ${aligned_reads}/sample1_sorted_dedup_reads.bam -R \

             ${gencode_path}/hg38.fa \

             --known-sites ${gencode_path}/Homo_sapiens_assembly38.dbsnp138.vcf \

             -O ${data_output}/recal_data.table

    Then,

    gatk ApplyBQSR \

             -I ${aligned_reads}/sample1_sorted_dedup_reads.bam \

             -R ${gencode_path}/hg38.fa \

             --bqsr-recal-file {$data_output}/recal_data.table \

             -O ${aligned_reads}/sample1_sorted_dedup_bqsr_reads.bam

    Next collect the alignments and here is when it gives zeros:

    gatk CollectAlignmentSummaryMetrics \

             R=${gencode_path}/hg38.fa \

             I=${aligned_reads}/sample1_sorted_dedup_bqsr_reads.bam \

             O=${aligned_reads}/sample1_alignment_metrics.txt

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Can you check whether your logs contain any kind of error messages and early terminations? If so can you post them here?

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    Sorry, I edited my previous post. No error happens and the zeros appear when I collect the alignments:

    ## METRICS CLASS        picard.analysis.AlignmentSummaryMetrics

    CATEGORY        TOTAL_READS     PF_READS        PCT_PF_READS    PF_NOISE_READS  PF_READS_ALIGNED        PCT_PF_READS_ALIGNED    PF_ALIGNED_BASES        PF_HQ_ALIGNED_READS     PF_HQ_ALIGNED_BASES     PF_HQ_ALIGNED_Q20_BASES PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE        PF_HQ_ERROR_RATE        PF_INDEL_RATE   MEAN_READ_LENGTH

            SD_READ_LENGTH  MEDIAN_READ_LENGTH      MAD_READ_LENGTH MIN_READ_LENGTH MAX_READ_LENGTH MEAN_ALIGNED_READ_LENGTH        READS_ALIGNED_IN_PAIRS  PCT_READS_ALIGNED_IN_PAIRS      PF_READS_IMPROPER_PAIRS PCT_PF_READS_IMPROPER_PAIRS     BAD_CYCLES      STRAND_BALANCE  PCT_CHIMERAS    PCT_ADAPTER     PCT_SOFTCLIP    PCT_HARDCLIP    AVG_POS_3PRIME_SOFTCLIP_LENGTH  SAMPLE  LIBRARY READ_GROUP

    UNPAIRED        0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0

           0       0       0       0       0       0       0       0       0       0       0       0       0  

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    What kind of experiment is this one? Can you check if your final sorted bam file contains any reads inside?

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    It is WES with Illumina platform. Yes, sample1_sorted_dedup_bqsr_reads.bam contains reads. Is there any details I need to ask people who provided the fastq files?

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    Here is the top few lines of sample1_sorted_dedup_bqsr_reads.bam:

    LH00557:25:22KNK7LT3:5:1143:42931:4413  65      chr1    10000   0       3S97M   chr2    32672328        0       ATAATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC    III-IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII99II-III99IIIIIIII9IIIIIIIIIII-II9I    MC:Z:4S96M      MD:Z:97 RG:Z:15650      NM:i:0  AS:i:97 XS:i:96

    LH00557:25:22KNK7LT3:5:1254:10616:1897  97      chr1    10000   0       4S96M   =       211929503       211919599       CCTCATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC    IIII-IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII-IIIII-IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    MC:Z:96M4S      MD:Z:96 RG:Z:15650      NM:i:0  AS:i:96 XS:i:95

    LH00557:25:22KNK7LT3:5:1163:46158:1929  177     chr1    10000   0       42S53M5S        chr10   128147464       0       TGGAAAACCCTAGGGAATATAAAAATGTAGCGTTTATTCCATATAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACAGAAG    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII-III    SA:Z:chr1,85987646,-,43M57S,0,0;        MC:Z:95M5S      MD:Z:53 RG:Z:15650      NM:i:0  AS:i:53 XS:i:53

    LH00557:25:22KNK7LT3:5:1129:30245:17551 1105    chr1    10013   0       97M3S   chr11   79987814        0       TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACAG    IIIIIIIIIIIIIIIIIIIIIII-IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    MC:Z:100M       MD:Z:97 RG:Z:15650      NM:i:0  AS:i:97 XS:i:96

    LH00557:25:22KNK7LT3:5:1129:30264:17551 1105    chr1    10013   0       97M3S   chr11   79987818        0       TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCACAG    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII-II-    MC:Z:4S96M      MD:Z:97 RG:Z:15650      NM:i:0  AS:i:97 XS:i:96

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Faezeh Darbaniyan

    What bothers me here is the all zeroes indicating either there are no reads present or all reads have MAPQ 0 which results in filtering all of those reads by the tool. 

    Can you try running the below command and see how many reads are present?

    samtools view -q 60 sample1_sorted_dedup_bqsr_reads.bam | wc -l

     

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    Thank you so much for your guide!

    It gives me 0!!!

    Does it mean that I am miss-aligning? I contacted people providing the data and they said either GRCh37 or GRCh38 references can be used for aligning. Is it different from what I have used (hg38)? I have also tried it with Homo_sapiens_assembly38.fasta but still the same.

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    I just checked 

    samtools view -q 60 sample1_sorted_dedup_reads.bam | wc -l

    And it gives me: 46453705. Does it mean that ApplyBQSR is not performing correctly or does it mean that something is wrong with my data? If the later is true, may I ask what it is so I can tell the people who provided the data?

    Thank you so much for all your helps!

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Can you post the log for BaseRecalibrator and ApplyBQSR steps here so that we can check?

    One thing that I noticed is that your base call qualities are too low. 6 and 16 are the values that I observed from the first couple of reads you posted here. Can you also check your bam file with fastqc or any other read checking tool to see if basecall qualities are any good?

    I am also concerned that if this data was further compressed in lossy format then it is possible that basecall values are lost in translation but this is just a guess. 

     

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    Hi, 

    Do you mean this log (I had provided it in previous comments)?

    gatk BaseRecalibrator \

             -I ${aligned_reads}/sample1_sorted_dedup_reads.bam -R \

             ${gencode_path}/hg38.fa \

             --known-sites ${gencode_path}/Homo_sapiens_assembly38.dbsnp138.vcf \

             -O ${data_output}/recal_data.table

    Then,

    gatk ApplyBQSR \

             -I ${aligned_reads}/sample1_sorted_dedup_reads.bam \

             -R ${gencode_path}/hg38.fa \

             --bqsr-recal-file {$data_output}/recal_data.table \

             -O ${aligned_reads}/sample1_sorted_dedup_bqsr_reads.bam

    The fastqc on fastq files shows good quality. Do you mean checking sample1_sorted_dedup_bqsr_reads.bam file on fastqc?

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    To me it looks like issue is happening at ApplyBQSR step. Am I right?

    Here is the fastqc of raw fastq files (R1 & R2),

    Filename sample1_R2.fastq.gz
    File type Conventional base calls
    Encoding Sanger / Illumina 1.9
    Total Sequences 27327351
    Sequences flagged as poor quality 0
    Sequence length 100
    %GC 46
    Filename sample1_R1.fastq.gz
    File type Conventional base calls
    Encoding Sanger / Illumina 1.9
    Total Sequences 27327351
    Sequences flagged as poor quality 0
    Sequence length 100
    %GC

    46

     

    sample1_sorted_dedup_reads.bam,

    Filename sample1_sorted_dedup_reads.bam
    File type Conventional base calls
    Encoding Sanger / Illumina 1.9
    Total Sequences 54975032
    Sequences flagged as poor quality 0
    Sequence length 30-100
    %GC 46

    and sample1_sorted_dedup_bqsr_reads.bam. 

    Filename sample1_sorted_dedup_bqsr_reads.bam
    File type null
    Encoding Illumina 1.5
    Total Sequences 0
    Sequences flagged as poor quality 0
    Sequence length 0
    %GC

    0

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi Faezeh Darbaniyan

    Log is created by our tool during the runtime and pushed to stderr normally unless it is redirected to a log file by the command line. Can you send us that output from ApplyBQSR and BaseRecalibrator?

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    Thanks for explaining. Here is the top lines of BaseRecalibrator:

    23:23:43.288 INFO  BaseRecalibrator - ------------------------------------------------------------

    23:23:43.290 INFO  BaseRecalibrator - The Genome Analysis Toolkit (GATK) v4.4.0.0

    23:23:43.291 INFO  BaseRecalibrator - For support and documentation go to https://software.broadinstitute.org/gatk/

    23:23:43.291 INFO  BaseRecalibrator - Executing as fdarbaniyan@edragon017 on Linux v4.18.0-425.3.1.el8.x86_64 amd64

    23:23:43.291 INFO  BaseRecalibrator - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src

    23:23:43.291 INFO  BaseRecalibrator - Start Date/Time: June 19, 2024 at 11:23:43 PM CDT

    23:23:43.291 INFO  BaseRecalibrator - ------------------------------------------------------------

    23:23:43.291 INFO  BaseRecalibrator - ------------------------------------------------------------

    23:23:43.292 INFO  BaseRecalibrator - HTSJDK Version: 3.0.5

    23:23:43.292 INFO  BaseRecalibrator - Picard Version: 3.0.0

    23:23:43.292 INFO  BaseRecalibrator - Built for Spark Version: 3.3.1

    23:23:43.292 INFO  BaseRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    23:23:43.293 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    23:23:43.294 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    23:23:43.294 INFO  BaseRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    23:23:43.294 INFO  BaseRecalibrator - Deflater: IntelDeflater

    23:23:43.294 INFO  BaseRecalibrator - Inflater: IntelInflater

    23:23:43.294 INFO  BaseRecalibrator - GCS max retries/reopens: 20

    23:23:43.294 INFO  BaseRecalibrator - Requester pays: disabled

    23:23:43.294 INFO  BaseRecalibrator - Initializing engine

    23:23:43.515 INFO  FeatureManager - Using codec VCFCodec to read file file:///rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/gencode/hg38/Homo_sapiens_assembly38.dbsnp138.vcf

    23:23:43.665 INFO  BaseRecalibrator - Done initializing engine

    23:23:43.669 INFO  BaseRecalibrationEngine - The covariates being used here: 

    23:23:43.669 INFO  BaseRecalibrationEngine -    ReadGroupCovariate

    23:23:43.669 INFO  BaseRecalibrationEngine -    QualityScoreCovariate

    23:23:43.669 INFO  BaseRecalibrationEngine -    ContextCovariate

    23:23:43.669 INFO  BaseRecalibrationEngine -    CycleCovariate

    23:23:43.676 INFO  ProgressMeter - Starting traversal

    23:23:43.676 INFO  ProgressMeter -        Current Locus  Elapsed Minutes       Reads Processed     Reads/Minute

    23:23:53.693 INFO  ProgressMeter -        chr1:24183589              0.2                471000        2822049.1

    23:24:03.697 INFO  ProgressMeter -        chr1:52237854              0.3                998000        2990859.6

    23:24:13.717 INFO  ProgressMeter -        chr1:93984700              0.5               1429000        2854099.4

    23:24:23.721 INFO  ProgressMeter -       chr1:152215153              0.7               1938000        2903733.3

    23:24:33.735 INFO  ProgressMeter -       chr1:180175429              0.8               2467000        2956969.9

    23:24:43.769 INFO  ProgressMeter -       chr1:217158349              1.0               2922000        2917526.5

    23:24:53.795 INFO  ProgressMeter -        chr10:4318621              1.2               3364000        2878535.1

    23:25:03.813 INFO  ProgressMeter -       chr10:44772698              1.3               3760000        2815179.0

    23:25:13.824 INFO  ProgressMeter -       chr10:86460471              1.5               4184000        2784784.9

    23:25:23.823 INFO  ProgressMeter -      chr10:119915812              1.7               4668000        2796688.9

    23:25:33.825 INFO  ProgressMeter -       chr11:17527238              1.8               5148000        2804227.0

    23:25:43.834 INFO  ProgressMeter -       chr11:59647750              2.0               5581000        2786830.7

    23:25:53.851 INFO  ProgressMeter -       chr11:85976562              2.2               6115000        2818513.5

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    And here is for ApplyBQSR:

    23:36:27.258 INFO  ApplyBQSR - ------------------------------------------------------------

    23:36:27.260 INFO  ApplyBQSR - The Genome Analysis Toolkit (GATK) v4.4.0.0

    23:36:27.260 INFO  ApplyBQSR - For support and documentation go to https://software.broadinstitute.org/gatk/

    23:36:27.260 INFO  ApplyBQSR - Executing as fdarbaniyan@edragon017 on Linux v4.18.0-425.3.1.el8.x86_64 amd64

    23:36:27.260 INFO  ApplyBQSR - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src

    23:36:27.260 INFO  ApplyBQSR - Start Date/Time: June 19, 2024 at 11:36:27 PM CDT

    23:36:27.260 INFO  ApplyBQSR - ------------------------------------------------------------

    23:36:27.261 INFO  ApplyBQSR - ------------------------------------------------------------

    23:36:27.261 INFO  ApplyBQSR - HTSJDK Version: 3.0.5

    23:36:27.261 INFO  ApplyBQSR - Picard Version: 3.0.0

    23:36:27.261 INFO  ApplyBQSR - Built for Spark Version: 3.3.1

    23:36:27.261 INFO  ApplyBQSR - HTSJDK Defaults.COMPRESSION_LEVEL : 2

    23:36:27.262 INFO  ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false

    23:36:27.262 INFO  ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true

    23:36:27.262 INFO  ApplyBQSR - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false

    23:36:27.262 INFO  ApplyBQSR - Deflater: IntelDeflater

    23:36:27.262 INFO  ApplyBQSR - Inflater: IntelInflater

    23:36:27.262 INFO  ApplyBQSR - GCS max retries/reopens: 20

    23:36:27.262 INFO  ApplyBQSR - Requester pays: disabled

    23:36:27.262 INFO  ApplyBQSR - Initializing engine

    23:36:27.411 INFO  ApplyBQSR - Done initializing engine

    23:36:27.428 INFO  ProgressMeter - Starting traversal

    23:36:27.429 INFO  ProgressMeter -        Current Locus  Elapsed Minutes       Reads Processed     Reads/Minute

    23:36:27.502 INFO  ApplyBQSR - Shutting down engine

    [June 19, 2024 at 11:36:27 PM CDT] org.broadinstitute.hellbender.tools.walkers.bqsr.ApplyBQSR done. Elapsed time: 0.00 minutes.

    Runtime.totalMemory()=285212672

    ***********************************************************************

     

    A USER ERROR has occurred: Couldn't read file. Error was: {/rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results}/15650_recal_data.table with exception: {/rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results}/15650_recal_data.table (No such file or directory)

     

    ***********************************************************************

    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    Using GATK jar /risapps/rhel8/miniconda3/py39_4.12.0/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar

    Running:

        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /risapps/rhel8/miniconda3/py39_4.12.0/envs/gatk4-4.4.0.0/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar BaseRecalibrator -I /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/aligned_reads/61850_LYG-3_sorted_dedup_reads.bam -R /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/gencode/hg38/hg38.fa --known-sites /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/gencode/hg38/Homo_sapiens_assembly38.dbsnp138.vcf -O /rsrch6/scratch/hema_bio-Malignan/fdarbaniyan/Sattva_data/WES/P2007014_05162024/Bioinfirmagician_output/results/61850_LYG-3_recal_data.table

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    The error message indicates that recalibration table file is not present or path is incorrect for ApplyBQSR. Can you check if that file is properly created by BaseRecalibrator tool?

    0
    Comment actions Permalink
  • Avatar
    Faezeh Darbaniyan

    Thank you so much! 

    You are right and I had a typo in my code. Now it works perfectly and I got my results.

    Appreciate!

    Faezeh

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk