Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


gatk HaplotypeCaller gives me an empty vcf

Answered

18 comments

  • Dhara Awasthi

    I am facing the same issue with Arabidopsis RNA-seq data. I checked my BAM file with ValidateSamFile, and it did not report any errors or warnings, so I don't think the problem is in the BAM file. Still, I get an empty VCF file every time I run HaplotypeCaller. I also tried Mutect2, but it also produces an empty VCF file with just the headers.

    Can anyone help me understand the problem? Any suggestions/comments will be of great help.

  • Genevieve Brandt (she/her)

    Hi Marie Saitou,

    First, I would recommend using a more recent version of GATK, because there were quite a few issues in 4.0.0 that have since been resolved. We are currently at version 4.1.9.0, which has many great changes.

    You can see more information about our releases here: https://github.com/broadinstitute/gatk/releases

    Genevieve

  • Marie Saitou

    Will try, thank you very much!

  • Genevieve Brandt (she/her)

    Hi Candace Grimes,

    I'm looking at these screenshots and I'm not sure your issue is from the same cause, since the other two users said that they only had a header and no variants. It looks like there are variants in your file. So there may be a problem with your HaplotypeCaller or SelectVariants commands.

    Can you open a new post to look into that issue?

    Thank you,

    Genevieve

  • Candace Grimes

    Yes, I will. Thank you!

  • Siddharth Prakash

    Hi!

    We are having the same issue with HC and Mutect2. I tried to follow this thread but never saw how you resolved the issue. Can someone repeat the answer or point me in the right direction?

  • Siddharth Prakash

    It's not what I'm looking for. My vcfs are truly empty, with just a header.

    My command:

    java -Xmx4g -jar ${GATK_DIR}/gatk-package-4.1.8.0-local.jar HaplotypeCaller -ERC GVCF -R $REF -I ${BAMDIR}/515010.bam --tmp-dir ${TMPDIR} -O ${OUTPUTDIR}/515010.a.vcf.gz --intervals $WORK2/references/xaa.bed

    The output:

    15:05:31.628 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jun 21, 2021 3:05:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    15:05:32.497 INFO HaplotypeCaller - ------------------------------------------------------------
    15:05:32.497 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.8.0
    15:05:32.497 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
    15:05:32.497 INFO HaplotypeCaller - Executing as sprakash@c205-003.frontera.tacc.utexas.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
    15:05:32.498 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_262-b10
    15:05:32.498 INFO HaplotypeCaller - Start Date/Time: June 21, 2021 3:05:31 PM CDT
    15:05:32.498 INFO HaplotypeCaller - ------------------------------------------------------------
    15:05:32.498 INFO HaplotypeCaller - ------------------------------------------------------------
    15:05:32.498 INFO HaplotypeCaller - HTSJDK Version: 2.22.0
    15:05:32.498 INFO HaplotypeCaller - Picard Version: 2.22.8
    15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    15:05:32.498 INFO HaplotypeCaller - Deflater: IntelDeflater
    15:05:32.498 INFO HaplotypeCaller - Inflater: IntelInflater
    15:05:32.498 INFO HaplotypeCaller - GCS max retries/reopens: 20
    15:05:32.498 INFO HaplotypeCaller - Requester pays: disabled
    15:05:32.498 INFO HaplotypeCaller - Initializing engine
    15:05:33.092 INFO FeatureManager - Using codec BEDCodec to read file file:///work2/03437/sprakash/lonestar/references/xaa.bed
    15:05:33.343 INFO IntervalArgumentCollection - Processing 6198806 bp from intervals
    15:05:33.394 INFO HaplotypeCaller - Done initializing engine
    15:05:33.395 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
    15:05:33.413 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
    15:05:33.413 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
    15:05:33.424 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
    15:05:33.456 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
    15:05:33.546 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
    15:05:33.546 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    15:05:33.546 INFO IntelPairHmm - Available threads: 1
    15:05:33.546 INFO IntelPairHmm - Requested threads: 4
    15:05:33.546 WARN IntelPairHmm - Using 1 available threads, but 4 were requested
    15:05:33.546 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
    15:05:33.571 INFO ProgressMeter - Starting traversal
    15:05:33.571 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
    15:05:43.571 INFO ProgressMeter - 1:7895855 0.2 2150 12900.0
    15:05:53.572 INFO ProgressMeter - 1:17599820 0.3 4630 13889.3
    15:06:03.602 INFO ProgressMeter - 1:27995085 0.5 7360 14704.8
    15:06:13.625 INFO ProgressMeter - 1:40661208 0.7 10240 15339.3
    15:06:23.626 INFO ProgressMeter - 1:53723984 0.8 13000 15582.9
    15:06:33.645 INFO ProgressMeter - 1:91297263 1.0 16110 16090.2
    15:06:43.661 INFO ProgressMeter - 1:117127741 1.2 19190 16427.7
    15:06:53.729 INFO ProgressMeter - 1:152282107 1.3 22310 16699.5
    15:07:03.754 INFO ProgressMeter - 1:157789896 1.5 24840 16526.4
    15:07:13.766 INFO ProgressMeter - 1:174670118 1.7 27640 16551.7
    15:07:23.816 INFO ProgressMeter - 1:201982273 1.8 30520 16610.3
    15:07:33.822 INFO ProgressMeter - 1:222711985 2.0 33350 16640.2
    15:07:43.826 INFO ProgressMeter - 1:241846767 2.2 36230 16688.8
    15:07:50.370 INFO HaplotypeCaller - 684519 read(s) filtered by: MappingQualityReadFilter
    0 read(s) filtered by: MappingQualityAvailableReadFilter
    0 read(s) filtered by: MappedReadFilter
    12170 read(s) filtered by: NotSecondaryAlignmentReadFilter
    651896 read(s) filtered by: NotDuplicateReadFilter
    0 read(s) filtered by: PassesVendorQualityCheckReadFilter
    0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
    0 read(s) filtered by: GoodCigarReadFilter
    0 read(s) filtered by: WellformedReadFilter
    1348585 total reads filtered
    15:07:50.370 INFO ProgressMeter - 2:11332470 2.3 38024 16677.3
    15:07:50.370 INFO ProgressMeter - Traversal complete. Processed 38024 total regions in 2.3 minutes.
    15:07:50.383 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
    15:07:50.383 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
    15:07:50.383 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
    15:07:50.383 INFO HaplotypeCaller - Shutting down engine
    [June 21, 2021 3:07:50 PM CDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 2.32 minutes.
    Runtime.totalMemory()=1734868992

    Essentially, all of my reads are getting filtered.
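
The per-filter counts in the log above can be tallied with a few lines of Python. This is a minimal sketch, not part of GATK: the function name `tally_read_filters` is hypothetical, and the regular expression assumes the summary lines keep the `N read(s) filtered by: FilterName` shape shown in this log.

```python
import re

# Matches summary lines such as
#   "684519 read(s) filtered by: MappingQualityReadFilter"
# with or without the timestamp/logger prefix that precedes the first one.
# The line shape is an assumption taken from the log in this thread.
FILTER_LINE = re.compile(r"(\d+) read\(s\) filtered by: (\S+)")


def tally_read_filters(log_text):
    """Return {filter_name: reads_removed} for every filter line in the log."""
    counts = {}
    for line in log_text.splitlines():
        match = FILTER_LINE.search(line)
        if match:
            name = match.group(2)
            counts[name] = counts.get(name, 0) + int(match.group(1))
    return counts


# A few lines copied from the log above, used here as sample input.
sample = """\
15:07:50.370 INFO HaplotypeCaller - 684519 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappedReadFilter
12170 read(s) filtered by: NotSecondaryAlignmentReadFilter
651896 read(s) filtered by: NotDuplicateReadFilter
1348585 total reads filtered
"""

counts = tally_read_filters(sample)
# Print the filters that removed reads, largest first.
for name, removed in sorted(counts.items(), key=lambda item: -item[1]):
    if removed:
        print(removed, name)
```

On this log the two largest contributors are MappingQualityReadFilter and NotDuplicateReadFilter. The summary lists only the reads that were removed, so on its own it does not show how many reads passed the filters.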

    I ran ValidateSamFile on the target .bam and got this:

    java -Xmx4g -jar ${GATK_DIR}/gatk-package-4.1.8.0-local.jar ValidateSamFile --INPUT $WORK2/apps/baf_analysis/515010.bam --MODE SUMMARY
    15:11:05.400 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Mon Jun 21 15:11:05 CDT 2021] ValidateSamFile --INPUT /work2/03437/sprakash/lonestar/apps/baf_analysis/515010.bam --MODE SUMMARY --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    Jun 21, 2021 3:11:05 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    [Mon Jun 21 15:11:05 CDT 2021] Executing as sprakash@c205-003.frontera.tacc.utexas.edu on Linux 3.10.0-1127.19.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_262-b10; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.0
    WARNING 2021-06-21 15:11:05 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
    INFO 2021-06-21 15:11:57 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:51s. Time for last 10,000,000: 51s. Last read position: 1:197,234,413
    INFO 2021-06-21 15:12:50 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:01:44s. Time for last 10,000,000: 52s. Last read position: 2:220,497,821
    INFO 2021-06-21 15:13:43 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:02:38s. Time for last 10,000,000: 53s. Last read position: 4:68,919,521
    INFO 2021-06-21 15:14:37 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:03:32s. Time for last 10,000,000: 54s. Last read position: 6:33,281,603
    INFO 2021-06-21 15:15:31 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:04:25s. Time for last 10,000,000: 53s. Last read position: 7:149,076,451
    INFO 2021-06-21 15:16:26 SamFileValidator Validated Read 60,000,000 records. Elapsed time: 00:05:21s. Time for last 10,000,000: 55s. Last read position: 10:409,196
    INFO 2021-06-21 15:17:20 SamFileValidator Validated Read 70,000,000 records. Elapsed time: 00:06:14s. Time for last 10,000,000: 53s. Last read position: 11:87,030,402
    INFO 2021-06-21 15:18:15 SamFileValidator Validated Read 80,000,000 records. Elapsed time: 00:07:09s. Time for last 10,000,000: 54s. Last read position: 13:103,387,827
    INFO 2021-06-21 15:19:11 SamFileValidator Validated Read 90,000,000 records. Elapsed time: 00:08:06s. Time for last 10,000,000: 56s. Last read position: 16:3,652,424
    INFO 2021-06-21 15:20:05 SamFileValidator Validated Read 100,000,000 records. Elapsed time: 00:08:59s. Time for last 10,000,000: 53s. Last read position: 17:56,272,511
    INFO 2021-06-21 15:21:00 SamFileValidator Validated Read 110,000,000 records. Elapsed time: 00:09:54s. Time for last 10,000,000: 55s. Last read position: 19:45,377,091
    INFO 2021-06-21 15:21:57 SamFileValidator Validated Read 120,000,000 records. Elapsed time: 00:10:52s. Time for last 10,000,000: 57s. Last read position: X:591,820


    ## HISTOGRAM java.lang.String
    Error Type Count
    ERROR:INVALID_PLATFORM_VALUE 2
    ERROR:MATES_ARE_SAME_END 660
    ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 948
    ERROR:MISMATCH_FLAG_MATE_UNMAPPED 576
    ERROR:MISMATCH_MATE_CIGAR_STRING 948

    [Mon Jun 21 15:23:18 CDT 2021] picard.sam.ValidateSamFile done. Elapsed time: 12.22 minutes.
    Runtime.totalMemory()=1766850560
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Tool returned:
    3

    Any suggestions?

  • Siddharth Prakash

    Follow up:

    I opened the .bam file that I used as the target for the previous command in IGV. This .bam was created from paired-end fastq files using the recommended pipeline in GATK 4.1.8, which I had used previously with success. However, the visualized alignment makes no sense, which explains why the reads were filtered. What is going on? Did my bwa step fail?

  • Genevieve Brandt (she/her)

    Siddharth Prakash yes, I would recommend going back to your alignment and pre-processing steps to check for errors there. Make sure you are keeping the reference consistent!

  • Siddharth Prakash

    I went back to my alignment and preprocessing steps and found no errors. I confirmed that my reference hs37d5 is consistent. This is my workflow:

    module use /work2/03437/sprakash/lonestar/apps/modulefiles
    module load bwa/ctr-0.7.17--pl5.22.0_2
    module load tacc-singularity/3.7.2
    module load cutadapt/ctr-3.1--py37h14c3975_1
    java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar FastqToSam -F1 /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_1.fq.gz -F2 /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_2.fq.gz --TMP_DIR /scratch1/03437/sprakash/tmp -SM 511458 -RG 670 -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.unmapped.bam
    cutadapt -a AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -A AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG --minimum-length 1 --buffer-size=10000000 --interleaved -u 7 -U 7 -j 0 -o /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.cut.fq.gz /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_1.fq.gz /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_2.fq.gz
    bwa mem -p -M -t 136 /work2/03437/sprakash/lonestar/references/hs37d5.fa /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.cut.fq.gz > /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.aligned.sam
    java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar MergeBamAlignment -ALIGNED /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.aligned.sam -UNMAPPED /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.unmapped.bam --TMP_DIR /scratch1/03437/sprakash/tmp -R /work2/03437/sprakash/lonestar/references/hs37d5.fa -CREATE_INDEX true -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.merged.bam
    java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar AddOrReplaceReadGroups -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.merged.bam --RGID 670 --RGLB HUMftlX009649 --RGPL NIMBLEGEN --RGPU V300087608 --RGSM 511458 --TMP_DIR /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.fix_read_group.bam
    java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar MarkDuplicates -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.fix_read_group.bam -M /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.marked_dup_metrics.txt --TMP_DIR /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.marked_dup.bam
    rm /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.fix_read_group.bam
    java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar SortSam -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.marked_dup.bam --SORT_ORDER coordinate --TMP_DIR /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.sorted.bam
    java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar BaseRecalibrator -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.sorted.bam -R /work2/03437/sprakash/lonestar/references/hs37d5.fa --known-sites /work2/03437/sprakash/lonestar/references/dbSNP.151.vcf.gz --tmp-dir /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.recal.table
    TMPDIR=/scratch1/03437/sprakash/tmp java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar ApplyBQSR -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.sorted.bam -R /work2/03437/sprakash/lonestar/references/hs37d5.fa --bqsr-recal-file /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.recal.table -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.bam

    I also viewed the header of the aberrant .bam file. I can't see anything unusual:

    @SQ     SN:1    LN:249250621    M5:1b22b98cdeb4a9304cb5d48026a85128     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:2    LN:243199373    M5:a0d9851da00400dec1098a9255ac712e     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:3    LN:198022430    M5:fdfd811849cc2fadebc929bb925902e5     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:4    LN:191154276    M5:23dccd106897542ad87d2765d28a19a1     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:5    LN:180915260    M5:0740173db9ffd264d728f32784845cd7     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:6    LN:171115067    M5:1d3a93a248d92a729ee764823acbbc6b     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:7    LN:159138663    M5:618366e953d6aaad97dbe4777c29375e     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:8    LN:146364022    M5:96f514a9929e410c6651697bded59aec     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:9    LN:141213431    M5:3e273117f15e0a400f01055d9f393768     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:10   LN:135534747    M5:988c28e000e84c26d552359af1ea2e1d     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:11   LN:135006516    M5:98c59049a2df285c76ffb1c6db8f8b96     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:12   LN:133851895    M5:51851ac0e1a115847ad36449b0015864     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:13   LN:115169878    M5:283f8d7892baa81b510a015719ca7b0b     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:14   LN:107349540    M5:98f3cae32b2a2e9524bc19813927542e     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:15   LN:102531392    M5:e5645a794a8238215b2cd77acb95a078     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:16   LN:90354753     M5:fc9b1a7b42b97a864f56b348b06095e6     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:17   LN:81195210     M5:351f64d4f4f9ddd45b35336ad97aa6de     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:18   LN:78077248     M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:19   LN:59128983     M5:1aacd71f30db8e561810913e0b72636d     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:20   LN:63025520     M5:0dec9660ec1efaaf33281c0d5ea2560f     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:21   LN:48129895     M5:2979a6085bfe28e3ad6f552f361ed74d     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:22   LN:51304566     M5:a718acaa6135fdca8357d5bfe94211dd     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:X    LN:155270560    M5:7e0e2e580297b7764e31dbc80c2540dd     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:Y    LN:59373566     M5:1fa3474750af0948bdf97d5a0ee52e51     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:MT   LN:16569        M5:c68f52674c9fb33aef52dcf399755519     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000207.1   LN:4262 M5:f3814841f1939d3ca19072d9e89f3fd7     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000226.1   LN:15008        M5:1c1b2cd1fccbc0a99b6a447fa24d1504     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000229.1   LN:19913        M5:d0f40ec87de311d8e715b52e4c7062e1     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000231.1   LN:27386        M5:ba8882ce3a1efa2080e5d29b956568a4     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000210.1   LN:27682        M5:851106a74238044126131ce2a8e5847c     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000239.1   LN:33824        M5:99795f15702caec4fa1c4e15f8a29c07     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000235.1   LN:34474        M5:118a25ca210cfbcdfb6c2ebb249f9680     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000201.1   LN:36148        M5:dfb7e7ec60ffdcb85cb359ea28454ee9     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000247.1   LN:36422        M5:7de00226bb7df1c57276ca6baabafd15     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000245.1   LN:36651        M5:89bc61960f37d94abf0df2d481ada0ec     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000197.1   LN:37175        M5:6f5efdd36643a9b8c8ccad6f2f1edc7b     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000203.1   LN:37498        M5:96358c325fe0e70bee73436e8bb14dbd     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000246.1   LN:38154        M5:e4afcd31912af9d9c2546acf1cb23af2     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000249.1   LN:38502        M5:1d78abec37c15fe29a275eb08d5af236     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000196.1   LN:38914        M5:d92206d1bb4c3b4019c43c0875c06dc0     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000248.1   LN:39786        M5:5a8e43bec9be36c7b49c84d585107776     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000244.1   LN:39929        M5:0996b4475f353ca98bacb756ac479140     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000238.1   LN:39939        M5:131b1efc3270cc838686b54e7c34b17b     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000202.1   LN:40103        M5:06cbf126247d89664a4faebad130fe9c     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000234.1   LN:40531        M5:93f998536b61a56fd0ff47322a911d4b     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000232.1   LN:40652        M5:3e06b6741061ad93a8587531307057d8     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000206.1   LN:41001        M5:43f69e423533e948bfae5ce1d45bd3f1     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000240.1   LN:41933        M5:445a86173da9f237d7bcf41c6cb8cc62     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000236.1   LN:41934        M5:fdcd739913efa1fdc64b6c0cd7016779     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000241.1   LN:42152        M5:ef4258cdc5a45c206cea8fc3e1d858cf     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000243.1   LN:43341        M5:cc34279a7e353136741c9fce79bc4396     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000242.1   LN:43523        M5:2f8694fc47576bc81b5fe9e7de0ba49e     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000230.1   LN:43691        M5:b4eb71ee878d3706246b7c1dbef69299     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000237.1   LN:45867        M5:e0c82e7751df73f4f6d0ed30cdc853c0     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000233.1   LN:45941        M5:7fed60298a8d62ff808b74b6ce820001     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000204.1   LN:81310        M5:efc49c871536fa8d79cb0a06fa739722     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000198.1   LN:90085        M5:868e7784040da90d900d2d1b667a1383     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000208.1   LN:92689        M5:aa81be49bf3fe63a79bdc6a6f279abf6     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000191.1   LN:106433       M5:d75b436f50a8214ee9c2a51d30b2c2cc     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000227.1   LN:128374       M5:a4aead23f8053f2655e468bcc6ecdceb     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000228.1   LN:129120       M5:c5a17c97e2c1a0b6a9cc5a6b064b714f     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000214.1   LN:137718       M5:46c2032c37f2ed899eb41c0473319a69     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000221.1   LN:155397       M5:3238fb74ea87ae857f9c7508d315babb     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000209.1   LN:159169       M5:f40598e2a5a6b26e84a3775e0d1e2c81     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000218.1   LN:161147       M5:1d708b54644c26c7e01c2dad5426d38c     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000220.1   LN:161802       M5:fc35de963c57bf7648429e6454f1c9db     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000213.1   LN:164239       M5:9d424fdcc98866650b58f004080a992a     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000211.1   LN:166566       M5:7daaa45c66b288847b9b32b964e623d3     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000199.1   LN:169874       M5:569af3b73522fab4b40995ae4944e78e     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000217.1   LN:172149       M5:6d243e18dea1945fb7f2517615b8f52e     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000216.1   LN:172294       M5:642a232d91c486ac339263820aef7fe0     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000215.1   LN:172545       M5:5eb3b418480ae67a997957c909375a73     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000205.1   LN:174588       M5:d22441398d99caf673e9afb9a1908ec5     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000219.1   LN:179198       M5:f977edd13bac459cb2ed4a5457dba1b3     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000224.1   LN:179693       M5:d5b2fc04f6b41b212a4198a07f450e20     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000223.1   LN:180455       M5:399dfa03bf32022ab52a846f7ca35b30     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000195.1   LN:182896       M5:5d9ec007868d517e73543b005ba48535     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000212.1   LN:186858       M5:563531689f3dbd691331fd6c5730a88b     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000222.1   LN:186861       M5:6fe9abac455169f50470f5a6b01d0f59     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000200.1   LN:187035       M5:75e4c8d17cd4addf3917d1703cacaf25     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000193.1   LN:189789       M5:dbb6e8ece0b5de29da56601613007c2a     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000194.1   LN:191469       M5:6ac8f815bf8e845bb3031b73f812c012     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000225.1   LN:211173       M5:63945c3e6962f28ffd469719a747e73c     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:GL000192.1   LN:547496       M5:325ba9e808f669dfeee210fdd7b470ac     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:NC_007605    LN:171823       M5:6743bd63b3ff2b5b8985d8933c53290a     UR:file:C:\GATK\hs37d5.fa

    @SQ     SN:hs37d5       LN:35477943     M5:5b6a4b3a81a2d3c134b7d14bf6ad39f1     UR:file:C:\GATK\hs37d5.fa

    @RG     ID:670  LB:HUMftlX009649        PL:NIMBLEGEN    SM:511458       PU:V300087608

    @PG     ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:/usr/local/bin/bwa mem -p -M -t 136 /work2/03437/sprakash/lonestar/references/hs37d5.fa /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.cut.fq.gz

    @PG     ID:MarkDuplicates       VN:Version:4.1.8.0      CL:MarkDuplicates --INPUT /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.fix_read_group.bam --OUTPUT /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.marked_dup.bam --METRICS_FILE /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.marked_dup_metrics.txt --TMP_DIR /scratch1/03437/sprakash/tmp --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false      PN:MarkDuplicates

           PP:bwa

    @PG     ID:GATK ApplyBQSR       VN:4.1.8.0      CL:ApplyBQSR --output /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.bam --bqsr-recal-file /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.recal.table --input /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.sorted.bam --reference /work2/03437/sprakash/lonestar/references/hs37d5.fa --preserve-qscores-less-than 6 --use-original-qualities false --quantize-quals 0 --round-down-quantized false --emit-original-quals false --global-qscore-prior -1.0 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false

    What do you suggest?

     

     

  • Genevieve Brandt (she/her)

    Siddharth Prakash, is your alignment still looking similar to the image you shared above? If so, you will not be able to get results with HaplotypeCaller.

    You can take a look at your bam/sam file before and after each pre-processing step in IGV to figure out when the alignment starts to have issues.
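
Alongside IGV, the same before/after comparison can be made numerically by counting SAM flags and mapping qualities at each step. The sketch below is an illustration under stated assumptions, not a GATK or samtools feature: `summarize_sam` is a hypothetical helper, it expects SAM text (for example the `.aligned.sam` written by bwa, or the output of `samtools view` on a BAM), and the threshold of 20 is assumed to correspond to HaplotypeCaller's default minimum mapping quality.

```python
def summarize_sam(lines, min_mapq=20):
    """Count basic alignment properties from SAM-format text lines.

    min_mapq is an assumed threshold; adjust it to match the caller's setting.
    """
    stats = {"total": 0, "unmapped": 0, "proper_pair": 0,
             "duplicate": 0, "secondary": 0, "low_mapq": 0}
    for line in lines:
        # Skip blank lines and header lines (@HD, @SQ, @RG, @PG, ...).
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        stats["total"] += 1
        if flag & 0x4:          # read unmapped
            stats["unmapped"] += 1
        elif mapq < min_mapq:   # mapped, but below the quality threshold
            stats["low_mapq"] += 1
        if flag & 0x2:          # each segment properly aligned
            stats["proper_pair"] += 1
        if flag & 0x100:        # secondary alignment
            stats["secondary"] += 1
        if flag & 0x400:        # PCR or optical duplicate
            stats["duplicate"] += 1
    return stats


# Made-up records for illustration only.
demo = [
    "@SQ\tSN:1\tLN:249250621",
    "r1\t99\t1\t100\t60\t4M\t=\t300\t204\tACGT\tFFFF",
    "r1\t147\t1\t300\t60\t4M\t=\t100\t-204\tACGT\tFFFF",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
print(summarize_sam(demo))
```

Running this on the output of each pre-processing step and comparing the counts would show at which step the proportion of properly paired, well-mapped reads drops.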

  • Siddharth Prakash

    Hi Genevieve,

    Yes, I reran the pipeline and checked the preprocessed bams. They all look the same as what I posted. I'm calling bwa/0.7.17. Any suggestions?

  • Genevieve Brandt (she/her)

    You are using the -p argument with bwa mem, which does the following:

    Assume the first input query file is interleaved paired-end FASTA/Q. See the command description for details.

    Does your file /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.cut.fq.gz contain properly interleaved paired-end reads? I noticed it has "cut" in the name. Is it a subset of the reads? If so, that could be how you lost the mates.
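
Whether a FASTQ file is interleaved as `bwa mem -p` expects can be checked by confirming that records come in consecutive pairs with the same read name. A minimal sketch, assuming plain four-line FASTQ records and read names that either match exactly or differ only by a trailing `/1` or `/2`; the helper names are hypothetical:

```python
import io
from itertools import islice


def read_names(handle):
    """Yield the read name of each 4-line FASTQ record, without a /1 or /2 suffix."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record: %r" % record[:1])
        parts = record[0][1:].split()
        name = parts[0] if parts else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_unpaired(handle):
    """Return the first read name whose following record has a different name.

    Returns None when every record is followed by its mate.
    """
    names = read_names(handle)
    for first in names:
        second = next(names, None)  # None if the file ends on an odd record
        if second != first:
            return first
    return None


# Made-up records for illustration only.
interleaved = "@r1/1\nACGT\n+\nFFFF\n@r1/2\nTGCA\n+\nFFFF\n"
broken = "@r1/1\nACGT\n+\nFFFF\n@r2/1\nACGT\n+\nFFFF\n@r2/2\nTGCA\n+\nFFFF\n"
print(first_unpaired(io.StringIO(interleaved)))
print(first_unpaired(io.StringIO(broken)))

# For a gzipped file, open it in text mode first:
#   import gzip
#   with gzip.open("reads.cut.fq.gz", "rt") as handle:   # hypothetical file name
#       print(first_unpaired(handle))
```

A result of `None` means every record was followed by a record with the same name. Any other result is the name of the first read whose mate is missing or out of order.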

  • Siddharth Prakash

    Yes, my output is interleaved paired-end. Output is from cutadapt 3.1.

  • Genevieve Brandt (she/her)

    For some reason, your reads are not aligning properly with bwa mem, and you'll have to look more closely at your data to determine where this issue is coming from.

  • Siddharth Prakash

    I am stuck for two reasons:

    1. When I ran this command 6 months ago it worked just fine. I used bwa 0.7.16 instead of 0.7.17 then. I just reran the same .fastq files that I had successfully aligned earlier and got the same mess of an output.

    2. I don't know how to troubleshoot the issue other than to go back to bwa 0.7.16. Do you have any suggestions?

  • Genevieve Brandt (she/her)

    We only provide support for GATK issues on this forum. Since this sounds like a bwa issue, I would recommend reaching out to the bwa developers. You could also post this on Biostars.

    I'll ask whether anyone on my team knows anything about this issue, but I can't guarantee I'll be able to provide answers.
