Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK MarkDuplicates Missing output file

5 comments

  • Avatar
    Gökalp Çelik

    Hi Gilgamesh

    Can you check whether your input BAM file has any reads in it? The log looks as though the file is empty, so the tool stopped immediately after starting. Also, do the output folders you named already exist before the tool starts running?
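
    One quick way to check this is sketched below. It assumes bash plus the standard `tail`, `od`, and `tr` utilities; the helper name `has_bgzf_eof` is hypothetical. A completely written BAM ends with a fixed 28-byte BGZF end-of-file block, so a missing marker suggests the writer was interrupted. The check says nothing about whether the content before the marker is complete.

```shell
#!/usr/bin/env bash
# The 28-byte BGZF end-of-file block, grouped here for readability.
BGZF_EOF_HEX="1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
BGZF_EOF_HEX=${BGZF_EOF_HEX// /}   # strip the spaces used for grouping

# Hypothetical helper: succeeds if the file ends with the BGZF EOF block.
has_bgzf_eof() {
    local file=$1 tail_hex
    # Hex-encode the last 28 bytes, removing od's spaces and newlines.
    tail_hex=$(tail -c 28 -- "$file" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

    If samtools is on the PATH, `samtools view -c input.bam` (with `input.bam` as a placeholder name) prints the number of alignment records, which answers the "any reads" question directly.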

     

  • Avatar
    Gilgamesh

    Hi Gökalp,

     

    Thanks for the reply. The input file is 115 KB and looks right. The paths also work and are read correctly in the debug window.

    Speaking of the input file, here are the first few lines, followed by a skip ahead to line 2500:

    BAMÈ∞ @HD    VN:1.6    SO:coordinate
    @SQ    SN:Ha412HOChr01    LN:159217232
    @SQ    SN:Ha412HOChr02    LN:184765313

    @SQ    SN:Ha412HOChr00c25016    LN:
    @PG    ID:samtools    PN:samtools    VN:1.15    CL:samtools sort -o /mmfs1/projects/super.visor/RSSW_paper/RSSW_original_fastq/provingGrounds/bam_sorted/RSSW_15_0043_trimmed_bwaMem_readGroups_sorted.bam -O BAM -T tmp -
    …a Ha412HOChr01Pv}     Ha412HOChr02ÅK  Ha412HOChr03ÜÚ⁄
    Ha412HOChr04`U: Ha412HOChr05s¥+  Ha412HOChr06ÅÉc     Ha412HOChr07»ø®     Ha412HOChr08,õ
    Ha412HOChr09ê£Ô  Ha412HOChr10ï1a  Ha412HOChr11p…„  Ha412HOChr12Üu´
    Ha412HOChr13i≤x

    Here is the code for the prior step, which should be working.

    bwa mem -t 16 -R '@RG\tID:RSSW\tSM:RSSW_15_0048\tPL:ILLUMINA' Ha412HOv2.0-20181130.fasta "$OUTPUT_DIR/trimmed/${SAMPLE_NAME}_trimmed.fastq" | \
      samtools sort -o "$OUTPUT_DIR/bam_sorted/${SAMPLE_NAME}_trimmed_bwaMem_readGroups_sorted.bam" -O BAM -T tmp -
  • Avatar
    Gökalp Çelik

    Hi Gilgamesh

    Can you run ValidateSamFile on your input BAM? Since the BAM file is so small, could you share one of them so we can try to replicate the problem?

  • Avatar
    Gilgamesh

    Hi Gökalp,

    I should note that I am getting a core dump partway through the BWA command when running with 32 threads and 128 GB of memory, so the issue seems hard to replicate. Interestingly, piping stops the core dump error but does not stop BWA from crashing. The core dump occurs at iteration 25016, and in the piped output the line @PG    ID:samtools    PN:samtools    VN:1.15    CL:samtools sort -o also appears at that same point.

    Here is the output from ValidateSamFile:

    09:08:17.420 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/gatk-4.1.8.1-2onk7tepebfsn3qaco7vyj5d2eiogkbn/bin/gatk-package-4.1.8.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Aug 31 09:08:17 CDT 2023] ValidateSamFile --INPUT /mmfs1/projects/brent.hulke/RSSW_paper/RSSW_original_fastq/provingGrounds/bam_sorted/RSSW_15_0043_trimmed_bwaMem_readGroups_sorted.bam --MODE SUMMARY --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    Aug 31, 2023 9:08:17 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    [Thu Aug 31 09:08:17 CDT 2023] Executing as ethan.risman@cmp0023 on Linux 4.18.0-348.12.2.el8_5.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_322-b06; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1
    WARNING    2023-08-31 09:08:17    ValidateSamFile    NM validation cannot be performed without the reference. All other validations will still occur.
    ERROR    2023-08-31 09:08:17    ValidateSamFile    Number of sequences in text header (25032) != number of sequences in binary header (25033) for file /mmfs1/projects/brent.hulke/RSSW_paper/RSSW_original_fastq/provingGrounds/bam_sorted/RSSW_15_0043_trimmed_bwaMem_readGroups_sorted.bam
    [Thu Aug 31 09:08:17 CDT 2023] picard.sam.ValidateSamFile done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=2293760000
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Tool returned:
    -1
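
    The ERROR line above compares two counts: the sequences listed in the text header (the @SQ lines) and the sequences recorded in the binary header. As a rough cross-check of the text-header side, here is a small sketch. The helper name `count_sq_lines` is hypothetical, and it only counts tab-delimited @SQ lines on standard input; it does not read the binary header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count @SQ lines in SAM header text read from stdin.
count_sq_lines() {
    # Match only lines that start with "@SQ" followed by a tab;
    # "n + 0" makes awk print 0 rather than an empty string when none match.
    awk '/^@SQ\t/ { n++ } END { print n + 0 }'
}
```

    Assuming samtools is installed, `samtools view -H input.bam | count_sq_lines` would feed it the header of a BAM (`input.bam` is a placeholder name).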

    Also, I can certainly provide the file. Would you like it pasted here or emailed somewhere?

  • Avatar
    Gökalp Çelik

    Hi Gilgamesh

    The core dump explains the issue. Your SAM output is cut off partway through, so you start with an incomplete file. It may be worth checking your FASTQ files as well, since the problem could come from them being corrupt. If they are fine, try reducing the number of threads for bwa. You could also run the workflow without the pipe to see which step is actually failing. Also, to make things more consistent, can you add

    set -o pipefail

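    before you start your pipe. With this set, a failure in either part of the pipe, such as the core dump, makes the whole pipeline return a failure status instead of only the last command's status being reported.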
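
    To see what the option changes, here is a minimal bash sketch. As an assumption for illustration, `false` stands in for a crashing bwa and `cat` stands in for samtools sort; neither real tool is invoked.

```shell
#!/usr/bin/env bash
# 'false' stands in for a crashing aligner, 'cat' for the downstream sorter.

# Default behaviour: the pipeline reports only the last command's status.
set +o pipefail
status_without=0
false | cat || status_without=$?

# With pipefail: the pipeline fails if any command in it fails.
set -o pipefail
status_with=0
false | cat || status_with=$?

echo "without pipefail: exit $status_without"
echo "with pipefail:    exit $status_with"
```

    Without the option the failure is hidden because `cat` exits with 0; with it, the pipeline's status reflects the failed first command, so a following check on `$?` (or `set -e`) can stop the script before the truncated output is used.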

    I hope this helps. 
