Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Error in SplitNCigarReads

11 comments

  • Genevieve Brandt (she/her)

    Hi Giulia Corsi, this looks like it may be an issue with your input BAM. We have a tool that can give you a better idea of the problem: could you run ValidateSamFile on your input BAM? https://gatk.broadinstitute.org/hc/en-us/articles/360035891231

  • Giulia Corsi

    Dear Genevieve,

    thank you for your reply. The only error ValidateSamFile reports is that the PL (platform) attribute is not set, but I also get it for other samples that SplitNCigarReads processes without problems. Am I missing something else?

     

    Below is the output of the command:

    picard ValidateSamFile I=SOD1P_A272C_rep2.Dedup.bam MODE=SUMMARY

    INFO 2020-08-20 17:36:28 ValidateSamFile

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    ********** ValidateSamFile -I SOD1P_A272C_rep2.Dedup.bam -MODE SUMMARY
    **********


    17:36:36.404 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/results/SOD1/.snakemake/conda/24337889/share/picard-2.23.0-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Aug 20 17:36:36 CEST 2020] ValidateSamFile INPUT=SOD1P_A272C_rep2.Dedup.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Aug 20 17:36:36 CEST 2020] Executing as giulia@### on Linux 2.6.32-754.31.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.1+13-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.0
    WARNING 2020-08-20 17:36:36 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
    INFO 2020-08-20 17:38:04 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:27s. Time for last 10,000,000: 87s. Last read position: chr19:716,787
    INFO 2020-08-20 17:39:31 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:54s. Time for last 10,000,000: 87s. Last read position: chr4:54,006,678
    INFO 2020-08-20 17:40:57 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:04:20s. Time for last 10,000,000: 85s. Last read position: */*


    ## HISTOGRAM java.lang.String
    Error Type Count
    ERROR:MISSING_PLATFORM_VALUE 1

    [Thu Aug 20 17:41:34 CEST 2020] picard.sam.ValidateSamFile done. Elapsed time: 4.97 minutes.
    Runtime.totalMemory()=536870912
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
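
    Since the missing-PL error also shows up for samples that SplitNCigarReads handles fine, it can help to compare the SUMMARY histograms of all samples programmatically. The Python sketch below assumes each histogram row is an `ERROR:<TYPE>` or `WARNING:<TYPE>` label followed by a count, as in the output above; the function names and the "tolerated" list are illustrative, not part of Picard.

```python
import re
from typing import Dict, Iterable

# A histogram row: an ERROR:/WARNING: label followed by its count,
# separated by whitespace. Header rows such as "Error Type Count" do not match.
_ROW = re.compile(r"^\s*((?:ERROR|WARNING):[A-Z0-9_]+)\s+(\d+)\s*$")


def parse_validation_summary(text: str) -> Dict[str, int]:
    """Return {error_type: count} parsed from ValidateSamFile MODE=SUMMARY output."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def unexpected_errors(text: str,
                      tolerated: Iterable[str] = ("ERROR:MISSING_PLATFORM_VALUE",)) -> Dict[str, int]:
    """Drop error types you have chosen to tolerate (hypothetically, the missing PL)."""
    skip = set(tolerated)
    return {kind: n for kind, n in parse_validation_summary(text).items() if kind not in skip}
```

    For the SUMMARY output above, parse_validation_summary would return {'ERROR:MISSING_PLATFORM_VALUE': 1} and unexpected_errors an empty dict, meaning nothing beyond the missing platform value was reported.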

     

    Below you can find the output using MODE=VERBOSE IGNORE_WARNINGS=true:

     

    INFO 2020-08-20 17:48:17 ValidateSamFile

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    ********** ValidateSamFile -I SOD1P_A272C_rep2.Dedup.bam -MODE VERBOSE -IGNORE_WARNINGS true
    **********


    17:48:25.445 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/results/SOD1/.snakemake/conda/24337889/share/picard-2.23.0-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Thu Aug 20 17:48:25 CEST 2020] ValidateSamFile INPUT=SOD1P_A272C_rep2.Dedup.bam MODE=VERBOSE IGNORE_WARNINGS=true MAX_OUTPUT=100 VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
    [Thu Aug 20 17:48:25 CEST 2020] Executing as giulia@### on Linux 2.6.32-754.31.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.1+13-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.0
    WARNING 2020-08-20 17:48:25 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
    ERROR::MISSING_PLATFORM_VALUE:Read name SOD1P_A272C_rep2, A platform (PL) attribute was not found for read group
    INFO 2020-08-20 17:49:53 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:01:26s. Time for last 10,000,000: 86s. Last read position: chr19:716,787
    INFO 2020-08-20 17:51:20 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:02:53s. Time for last 10,000,000: 86s. Last read position: chr4:54,006,678
    INFO 2020-08-20 17:52:44 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:04:18s. Time for last 10,000,000: 84s. Last read position: */*
    [Thu Aug 20 17:53:21 CEST 2020] picard.sam.ValidateSamFile done. Elapsed time: 4.93 minutes.
    Runtime.totalMemory()=536870912
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
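
    The VERBOSE message names the read group (SOD1P_A272C_rep2) rather than individual reads, so the BAM header is the place to look. A minimal Python sketch that lists @RG header lines without a PL field, assuming the header text has already been obtained, for example with `samtools view -H`; the function name is hypothetical.

```python
from typing import Dict, List


def read_groups_missing_pl(header_text: str) -> List[str]:
    """Return the ID of every @RG header line that lacks a PL (platform) field.

    header_text is a SAM text header, e.g. captured from `samtools view -H sample.bam`.
    Header fields are TAB-separated TAG:VALUE pairs.
    """
    missing: List[str] = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        tags: Dict[str, str] = {}
        for field in line.split("\t")[1:]:
            # Split only on the first ':' so values that contain ':' stay intact.
            tag, sep, value = field.partition(":")
            if sep:
                tags[tag] = value
        if "PL" not in tags:
            missing.append(tags.get("ID", "<no ID>"))
    return missing
```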

     

    Best regards,

    Giulia

  • Genevieve Brandt (she/her)

    Hi Giulia Corsi, thank you for running it. However, could you run it in the default VERBOSE mode so that we can confirm there are no other issues? You can also include your reference with the -R option so that all checks, including NM-tag validation, are performed.
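
    A small helper for building such a run, sketched in Python. It assumes the bioconda `picard` wrapper and the legacy KEY=VALUE syntax used elsewhere in this thread (I, MODE, MAX_OUTPUT, R); the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def validate_sam_file_cmd(bam: str, reference: Optional[str] = None,
                          mode: str = "VERBOSE", max_output: int = 100) -> List[str]:
    """Build a Picard ValidateSamFile command in legacy KEY=VALUE syntax."""
    if mode not in ("VERBOSE", "SUMMARY"):
        raise ValueError("MODE must be VERBOSE or SUMMARY")
    cmd = ["picard", "ValidateSamFile", f"I={bam}", f"MODE={mode}",
           f"MAX_OUTPUT={max_output}"]
    if reference is not None:
        # With a reference Picard can also validate NM tags; without one
        # the log warns that NM validation is skipped.
        cmd.append(f"R={reference}")
    return cmd


def run_validation(bam: str, reference: Optional[str] = None) -> int:
    """Run the validation and return Picard's exit status."""
    return subprocess.run(validate_sam_file_cmd(bam, reference)).returncode
```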

  • Giulia Corsi

    Dear Genevieve,

    thank you for the suggestions.

    I re-ran Picard including the reference genome as follows:

    picard ValidateSamFile I=SOD1P_A272C_rep2.Dedup.bam MO=1000 R=hg38_primary_refseq.fa

    The output is below:

     

    INFO 2020-08-24 16:36:12 ValidateSamFile

    ********** NOTE: Picard's command line syntax is changing.
    **********
    ********** For more information, please see:
    ********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
    **********
    ********** The command line looks like this in the new syntax:
    **********
    ********** ValidateSamFile -I SOD1P_A272C_rep2.Dedup.bam -MO 1000 -R hg38_primary_refseq.fa
    **********


    16:36:12.918 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/giulia/anaconda3/envs/picard/share/picard-2.23.0-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
    [Mon Aug 24 16:36:12 CEST 2020] ValidateSamFile INPUT=SOD1P_A272C_rep2.Dedup.bam MAX_OUTPUT=1000 REFERENCE_SEQUENCE=/hg38_primary_refseq.fa MODE=VERBOSE IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

    [Mon Aug 24 16:36:12 CEST 2020] Executing as giulia@sysadm-Latitude-7480 on Linux 5.4.0-42-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.0
    ERROR::MISSING_PLATFORM_VALUE:Read name SOD1P_A272C_rep2, A platform (PL) attribute was not found for read group
    INFO 2020-08-24 16:42:21 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:06:08s. Time for last 10,000,000: 335s. Last read position: chr19:716,787
    INFO 2020-08-24 16:48:33 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:12:19s. Time for last 10,000,000: 371s. Last read position: chr4:54,006,678
    INFO 2020-08-24 16:54:29 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:18:16s. Time for last 10,000,000: 356s. Last read position: */*
    [Mon Aug 24 16:55:59 CEST 2020] picard.sam.ValidateSamFile done. Elapsed time: 19.78 minutes.
    Runtime.totalMemory()=821035008
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

  • Bhanu Gandham

    Hi Giulia Corsi

    There seem to be two separate issues here:

    1. Error with ValidateSamFile. This is due to incomplete read group information: the PL (platform) tag is missing. This is probably not the reason for your error with SplitNCigarReads, but it could become an issue in later steps of your GATK analysis. To fix it, please use the AddOrReplaceReadGroups tool to add the PL (platform) tag to the read group. See these docs for more information:
       https://gatk.broadinstitute.org/hc/en-us/articles/360035890671
       https://gatk.broadinstitute.org/hc/en-us/articles/360035532352-Errors-about-read-group-RG-information
       https://gatk.broadinstitute.org/hc/en-us/articles/360037872491--How-to-Fix-a-badly-formatted-BAM
    2. Error with SplitNCigarReads. This looks like a possible bug to me. Let me confirm with my team and get back to you.
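
    As a sketch of fix 1, the Python helper below assembles an AddOrReplaceReadGroups command in the legacy KEY=VALUE Picard syntax used elsewhere in this thread. Note that this tool assigns every read to a single new read group. The library, platform unit and ILLUMINA platform values are placeholders, not values taken from this data set; the helper itself is hypothetical.

```python
from typing import List


def add_read_group_cmd(in_bam: str, out_bam: str, sample: str, library: str,
                       platform_unit: str, platform: str = "ILLUMINA",
                       rg_id: str = "") -> List[str]:
    """Build a Picard AddOrReplaceReadGroups command (legacy KEY=VALUE syntax).

    Every read in in_bam is assigned to one new read group that carries a PL tag.
    """
    return [
        "picard", "AddOrReplaceReadGroups",
        f"I={in_bam}", f"O={out_bam}",
        f"RGID={rg_id or sample}",  # defaulting the ID to the sample name is a choice, not a rule
        f"RGLB={library}",
        f"RGPL={platform}",         # the PL value ValidateSamFile reported as missing
        f"RGPU={platform_unit}",
        f"RGSM={sample}",
        "CREATE_INDEX=true",
    ]
```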

  • Bhanu Gandham

    Giulia Corsi

     

    We think the issue with SplitNCigarReads is a bug in our code. Thanks for bringing this to our attention. We have created an issue ticket to fix it, and you can follow its progress here: https://github.com/broadinstitute/gatk/issues/6776

  • Bhanu Gandham

    Hi Giulia Corsi

     

    Can you please provide us with a runnable test case that reproduces the SplitNCigarReads issue? Once we have that, we can debug it.
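
    When the whole BAM is too large to share, one way to look for a smaller reproducing input is to run SplitNCigarReads one contig at a time with GATK's -L argument and note which contigs fail. This Python sketch assumes the `gatk` launcher is on PATH and that contig names come from the reference's .fai index; the helper names and output file names are hypothetical. Reads outside the interval, such as unplaced unmapped reads, are not processed, so the crash may not reproduce this way.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], int]


def contigs_from_fai(fai_path: str) -> List[str]:
    """Contig names are the first TAB-separated column of a FASTA .fai index."""
    with open(fai_path) as handle:
        return [line.split("\t", 1)[0] for line in handle if line.strip()]


def split_n_cigar_cmd(reference: str, bam: str, out_bam: str, interval: str) -> List[str]:
    """GATK SplitNCigarReads restricted to a single interval with -L."""
    return ["gatk", "SplitNCigarReads", "-R", reference, "-I", bam,
            "-O", out_bam, "-L", interval]


def run_cmd(cmd: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def failing_contigs(reference: str, bam: str, contigs: Sequence[str],
                    runner: Runner = run_cmd) -> List[str]:
    """Run SplitNCigarReads once per contig; return the contigs whose run exits non-zero."""
    failed: List[str] = []
    for contig in contigs:
        out_bam = f"split.{contig}.bam"  # hypothetical per-contig output name
        if runner(split_n_cigar_cmd(reference, bam, out_bam, contig)) != 0:
            failed.append(contig)
    return failed
```

    For example, failing_contigs("hg38_primary_refseq.fa", "SOD1P_A272C_rep2.Dedup.bam", contigs_from_fai("hg38_primary_refseq.fa.fai")) would start one job per contig listed in the index. Passing a different runner makes it easy to dry-run the loop without GATK installed.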

  • Giulia Corsi

    Dear Bhanu, 

    thank you for taking care of this. I could not identify a subset of the BAM file where the problem occurs, so I have shared one of the problematic BAM files in its entirety (the smallest one), together with its md5sum, at:

    https://drive.google.com/drive/folders/1a6q_c7xNlsFqG1B3TdrdVL51_QB5e7RD?usp=sharing
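
    Before debugging, the recipient can confirm that the download matches the shared md5sum. A Python equivalent of `md5sum -c`, assuming the checksum file uses the usual `<hex digest>  <file name>` layout; the file names are placeholders.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a (possibly multi-GB) file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5sum_file(path: str, md5_path: str) -> bool:
    """Compare against the first field of an md5sum-style file ("<hex>  <name>")."""
    with open(md5_path) as handle:
        expected = handle.read().split()[0].lower()
    return md5_of_file(path) == expected
```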

    This BAM file fails with the same error as the previous one when running SplitNCigarReads with gatk-4.1.8.1.

    The original raw data is publicly available in the SRA: SRR5273292.

    Best,

    Giulia

  • Bhanu Gandham

    Giulia Corsi

     

    Thanks for sharing the data. I have updated the ticket and you can follow the progress of this fix here: https://github.com/broadinstitute/gatk/issues/6776

  • Tayyaba Alvi

    Bhanu Gandham

     

    I am getting the same error with RNA-seq samples. I have 9 samples in total: 5 ran successfully and the other 4 terminated with the same error. I checked my BAM files and there doesn't seem to be any problem with them. Now I see that there is a bug in the code; I followed the issue on GitHub but couldn't work out a solution.

    Can you help me with it?

    Thank you
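
    With several samples, a batch run that records which inputs fail, and how, can show whether the failures share the same message. The Python sketch below assumes the `gatk` launcher is on PATH; the output naming scheme is hypothetical, and keeping only the last output line is a rough way to fingerprint the error, not a GATK feature.

```python
import os
import subprocess
from typing import Callable, Dict, List, Sequence, Tuple

# A runner takes a command and returns (exit status, captured output).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_capturing(cmd: List[str]) -> Tuple[int, str]:
    """Run a command, merging stderr into stdout so the error text is kept."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def triage_samples(reference: str, bams: Sequence[str],
                   runner: Runner = run_capturing) -> Dict[str, List[str]]:
    """Run SplitNCigarReads on each BAM and sort the inputs into 'ok' and 'failed'.

    For failed runs the last line of output is kept, so the messages of
    different samples can be compared side by side.
    """
    result: Dict[str, List[str]] = {"ok": [], "failed": []}
    for bam in bams:
        out_bam = os.path.splitext(bam)[0] + ".split.bam"  # hypothetical naming scheme
        status, output = runner(["gatk", "SplitNCigarReads", "-R", reference,
                                 "-I", bam, "-O", out_bam])
        if status == 0:
            result["ok"].append(bam)
        else:
            lines = output.strip().splitlines()
            result["failed"].append(f"{bam}: {lines[-1] if lines else '(no output)'}")
    return result
```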

  • Genevieve Brandt (she/her)

    Hi Tayyaba Alvi, it looks like the issue has been fixed, but we haven't released a new version of GATK since then. If you are running GATK through Docker, you can use the nightly release to get the latest GATK changes. You can also build GATK from the master branch on GitHub.

    We are planning to release a new version of GATK in January. The change will be available in that release.
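
    For the build-from-source route, a minimal Python sketch of the two steps, assuming git, Git LFS and a suitable JDK are installed as the GATK README requires. `./gradlew localJar` is one of the build targets described in that README; the `gatk` script in the repository root should then be able to run tools from the local build. The helper names are hypothetical.

```python
import subprocess
from typing import List

GATK_REPO = "https://github.com/broadinstitute/gatk.git"


def build_from_source_steps(workdir: str = "gatk") -> List[List[str]]:
    """Commands to clone the GATK repository and build a local jar with its Gradle wrapper."""
    return [
        ["git", "clone", GATK_REPO, workdir],  # checks out the repository's default branch
        ["./gradlew", "localJar"],             # to be run inside workdir
    ]


def build_from_source(workdir: str = "gatk") -> None:
    """Execute the steps; check=True raises CalledProcessError on the first failure."""
    clone, build = build_from_source_steps(workdir)
    subprocess.run(clone, check=True)
    subprocess.run(build, check=True, cwd=workdir)
```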
