Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

ValidateSamFile behavior

  • Bhanu Gandham

    Hi Robert Edgar

    You are using a very old version of GATK that we no longer support. Please upgrade to the latest version, GATK 4.1.1.0.

  • Robert Edgar

    Yesterday I cloned and built Picard following the instructions here: https://github.com/broadinstitute/picard

    The version is reported as:

    java -jar $picard ValidateSamFile --version
    2.21.8-1-gc5cd747-SNAPSHOT

    Where can I get a supported release?
    Thanks, Robert.

  • Bhanu Gandham

    Hi Robert Edgar

    Apologies. You mentioned GATK2, which confused me. You are actually using Picard v2, so that's fine.

    Now let's answer your questions:

    1. Yes, you can provide a reference FASTA with the `-R` argument. It is listed in the --help output for ValidateSamFile under the "Optional Common Arguments" section.
    2. GATK requires read group information and fails without it. See this doc for more info.
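
Read-group problems often come down to reads whose RG tag is missing or names a read group that the header does not declare. A minimal sketch of that consistency check, assuming plain-text SAM input (for example from `samtools view -h`); `read_group_problems` and the example records are hypothetical, not part of Picard or GATK:

```python
def read_group_problems(sam_lines):
    """Return (read_name, problem) pairs for reads whose RG:Z tag is missing
    or names a read group with no matching @RG ID in the header.

    Assumes tab-separated SAM text with the header first, as the SAM
    format requires."""
    declared = set()  # IDs from @RG header lines seen so far
    problems = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@RG":
                declared.update(f[3:] for f in fields[1:] if f.startswith("ID:"))
            continue
        # Optional tags start at the 12th column of an alignment record.
        rg = next((t[5:] for t in fields[11:] if t.startswith("RG:Z:")), None)
        if rg is None:
            problems.append((fields[0], "no RG tag"))
        elif rg not in declared:
            problems.append((fields[0], f"RG {rg} not in header"))
    return problems


sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:foo\tSM:Sample1\tPL:illumina",
    "r1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:foo",
    "r2\t0\t1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\t1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:bar",
]
print(read_group_problems(sam))
# [('r2', 'no RG tag'), ('r3', 'RG bar not in header')]
```

The check streams line by line and relies on the header preceding the records, so it only keeps the set of declared read-group IDs in memory.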

  • Robert Bremel

    ValidateSamFile generates the same warning with the 4.1.7.0 Docker version. How can it be fixed? Should it be? It also generates an error:

    MISSING_PLATFORM_VALUE:Read name A, A platform (PL) attribute was not found for read group  

    I see this after running AddOrReplaceReadGroups, which logged:

    AddOrReplaceReadGroups Created read-group ID=1 PL=ILLUMINA LB=normal_1 SM=CTGCTTCC+GATAGATC
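
Because MISSING_PLATFORM_VALUE concerns the @RG header records, one quick check is to scan the output of `samtools view -H` for @RG lines without a PL field. A minimal sketch, assuming tab-separated header text; the function name and the second read group in the example are hypothetical:

```python
def rg_missing_platform(header_text):
    """Return the IDs of @RG header lines that have no (or an empty) PL field.

    Assumes tab-separated SAM header text, such as `samtools view -H` output."""
    missing = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@RG":
            continue
        # Each field after the record type is TAG:VALUE; split on the first colon only.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if not tags.get("PL"):
            missing.append(tags.get("ID", "<no ID>"))
    return missing


header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:1\tPL:ILLUMINA\tLB:normal_1\tSM:CTGCTTCC+GATAGATC",
    "@RG\tID:old\tSM:sample_x",
])
print(rg_missing_platform(header))  # ['old']
```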

  • Bhanu Gandham

    Hi Robert,

    Are you seeing this error even after adding read groups to the BAM file? That should not happen. Could you please share the BAM header using this command:

    samtools view -H <bamfile>

  • Dhara Awasthi

    Hi,

    I ran ValidateSamFile to check whether my BAM file was valid. After I added read groups, it showed an NM validation warning. I tried this command:

    java -jar picard.jar ValidateSamFile R=genome.fa I=SRR314128_rg.bam MODE=SUMMARY

    But it still shows this warning:

    WARNING:MISSING_TAG_NM 33753200

    Here is the header of my BAM file:

    @HD VN:1.6 SO:coordinate
    @SQ SN:1 LN:30427671
    @SQ SN:2 LN:19698289
    @SQ SN:3 LN:23459830
    @SQ SN:4 LN:18585056
    @SQ SN:5 LN:26975502
    @RG ID:foo LB:bar PL:illumina SM:Sample1 PU:A123.1
    @PG ID:STAR PN:STAR VN:STAR_2.5.0a CL:STAR --runThreadN 12 --genomeDir genome/ --readFilesIn SRR3141288/SRR3141288_1.fastq SRR3141288/SRR3141288_2.fastq --outFileNamePrefix ../SRR3141288
    @PG ID:MarkDuplicates VN:2.22.1 CL:MarkDuplicates INPUT=[SRR3141288.sorted.bam] OUTPUT=SRR3141288_md.bam METRICS_FILE=marked_dup_metrics.txt MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false PN:MarkDuplicates
    @PG ID:samtools PN:samtools PP:STAR VN:1.10 CL:samtools view -H SRR314128_rg.bam
    @PG ID:samtools.1 PN:samtools PP:MarkDuplicates VN:1.10 CL:samtools view -H SRR314128_rg.bam
    @CO user command line: STAR --genomeDir genome/ --runThreadN 12 --readFilesIn SRR3141288/SRR3141288_1.fastq SRR3141288/SRR3141288_2.fastq --outFileNamePrefix ../SRR3141288

    What could be wrong?
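
MISSING_TAG_NM means the alignment records carry no NM tag. The SAM specification defines NM as the edit distance between the read and the reference: mismatched aligned bases plus inserted and deleted bases, with clipped bases excluded. The sketch below shows that arithmetic for a single read, assuming the reference contig is already loaded as a string; `compute_nm` is a hypothetical helper, not a Picard or samtools API. Spliced-alignment `N` operations, which STAR writes for reads spanning introns, skip reference bases without adding to the count.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def compute_nm(ref, pos, cigar, seq):
    """Return the NM edit distance of one aligned read.

    ref   -- reference contig sequence (string, indexed from 0)
    pos   -- 1-based leftmost mapping position (SAM POS column)
    cigar -- CIGAR string, e.g. "4M1I3M"
    seq   -- read bases (SAM SEQ column)
    """
    nm = 0
    r = pos - 1  # current 0-based position in ref
    q = 0        # current position in seq
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: count mismatches, case-insensitively.
            nm += sum(
                seq[q + i].upper() != ref[r + i].upper() for i in range(n)
            )
            r += n
            q += n
        elif op == "I":
            nm += n  # inserted read bases
            q += n
        elif op == "D":
            nm += n  # deleted reference bases
            r += n
        elif op == "N":
            r += n   # skipped region (e.g. an intron): not counted
        elif op == "S":
            q += n   # soft clip: consumes read bases only, not counted
        # H and P consume neither the read nor the reference.
    return nm


ref = "ACGTACGTACGT"
print(compute_nm(ref, 1, "4M1I3M", "ACGTTACC"))  # 2: one insertion + one mismatch
print(compute_nm(ref, 1, "2M4N2M", "ACGT"))      # 0: the 4N gap is not counted
```

Filling the tag for a whole file is a job for a dedicated tool that walks the BAM alongside the reference; the sketch only illustrates what the number means.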

  • Genevieve Brandt (she/her)

    Hello Dhara Awasthi

    Please see this resource we have for diagnosing issues that come up from ValidateSamFile: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile

    Hope this helps!

    Genevieve
