Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

SamToFastq (Picard)

2 comments

  • Alex Falconer

    # unavailable link https://gatkforums.broadinstitute.org/gatk/discussion/6484/how-to-generate-an-unmapped-bam-from-fastq-or-aligned-bam

    (A) Convert FASTQ to uBAM and add read group information using FastqToSam

    Picard's FastqToSam converts FASTQ files to an unmapped BAM (uBAM). The tool requires two read group fields and accepts the remaining read group fields as optional. The comments in the command below note which fields are required for GATK Best Practices Workflows; all other read group fields are optional.

    # FASTQ is the first read file of the pair; FASTQ2 is the second.
    # Required: READ_GROUP_NAME (changed from default of A), SAMPLE_NAME, LIBRARY_NAME.
    # Recommended: PLATFORM.
    # Comments are kept out of the command because text after a trailing
    # backslash would end the line continuation.
    java -Xmx8G -jar picard.jar FastqToSam \
        FASTQ=6484_snippet_1.fastq \
        FASTQ2=6484_snippet_2.fastq \
        OUTPUT=6484_snippet_fastqtosam.bam \
        READ_GROUP_NAME=H0164.2 \
        SAMPLE_NAME=NA12878 \
        LIBRARY_NAME=Solexa-272222 \
        PLATFORM_UNIT=H0164ALXX140820.2 \
        PLATFORM=illumina \
        SEQUENCING_CENTER=BI \
        RUN_DATE=2014-08-20T00:00:00-0400
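
    A comment placed after a trailing backslash ends the line continuation, so the annotated command cannot be pasted into a shell as one command. One possible way to keep a note next to each argument is to build the argument list one line at a time. The sketch below does this in POSIX sh and only prints the command for review; the function name fastqtosam_cmd is hypothetical, and the values are copied from the command above.

    ```shell
    # Hypothetical helper: assembles the FastqToSam arguments one per line,
    # so each line can carry its own comment, then prints the full command.
    fastqtosam_cmd() {
        set --                                        # start with an empty argument list
        set -- "$@" FASTQ=6484_snippet_1.fastq        # first read file of pair
        set -- "$@" FASTQ2=6484_snippet_2.fastq       # second read file of pair
        set -- "$@" OUTPUT=6484_snippet_fastqtosam.bam
        set -- "$@" READ_GROUP_NAME=H0164.2           # required; changed from default of A
        set -- "$@" SAMPLE_NAME=NA12878               # required
        set -- "$@" LIBRARY_NAME=Solexa-272222        # required
        set -- "$@" PLATFORM_UNIT=H0164ALXX140820.2
        set -- "$@" PLATFORM=illumina                 # recommended
        set -- "$@" SEQUENCING_CENTER=BI
        set -- "$@" RUN_DATE=2014-08-20T00:00:00-0400
        # Dry run: print the command instead of executing it.
        echo java -Xmx8G -jar picard.jar FastqToSam "$@"
    }

    fastqtosam_cmd
    ```

    To run the conversion rather than print it, the leading echo inside the function can be dropped, assuming java is on the PATH and picard.jar is in the current directory.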


    (B) Convert aligned BAM to uBAM and discard problematic records using RevertSam

    We use Picard's RevertSam to remove alignment information and generate an unmapped BAM (uBAM).

    # MAX_DISCARD_FRACTION is informational; it does not affect processing.
    # The Picard release of 9/2015 clears AS by default.
    # SORT_ORDER, RESTORE_ORIGINAL_QUALITIES, REMOVE_DUPLICATE_INFORMATION and
    # REMOVE_ALIGNMENT_INFORMATION are shown with their default values.
    java -Xmx8G -jar /path/picard.jar RevertSam \
        I=6484_snippet.bam \
        O=6484_snippet_revertsam.bam \
        SANITIZE=true \
        MAX_DISCARD_FRACTION=0.005 \
        ATTRIBUTE_TO_CLEAR=XT \
        ATTRIBUTE_TO_CLEAR=XN \
        ATTRIBUTE_TO_CLEAR=AS \
        ATTRIBUTE_TO_CLEAR=OC \
        ATTRIBUTE_TO_CLEAR=OP \
        SORT_ORDER=queryname \
        RESTORE_ORIGINAL_QUALITIES=true \
        REMOVE_DUPLICATE_INFORMATION=true \
        REMOVE_ALIGNMENT_INFORMATION=true
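
    The command above repeats ATTRIBUTE_TO_CLEAR once per tag. As a minimal sketch, the same argument list can be generated from a list of tags in POSIX sh. The function name revertsam_cmd is hypothetical, the options marked as default above are left out, and the function only prints the command.

    ```shell
    # Hypothetical helper: builds one ATTRIBUTE_TO_CLEAR argument per tag
    # and prints the resulting RevertSam command.
    revertsam_cmd() {
        set -- I=6484_snippet.bam O=6484_snippet_revertsam.bam \
            SANITIZE=true MAX_DISCARD_FRACTION=0.005
        # Tags copied from the command above; edit this list as needed.
        for tag in XT XN AS OC OP; do
            set -- "$@" "ATTRIBUTE_TO_CLEAR=$tag"
        done
        # Dry run: print the command instead of executing it.
        echo java -Xmx8G -jar /path/picard.jar RevertSam "$@"
    }

    revertsam_cmd
    ```

    Keeping the tags in a single list makes it easier to see at a glance which attributes will be cleared.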

    To process large files, also designate a temporary directory.

        TMP_DIR=/path/shlee #Picard option that sets the temporary directory
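
    TMP_DIR is passed like any other KEY=value argument. The sketch below shows where it might sit in a shortened RevertSam command. The function name revertsam_with_tmp and the picard_tmp directory name are hypothetical, and creating the directory up front is an assumption made here for safety, not something the original states is required.

    ```shell
    # Hypothetical helper: prints a RevertSam command that uses the given
    # temporary directory, creating that directory first if it is missing.
    revertsam_with_tmp() {
        tmp_dir=$1
        mkdir -p "$tmp_dir" || return 1
        # Dry run: print the command instead of executing it.
        echo java -Xmx8G -jar /path/picard.jar RevertSam \
            I=6484_snippet.bam \
            O=6484_snippet_revertsam.bam \
            "TMP_DIR=$tmp_dir"
    }

    # Example call; the location under TMPDIR (or /tmp) is a placeholder.
    revertsam_with_tmp "${TMPDIR:-/tmp}/picard_tmp"
    ```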
    ...
  • Kountay Dwivedi

    Kindly let us know of a tool that converts FASTQ to uBAM, as this is needed in the Data Preprocessing phase of the GATK Best Practices.

