uBAM is a variant of the BAM file format in which the reads carry no mapping information. This is essentially an "off-label" use of the BAM format, which was specifically designed to hold mapping information. It is done for data management reasons: it lets you attach metadata to the reads as early in the analysis process as possible.
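To make "no mapping information" concrete, here is a minimal sketch of what one unmapped read looks like in SAM text form (BAM is the binary encoding of the same records). It assumes the standard SAM conventions for unmapped single-end reads: flag 4, `*` for the reference name and CIGAR, and 0 for the position fields. The `ReadGroup` class, the `unmapped_record` function, and the sample values are hypothetical names chosen for illustration, not part of any GATK or Picard API.

```python
from dataclasses import dataclass

UNMAPPED_FLAG = 0x4  # SAM flag bit meaning "segment unmapped"


@dataclass
class ReadGroup:
    """Run-level metadata that travels with the reads via the @RG header line."""
    id: str
    sample: str
    library: str
    platform: str = "ILLUMINA"

    def header_line(self) -> str:
        # @RG header line with the ID, SM, LB and PL tags
        return (f"@RG\tID:{self.id}\tSM:{self.sample}"
                f"\tLB:{self.library}\tPL:{self.platform}")


def unmapped_record(name: str, seq: str, qual: str, rg: ReadGroup) -> str:
    """Format one single-end read as an unmapped SAM line tagged with its read group."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    fields = [
        name,                # QNAME
        str(UNMAPPED_FLAG),  # FLAG: read is unmapped
        "*",                 # RNAME: no reference sequence
        "0",                 # POS: no position
        "0",                 # MAPQ: no mapping quality
        "*",                 # CIGAR: no alignment
        "*",                 # RNEXT: no mate reference
        "0",                 # PNEXT: no mate position
        "0",                 # TLEN: no template length
        seq,                 # SEQ
        qual,                # QUAL
        f"RG:Z:{rg.id}",     # optional tag linking the read to its @RG line
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    rg = ReadGroup(id="rg1", sample="sampleA", library="lib1")
    print("@HD\tVN:1.6\tSO:unsorted")
    print(rg.header_line())
    print(unmapped_record("read1", "ACGT", "IIII", rg))
```

The mapping columns are all placeholders, while the read group line and the `RG:Z:` tag carry the metadata that the file format exists to preserve.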
For this to make sense, it helps to take a step back and look at the context.
Most sequencing providers deliver the raw, unmapped read sequences as FASTQ files, so FASTQ is the most common form in which data enters the mapping step of the pre-processing pipeline. This is not ideal: among other flaws, FASTQ files cannot store much of the metadata associated with a sequencing run, whereas BAM files can. See this blog post for an overview of the many problems associated with the FASTQ format.
At the Broad Institute, we generate unmapped BAM (uBAM) files directly from the Illumina basecalls in order to keep all metadata in one place, and we do not write the data to FASTQ files at any point. This involves a slightly more complex workflow than is shown in the general Best Practices diagram. See this presentation for more details of how this works.
In case you're wondering, we still show the FASTQ-based workflow as the default in most of our documentation because it is by far the most commonly used workflow, and we want to keep the documentation accessible to our more novice users.
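If you are starting from FASTQ files and want a uBAM, one commonly used route is Picard's FastqToSam tool. The sketch below only assembles the command line and does not run it. The helper name `fastq_to_sam_command`, the file names, and the `picard.jar` path are hypothetical, and the `KEY=value` argument style is an assumption about your Picard version, so check the tool's own help output before relying on it.

```python
import shlex
from typing import List, Optional


def fastq_to_sam_command(
    fastq1: str,
    output_bam: str,
    sample_name: str,
    read_group_name: str,
    library_name: str,
    platform: str = "ILLUMINA",
    fastq2: Optional[str] = None,
    picard_jar: str = "picard.jar",  # hypothetical path to the Picard jar
) -> List[str]:
    """Build (but do not run) a Picard FastqToSam command that writes a uBAM."""
    cmd = ["java", "-jar", picard_jar, "FastqToSam", f"FASTQ={fastq1}"]
    if fastq2 is not None:
        # Second file of a paired-end run
        cmd.append(f"FASTQ2={fastq2}")
    cmd += [
        f"OUTPUT={output_bam}",
        f"READ_GROUP_NAME={read_group_name}",
        f"SAMPLE_NAME={sample_name}",
        f"LIBRARY_NAME={library_name}",
        f"PLATFORM={platform}",
    ]
    return cmd


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    cmd = fastq_to_sam_command(
        fastq1="sample_R1.fastq.gz",
        fastq2="sample_R2.fastq.gz",
        output_bam="sample.unmapped.bam",
        sample_name="sampleA",
        read_group_name="rg1",
        library_name="lib1",
    )
    print(shlex.join(cmd))
```

Returning the arguments as a list keeps quoting safe if you later pass them to `subprocess.run`. The read group, sample, and library values are exactly the kind of metadata that the FASTQ input cannot carry on its own.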
3 comments
Hi there,
I think it would be useful for this page to include information on how to actually make a uBAM file. At least that's what I was looking for when I ended up here. I eventually found the information I needed in an older article of yours (pasted below), though I don't know if it's outdated. It would be good to have an updated version of the article below, or to include a link to it here if it's still relevant.
https://gatkforums.broadinstitute.org/gatk/discussion/6484/how-to-generate-an-unmapped-bam-from-fastq-or-aligned-bam
Cheers!
Hi,
I'm also looking for documentation on generating a uBAM from a FASTQ. I was excited when I saw the previous post... but apparently that link no longer works. The official tutorial has a self-referential link where it's supposed to point to the documentation, so I can't seem to find it. Any chance a link to that information can be posted?
Previous link on legacy website:
https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/tutorials/6484-how-to-generate-an-unmapped-bam-from-fastq-or-aligned-bam