Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Best practice with read groups

1 comment

  • Laura Gauthier

    Hi Sheryl,

    MarkDuplicates specifically may not use the read group information, but BQSR definitely does. We recommend creating a separate FASTQ file for each read group and then converting those to an unaligned BAM, which will retain the read group information. You can start from paired or interleaved FASTQs: paired FASTQs have a separate file for each read in the pair, while interleaved FASTQs have both reads of a pair one after the other in the same file. If you already have interleaved FASTQs, you'll have to split them (see https://github.com/gatk-workflows/seq-format-conversion/blob/master/interleaved-fastq-to-paired-fastq.wdl). You'll also need to split the FASTQs by flowcell and lane, because the conversion workflow expects one read group per FASTQ. That should be doable with a relatively simple (but probably long-running) Python script, since the read names have the flowcell and lane in them.
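
As a rough sketch of that splitting script: the code below assumes the Illumina (Casava 1.8+) read-name layout, `@instrument:run:flowcell:lane:tile:x:y`, and uncompressed four-line FASTQ records. The function names and the output file naming are only illustrative, so adjust them to your data.

```python
import os


def flowcell_lane(header):
    """Return (flowcell, lane) parsed from one FASTQ header line."""
    # Assumed layout: @instrument:run:flowcell:lane:tile:x:y [read:filtered:control:index]
    fields = header.lstrip("@").split(" ", 1)[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name: {header!r}")
    return fields[2], fields[3]


def split_fastq(path, out_dir, tag):
    """Copy each record of `path` into <out_dir>/<flowcell>.<lane>.<tag>.fastq."""
    handles = {}
    try:
        with open(path) as fq:
            while True:
                record = [fq.readline() for _ in range(4)]
                if not record[0].strip():
                    break  # end of file (or a trailing blank line)
                key = flowcell_lane(record[0].rstrip("\n"))
                if key not in handles:
                    name = f"{key[0]}.{key[1]}.{tag}.fastq"
                    handles[key] = open(os.path.join(out_dir, name), "w")
                handles[key].writelines(record)
    finally:
        for handle in handles.values():
            handle.close()
    # Sorted list of the (flowcell, lane) pairs that were seen
    return sorted(handles)
```

Running it once per mate file (for example with `tag="R1"` and `tag="R2"`) keeps the record order within each output, so the pairs stay in sync. For gzipped input, `gzip.open(path, "rt")` can stand in for `open(path)`.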

    You'll need a TSV with the following columns:

    readgroup
    fastq_pair1
    fastq_pair2
    sample_name
    library_name
    platform_unit
    run_date
    platform_name
    sequencing_center
    This example is for paired FASTQ files, but 
    Platform name is the technology used to produce the reads (e.g. illumina).
    Platform unit should be unique to each read group (e.g. flowcell.lane.barcode).
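
To make the column list concrete, here is a minimal sketch that writes one such row with Python's csv module. Every value in `example_row` is made up, the helper names are hypothetical, and writing a header line is an assumption: check what the workflow actually expects. The run date is shown in ISO 8601 form, which I believe is what the downstream Picard tool accepts.

```python
import csv
import io

# Column order taken from the list above
COLUMNS = [
    "readgroup", "fastq_pair1", "fastq_pair2", "sample_name", "library_name",
    "platform_unit", "run_date", "platform_name", "sequencing_center",
]


def platform_unit(flowcell, lane, barcode):
    """Build a platform unit of the form flowcell.lane.barcode."""
    return f"{flowcell}.{lane}.{barcode}"


def write_rows(rows, handle):
    """Write read group rows as tab-separated text to an open file handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()  # assumption: drop this if no header line is wanted
    writer.writerows(rows)


# Hypothetical values for a single read group
example_row = {
    "readgroup": "sampleA.HXXXXDSXX.2",
    "fastq_pair1": "HXXXXDSXX.2.R1.fastq",
    "fastq_pair2": "HXXXXDSXX.2.R2.fastq",
    "sample_name": "sampleA",
    "library_name": "libA",
    "platform_unit": platform_unit("HXXXXDSXX", "2", "ACGTACGT"),
    "run_date": "2020-01-15T00:00:00+0000",
    "platform_name": "illumina",
    "sequencing_center": "MyCenter",
}
```

With one row per flowcell-lane FASTQ pair, something like `write_rows(rows, open("readgroups.tsv", "w", newline=""))` would produce the file.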

    Once you have that TSV, you can run the commands as in https://github.com/gatk-workflows/seq-format-conversion/blob/master/paired-fastq-to-unmapped-bam.wdl to create an unmapped BAM that will have all the read group information BQSR needs.
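
For orientation, here is a sketch of how one TSV row might map onto a conversion command. It assumes the workflow wraps Picard's FastqToSam (run through the `gatk` launcher), so please check the WDL itself for the exact command and memory settings. The function name and the `row` dictionary keyed by the column names above are my own illustration.

```python
def fastq_to_sam_args(row, output_bam):
    """Build an argument list for one read group's FASTQ pair."""
    # Assumption: FastqToSam is the tool that writes the unmapped BAM
    return [
        "gatk", "FastqToSam",
        "--FASTQ", row["fastq_pair1"],
        "--FASTQ2", row["fastq_pair2"],
        "--OUTPUT", output_bam,
        "--READ_GROUP_NAME", row["readgroup"],
        "--SAMPLE_NAME", row["sample_name"],
        "--LIBRARY_NAME", row["library_name"],
        "--PLATFORM_UNIT", row["platform_unit"],
        "--RUN_DATE", row["run_date"],
        "--PLATFORM", row["platform_name"],
        "--SEQUENCING_CENTER", row["sequencing_center"],
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)`, once per read group.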

    -Laura
