Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Do I need to run FastQC and adapter trimming before the GATK pipeline?

  • danilovkiri

    Hi!

    You can use FastQC only as a general tool to assess the quality of your FASTQ data, including the presence of adapter sequences; it does not modify the reads. FastQC searches for adapters using specific k-mers defined internally (Illumina Universal Adapter, Illumina Small RNA 3' Adapter, Illumina Small RNA 5' Adapter, Nextera Transposase Sequence, SOLiD Small RNA Adapter), so adapters outside that list are not detected by default. The FastQC k-mer content module may also reveal adapter-driven biases, but not reliably. In general, it is better to check the quality of FASTQ data before doing anything else.
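To make the k-mer idea concrete, here is a minimal Python sketch of an adapter-content scan in the spirit of FastQC's module; it is not FastQC's actual implementation. It assumes plain-string reads, exact matching, and the two 12-mers shown. The function name `adapter_content` and the `ADAPTER_KMERS` table are hypothetical, and the k-mers should be checked against the adapter list shipped with your FastQC version.

```python
# Adapter 12-mers; assumed to match FastQC's defaults for these two adapters.
# Verify against the adapter list of the FastQC version you actually run.
ADAPTER_KMERS = {
    "Illumina Universal Adapter": "AGATCGGAAGAG",
    "Nextera Transposase Sequence": "CTGTCTCTTATA",
}


def adapter_content(reads, kmer):
    """Cumulative fraction of reads in which `kmer` has started at or before each position.

    Once the adapter k-mer is seen at position p, the read counts as
    adapter-containing for every position >= p (the same cumulative idea
    as FastQC's "Adapter Content" plot). Exact matches only.
    """
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    first_hit_counts = [0] * max_len
    for read in reads:
        pos = read.find(kmer)  # first occurrence, -1 if absent
        if pos != -1:
            first_hit_counts[pos] += 1
    fractions = []
    running = 0
    for count in first_hit_counts:
        running += count
        fractions.append(running / len(reads))
    return fractions
```

For example, `adapter_content(reads, ADAPTER_KMERS["Illumina Universal Adapter"])` returns one cumulative fraction per read position; a curve that climbs toward the 3' end is the usual sign that adapters need trimming.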

    As for adapter trimming itself, there are multiple options. Trimmomatic, for instance, is highly customizable and can take a user-provided file of adapter sequences (if you can obtain them from your sequencing provider). Independently of adapters, the ends of reads (the 3' end in particular) often have low base quality, which shows up in the FastQC per-base sequence quality plot. Trimmomatic can trim such ends as needed, according to user-defined criteria. Other trimming tools offer similar functionality, and some of them are claimed to work better with DNA nanoball sequencing data.
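The two trimming steps described above (adapter clipping, then 3' quality trimming) can be sketched in Python as follows. This is a simplified illustration assuming exact adapter matching, Phred+33 qualities, and a sliding-window rule that cuts at the start of the first window whose mean quality falls below a threshold. It resembles Trimmomatic's SLIDINGWINDOW step but is not its actual algorithm. All function names and the example `ADAPTER` (a TruSeq-style prefix) are hypothetical placeholders for the sequence your provider gives you.

```python
# Hypothetical example adapter; replace with your provider's sequence.
ADAPTER = "AGATCGGAAGAGCACACGTCT"


def phred33_to_scores(qual):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def clip_3prime_adapter(seq, adapter, min_overlap=3):
    """Return the index where a 3' adapter starts in `seq`, or len(seq) if none.

    Checks for a full adapter first, then for a read suffix of at least
    `min_overlap` bases that equals an adapter prefix (adapter running off
    the end of the read). Real trimmers also tolerate mismatches.
    """
    full = seq.find(adapter)
    if full != -1:
        return full
    # Ascending start = longest partial overlap is tried first.
    for start in range(max(0, len(seq) - len(adapter) + 1), len(seq) - min_overlap + 1):
        if adapter.startswith(seq[start:]):
            return start
    return len(seq)


def sliding_window_cut(scores, window=4, min_mean=20):
    """Bases to keep: cut at the start of the first window with mean quality < min_mean."""
    if len(scores) < window:
        # Too short for a full window: judge the whole read as one window.
        if not scores or sum(scores) / len(scores) < min_mean:
            return 0
        return len(scores)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return start
    return len(scores)


def trim_read(seq, qual, adapter, window=4, min_mean=20):
    """Clip the 3' adapter, then quality-trim what remains."""
    keep = clip_3prime_adapter(seq, adapter)
    keep = sliding_window_cut(phred33_to_scores(qual[:keep]), window, min_mean)
    return seq[:keep], qual[:keep]
```

Clipping the adapter first matters: read-through adapter bases are often sequenced with good quality, so a quality rule alone would tend to keep them.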

    Skipping adapter trimming is justified only if you have strong confidence in the quality of the data you receive, and that confidence should be based on prior analysis of several batches of samples over a period of time.

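    PS: Picard MarkIlluminaAdapters (if applicable to your data) locates adapter sequences heuristically and marks them in a BAM/SAM file by adding an XT tag at the adapter start position, rather than removing the bases itself; it is best run on an unmapped BAM/SAM. Downstream tools such as SamToFastq can then clip or mask the tagged bases. You can also supply your own 3' adapter sequence instead of the built-in adapter sets.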
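The difference between marking adapters and clipping them can be sketched like this. It is a hypothetical illustration, not Picard code: `Read` is a stand-in for an unmapped BAM record, the tag is assumed to hold the 1-based adapter start, and `mask_from_tag` only imitates the idea of lowering base qualities over the tagged region (in the spirit of SamToFastq's clipping options) instead of deleting bases. `ADAPTER` is a hypothetical example sequence.

```python
from dataclasses import dataclass, field

# Hypothetical example adapter; replace with your provider's sequence.
ADAPTER = "AGATCGGAAGAGCACACGTCT"


@dataclass
class Read:
    """Hypothetical minimal stand-in for an unmapped BAM record."""
    seq: str
    qual: str  # Phred+33 encoded
    tags: dict = field(default_factory=dict)


def find_adapter_start(seq, adapter, min_overlap=3):
    """0-based adapter start (full or 3' partial exact match), or None."""
    full = seq.find(adapter)
    if full != -1:
        return full
    for start in range(max(0, len(seq) - len(adapter) + 1), len(seq) - min_overlap + 1):
        if adapter.startswith(seq[start:]):
            return start
    return None


def mark_adapter(read, adapter, tag="XT", min_overlap=3):
    """Record the adapter start (assumed 1-based) in a tag; bases are left untouched."""
    start = find_adapter_start(read.seq, adapter, min_overlap)
    if start is not None:
        read.tags[tag] = start + 1
    return read


def mask_from_tag(read, tag="XT", masked_quality=2):
    """Set qualities to `masked_quality` from the tagged position to the 3' end."""
    if tag not in read.tags:
        return read
    start = read.tags[tag] - 1
    masked_char = chr(masked_quality + 33)
    read.qual = read.qual[:start] + masked_char * (len(read.qual) - start)
    return read
```

Masking rather than deleting keeps every read at its original length, so mates stay in sync and the aligner can decide how to handle the low-quality tail.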

  • LG

    @danilovkiri, thanks for the comment. So, in short, the raw reads should be checked with FastQC and trimmed for adapters before entering the GATK pipeline?

  • danilovkiri

    LG, in short, it is better to do so. There is no strict rule, and I will not give you one. Take your sequencing provider into account.

  • Bhanu Gandham

    Hi danilovkiri,

    Thank you for jumping in and helping the GATK community. We appreciate your contribution!
