Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Merge Bam + Mark Duplicates breaks alignment

3 comments

  • Gökalp Çelik

    Hi Kade Muffett

    The MergeBamAlignment tool restores hard-clipped reads to soft-clipped status, so it is normal to see many soft-clipped reads in the IGV view after using it. Is this what you observe? Can you also explain how you calculate the error rate?

    Also, our recommendation is to use this tool during or right after the initial mapping stage. MarkDuplicates and any other steps should run only after the tagged unmapped BAM has been merged with the aligned SAM file.
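
    The ordering described above can be sketched as a small Python helper that only builds the two command lines, with MergeBamAlignment first and MarkDuplicates consuming its output. This is a minimal sketch, not the original poster's pipeline: the file names and the `merge_then_markdup` helper are hypothetical, and only a few of the tools' arguments are shown.

```python
import shlex

# Values accepted by --PRIMARY_ALIGNMENT_STRATEGY in MergeBamAlignment.
STRATEGIES = {"BestMapq", "EarliestFragment", "BestEndMapq", "MostDistant"}


def merge_then_markdup(unmapped_bam, aligned_sam, reference, prefix,
                       strategy="MostDistant"):
    """Return the two commands, in the order they should run, as argument lists.

    All paths are placeholders supplied by the caller (hypothetical names).
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown primary alignment strategy: {strategy}")
    merged = f"{prefix}.merged.bam"
    # Step 1: merge the tagged unmapped BAM with the aligner output.
    merge = [
        "gatk", "MergeBamAlignment",
        "--UNMAPPED_BAM", unmapped_bam,
        "--ALIGNED_BAM", aligned_sam,
        "--REFERENCE_SEQUENCE", reference,
        "--PRIMARY_ALIGNMENT_STRATEGY", strategy,
        "--OUTPUT", merged,
    ]
    # Step 2: mark duplicates on the merged file, never before the merge.
    markdup = [
        "gatk", "MarkDuplicates",
        "--INPUT", merged,
        "--METRICS_FILE", f"{prefix}.dup_metrics.txt",
        "--OUTPUT", f"{prefix}.markdup.bam",
    ]
    return [merge, markdup]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in merge_then_markdup("sample.unmapped.bam", "sample.aligned.sam",
                                  "reference.fasta", "sample"):
        print(shlex.join(cmd))
```

    Printing the commands rather than running them makes it easy to confirm that the MarkDuplicates input is the MergeBamAlignment output before anything touches the data.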

    MergeBamAlignment has the following major benefits:

    1- All reads stay as they are in the input FASTQ files, and unmapped bases stay soft-clipped. The resulting BAM file can be used to recreate FASTQ files close to their original form if needed.

    2- Certain non-deterministic mappers (BWA) may mark any of the equally scoring pairs as primary. By changing the scoring strategy in the MergeBamAlignment step, you can determine which pair should be primary and which should go as supplementary.

    3- Contaminating reads from microorganisms or pathogens can be cleaned up based on metrics already built into the tool, so those reads will not clutter the original organism's reads.

    4- Certain FASTQ tags, such as UMIs or adapter positions, can be transferred to the aligned reads in this step.

    5- Others that I cannot immediately think of...

    If none of these points matter to you, you may skip this step and continue as you please.

    Regards. 

  • Kade Muffett

    The result does not appear to be restoration-related: the sequences no longer have any relation to the reference. The general error rate reported by Qualimap is 0.73, and similarly the median mismatches reported in the Picard quality check jump to 107 per 150 bp read. Reads appear to have jumped places and retain their reported mapping quality scores even though they have only accidental overlap with the reference.

    CATEGORY                        FIRST_OF_PAIR  SECOND_OF_PAIR  PAIR
    TOTAL_READS                     25251356       25251356        50502712
    PF_READS                        25251356       25251356        50502712
    PCT_PF_READS                    1              1               1
    PF_NOISE_READS                  0              0               0
    PF_READS_ALIGNED                16474979       16347984        32822963
    PCT_PF_READS_ALIGNED            0.652439       0.64741         0.649925
    PF_ALIGNED_BASES                2.46E+09       2.44E+09        4.91E+09
    PF_HQ_ALIGNED_READS             12191873       12121342        24313215
    PF_HQ_ALIGNED_BASES             1.83E+09       1.82E+09        3.64E+09
    PF_HQ_ALIGNED_Q20_BASES         1.79E+09       1.78E+09        3.57E+09
    PF_HQ_MEDIAN_MISMATCHES         107            107             107
    PF_MISMATCH_RATE                0.734622       0.734597        0.734609
    PF_HQ_ERROR_RATE                0.73429        0.734272        0.734281
    PF_INDEL_RATE                   0.000864       0.000872        0.000868
    MEAN_READ_LENGTH                150            150             150
    SD_READ_LENGTH                  0              0               0
    MEDIAN_READ_LENGTH              150            150             150
    MAD_READ_LENGTH                 0              0               0
    MIN_READ_LENGTH                 150            150             150
    MAX_READ_LENGTH                 150            150             150
    READS_ALIGNED_IN_PAIRS          15457979       15457979        30915958
    PCT_READS_ALIGNED_IN_PAIRS      0.93827        0.945559        0.9419
    PF_READS_IMPROPER_PAIRS         1259639        1132644         2392283
    PCT_PF_READS_IMPROPER_PAIRS     0.076458       0.069283        0.072884
    BAD_CYCLES                      0              0               0
    STRAND_BALANCE                  0.499888       0.500307        0.500096
    PCT_CHIMERAS                    0.005304       0.005332        0.005318
    PCT_ADAPTER                     0.000011       0.000095        0.000053
    PCT_SOFTCLIP                    0.001108       0.001107        0.001108
    PCT_HARDCLIP                    0              0               0
    AVG_POS_3PRIME_SOFTCLIP_LENGTH  9.030712       9.02534         9.028026
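
    For anyone who wants to check a metrics table like the one above automatically, here is a rough Python sketch. It parses a whitespace-delimited table and flags categories whose PF_MISMATCH_RATE exceeds a cutoff. The embedded values are a subset of the columns posted above; the 0.05 cutoff and the function names are assumptions made for illustration, and a real Picard metrics file also carries comment lines that this sketch does not handle.

```python
# Subset of the columns and values posted above (the full table has 32 columns).
METRICS = """\
CATEGORY PF_HQ_MEDIAN_MISMATCHES PF_MISMATCH_RATE MEAN_READ_LENGTH
FIRST_OF_PAIR 107 0.734622 150
SECOND_OF_PAIR 107 0.734597 150
PAIR 107 0.734609 150
"""

MAX_MISMATCH_RATE = 0.05  # hypothetical cutoff, chosen only for illustration


def parse_metrics(text):
    """Map each CATEGORY to a dict of {column name: float value}."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return {row[0]: dict(zip(header[1:], map(float, row[1:]))) for row in rows}


def suspicious_categories(metrics, max_rate=MAX_MISMATCH_RATE):
    """Return categories whose mismatch rate is above max_rate."""
    return [cat for cat, m in metrics.items() if m["PF_MISMATCH_RATE"] > max_rate]


if __name__ == "__main__":
    metrics = parse_metrics(METRICS)
    for cat in suspicious_categories(metrics):
        m = metrics[cat]
        # Median mismatches per read as a fraction of the mean read length.
        frac = m["PF_HQ_MEDIAN_MISMATCHES"] / m["MEAN_READ_LENGTH"]
        print(f"{cat}: mismatch rate {m['PF_MISMATCH_RATE']:.3f}, "
              f"median mismatches cover {frac:.2f} of the read")
```

    Running the same check on the alignment before and after the merge step would show at which stage the mismatch rate jumps.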
  • Gökalp Çelik

    Hi Kade Muffett

    I see that your command line uses the parameter below. 

    --PRIMARY_ALIGNMENT_STRATEGY BestMapq 

    Can you try a different value, such as

    --PRIMARY_ALIGNMENT_STRATEGY MostDistant

    which is also the one used in our best practices?

    Regards. 
