Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

MarkDuplicatesSpark only supports singleton fragments and pairs. We found the following group with >2 primary reads

1 comment

  • James Emery

    Hello rohit satyam, it looks like there is a problem with the BAM file you passed to MarkDuplicatesSpark. Specifically, it contains a group of reads with more than 2 primary reads (somewhere on ChrM), as the error message shows:

      Caused by: org.broadinstitute.hellbender.exceptions.UserException$UnimplementedFeature: MarkDuplicatesSpark only supports singleton fragments and pairs. We found the following group with >2 primary reads: ( 4 number of reads).

    I would recommend running a validation tool such as Picard ValidateSamFile on your input, sample3_CNVP.sorted.bam, to make sure there are no problems with your input data. You should expect to see some errors about groups of reads having multiple mates.
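For reference, a minimal sketch of how that validation run could be assembled from Python. It assumes a standalone Picard jar at a hypothetical path (`picard.jar`) and uses Picard's legacy `KEY=VALUE` argument style with `MODE=SUMMARY`; the helper name `validate_sam_command` is made up for illustration, and your installation may expect a different jar location or argument style.

```python
import shlex


def validate_sam_command(bam_path, picard_jar="picard.jar", mode="SUMMARY"):
    """Build an argument list for Picard ValidateSamFile.

    picard_jar is an assumed location; point it at your own Picard jar.
    mode="SUMMARY" asks for a per-error-type summary instead of one line per error.
    """
    return [
        "java", "-jar", picard_jar,
        "ValidateSamFile",
        f"I={bam_path}",
        f"MODE={mode}",
    ]


cmd = validate_sam_command("sample3_CNVP.sorted.bam")
print(shlex.join(cmd))  # shows the command line without running it

# To actually run it (requires Java and the Picard jar):
# import subprocess
# subprocess.run(cmd, check=False)
```

Starting with the summary mode gives an overview of which error types are present before drilling into individual reads.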

    What sequencing technology are you using? This kind of error can occur when a processing step before MarkDuplicates leaves invalid SAM flags on a chimeric group of reads (which should be handled by MarkDuplicates). Alternatively, your input BAM could contain duplicated read names, which will cause problems for MarkDuplicates. ValidateSamFile should tell you which reads are erroneous, and it is worth investigating what caused your read names to end up like this.
