Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GATK4 SCNA Pipeline Double-Counting Issue for Overlapping Reads


1 comment

  • Genevieve Brandt (she/her)

    Hi Xavi Loinaz,

    This sort of double counting is intended and is not a bug. We have seen that it does not have an appreciable downstream effect on somatic ModelSegments or the gCNV tools in the way you are worried about. However, depending on your use case, you might want to take a closer look at these effects.

    Depending on the problem you are trying to solve, you could get around this issue when running CollectReadCounts by filtering out short fragments with FragmentLengthReadFilter or by keeping only the first read of each pair with FirstOfPairReadFilter. You could also adjust your bin size to be smaller than the fragment length.
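To make the effect concrete, here is a minimal toy sketch (not GATK code) of how both mates of a pair can add to the same bin, and how keeping only the first read of each pair counts every fragment once. It assumes each read is assigned to a bin by its alignment start. The read length, fragment coordinates, bin size, and function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter

READ_LEN = 100  # hypothetical read length, for illustration only


def mate_starts(frag_start, frag_len, read_len=READ_LEN):
    """Return the alignment starts of the (first, second) mate of one fragment.

    Assumes the first mate starts at the fragment start and the second mate
    ends at the fragment end. Mates overlap when frag_len < 2 * read_len.
    """
    first = frag_start
    second = frag_start + max(frag_len - read_len, 0)
    return first, second


def bin_counts(fragments, bin_size, first_of_pair_only=False):
    """Count reads per bin, assigning each read to the bin of its start."""
    counts = Counter()
    for frag_start, frag_len in fragments:
        first, second = mate_starts(frag_start, frag_len)
        starts = [first] if first_of_pair_only else [first, second]
        for start in starts:
            counts[start // bin_size] += 1
    return counts


# Hypothetical fragments as (start, length). The first two are shorter than
# 2 * READ_LEN, so their mates overlap; the third is long enough that they do not.
fragments = [(1000, 150), (1200, 120), (1900, 400)]

both = bin_counts(fragments, bin_size=1000)
first_only = bin_counts(fragments, bin_size=1000, first_of_pair_only=True)

print("both mates:   ", dict(both))
print("first of pair:", dict(first_only))
```

With both mates counted, every fragment contributes two reads, and short fragments put both of them in the same bin. With only the first mate kept, the total count equals the number of fragments.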

    Hope this helps. Please let me know if you have any further questions!

