Genome Analysis Toolkit

gCNV - Discrepancy in Results between Hg19 and Hg38 Cohort Models

3 comments

  • Gökalp Çelik

    Hi Joshua Ravi

    Is the metadata you mention a set of known CNVs for those samples, or was it also generated by another caller?

    One thing that is certainly important is using exactly the same reference genome when comparing against known results. DRAGEN uses a custom masked hg38 reference for its mapping and secondary analysis, whereas if you are using the default hg38 reference genome with alt contigs and HLA, without additional masking or alt-aware mapping, you may get different CNV calls.

    Can you make sure that whatever genome you are using for hg38 is the same reference genome that DRAGEN uses?

    Another issue that may affect the calls for your samples is the quality of the captures generated by different labs. Each capture is unique, so mixing samples from different labs or runs can produce unexpected results. This has been my personal experience as well. To overcome this, the best approach is to collect as many samples as you can and to check the AT and GC dropout rates of the samples using the CollectHsMetrics tool. As AT and GC dropout rates diverge between samples, the samples will start showing unexpected CNV behavior, so I highly recommend checking these parameters as well.
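
The dropout check described above could be sketched roughly as follows, assuming the AT_DROPOUT and GC_DROPOUT values have already been collected per sample from CollectHsMetrics. The function name `flag_dropout_outliers`, the sample names, the numbers, and the cutoff of 3 median absolute deviations are all hypothetical choices for illustration, not a GATK recommendation.

```python
from statistics import median

def flag_dropout_outliers(dropouts, n_mads=3.0):
    """Return {sample: [metric, ...]} for samples far from the cohort median.

    `dropouts` maps sample name -> {"AT_DROPOUT": float, "GC_DROPOUT": float}.
    `n_mads` is an arbitrary illustrative cutoff, not an official threshold.
    """
    flagged = {}
    for metric in ("AT_DROPOUT", "GC_DROPOUT"):
        values = [row[metric] for row in dropouts.values()]
        center = median(values)
        # Median absolute deviation: a spread estimate that is robust to outliers.
        mad = median([abs(v - center) for v in values])
        if mad == 0:
            continue  # no spread in this metric, nothing to compare against
        for sample, row in dropouts.items():
            if abs(row[metric] - center) / mad > n_mads:
                flagged.setdefault(sample, []).append(metric)
    return flagged

# Made-up values for illustration only.
cohort = {
    "S1": {"AT_DROPOUT": 1.0, "GC_DROPOUT": 0.5},
    "S2": {"AT_DROPOUT": 1.2, "GC_DROPOUT": 0.6},
    "S3": {"AT_DROPOUT": 0.9, "GC_DROPOUT": 0.4},
    "S4": {"AT_DROPOUT": 1.1, "GC_DROPOUT": 0.5},
    "S5": {"AT_DROPOUT": 6.0, "GC_DROPOUT": 0.5},
}
print(flag_dropout_outliers(cohort))
```

Samples flagged this way are only candidates for a closer look; whether they should be excluded from a cohort model is a judgment call that depends on the capture and the cohort size.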

    I hope these help. 

    Regards. 

  • Joshua Ravi

    Thanks Gökalp Çelik

    The metadata was generated by another CNV caller, and we utilized the same reference genome. For the Hg19 cohort model, all samples originated from a single lab and a single run. In the case of the Hg38 cohort model, the samples were sourced from a single lab but not from a single run. It's important to note that the samples were not mixed, and distinct models were constructed for each genome build.

    I am seeking guidance on the interpretation of the CollectHsMetrics output.txt file. Any insights or tips on how to effectively decipher this file would be greatly appreciated.

    Thanks in advance for your assistance!

    Best regards,

    Joshua

  • Gökalp Çelik

    Hi Joshua Ravi

    The CollectHsMetrics output is, in principle, a tab-separated file, so you can open it in any spreadsheet editor and see the columns and the value for each column clearly. An explanation of each column can be found at the link below.

    https://broadinstitute.github.io/picard/picard-metric-definitions.html#HsMetrics 
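
As an alternative to a spreadsheet editor, the metrics table can also be read programmatically. The following is a minimal sketch, assuming the usual Picard metrics layout: `#` comment lines, a `## METRICS CLASS` line, then a tab-separated header row and value rows, followed by a blank line and an optional histogram section. The function name `read_picard_metrics` and the example values are hypothetical.

```python
import csv
import io

def read_picard_metrics(text):
    """Return the rows of the METRICS CLASS table as a list of dicts."""
    lines = text.splitlines()
    # Locate the line that introduces the metrics table.
    start = next(i for i, line in enumerate(lines)
                 if line.startswith("## METRICS CLASS"))
    table = []
    for line in lines[start + 1:]:
        # The table ends at the first blank line or the next '#' section.
        if not line.strip() or line.startswith("#"):
            break
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))

# Made-up, heavily shortened example; a real file has many more columns.
example = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CollectHsMetrics ...\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
    "BAIT_SET\tMEAN_TARGET_COVERAGE\tAT_DROPOUT\tGC_DROPOUT\n"
    "my_baits\t85.3\t1.25\t0.40\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
    "coverage_or_base_quality\thigh_quality_coverage_count\n"
    "0\t1200\n"
)
for row in read_picard_metrics(example):
    print(row["BAIT_SET"], row["AT_DROPOUT"], row["GC_DROPOUT"])
```

All values come back as strings, so convert the columns of interest with `float()` before comparing samples.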

    It is expected that different CNV callers will not produce identical results. However, if your results diverge strongly only for the hg38 dataset, some parameters of your analysis likely do not match those used by the center that generated the metadata. We may need more details on how the metadata was generated and on whether you performed your own mapping and alignment (secondary analysis) separately from the metadata-generating center. We may also ask you to provide some examples of how the metadata differs from your own gCNV calls.

    Regards. 

