rf_tp_probability output field
Could someone provide a little more information on this field, please? In particular, how is it derived, and how should the values be interpreted? Just eyeballing my results, it seems to tally fairly well with samples that have a very high ratio of DP to gt_AD reads, and I wonder whether this is an important metric used in it.
Many thanks for your advice
-
Hi James Melhorn, looks like this is related to gnomAD, not GATK, but I can relay the answer from one of the gnomAD developers:
====================
In a nutshell, it's our variant QC results. From v3 to v4, we might have used slightly different metrics to train the VQSR and RF models. If you're curious about the code, these are for v4:
https://github.com/broadinstitute/gnomad_qc/blob/b87f62a7fba9473b73bee9cea927bf4559fabfb2/gnomad_qc/v4/variant_qc/final_filter.py#L297
https://github.com/broadinstitute/gnomad_qc/blob/main/gnomad_qc/v4/variant_qc/random_forest.py
In the v2 supplementary material, you can find more details on the RF model: https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-020-2308-7/MediaObjects/41586_2020_2308_MOESM1_ESM.pdf
====================
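On interpreting the values, a minimal sketch may help. It assumes that rf_tp_probability appears as a key=value entry in a VCF INFO string and that it can be read as the RF model's estimated probability that a site is a true positive, so higher values indicate more confident calls. The helper names and the 0.5 cutoff are hypothetical, chosen only for illustration; they are not the thresholds gnomAD applies, and the v4 code linked above is the place to check the actual filtering logic.

```python
from typing import Optional

# Hypothetical cutoff for illustration only; NOT the threshold gnomAD uses.
EXAMPLE_CUTOFF = 0.5


def get_rf_tp_probability(info: str) -> Optional[float]:
    """Return rf_tp_probability from a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "rf_tp_probability" and sep:
            try:
                return float(value)
            except ValueError:
                # Missing values such as "." cannot be parsed as a number.
                return None
    return None


def passes_cutoff(info: str, cutoff: float = EXAMPLE_CUTOFF) -> bool:
    """True if the site has a parsable probability at or above the cutoff."""
    prob = get_rf_tp_probability(info)
    return prob is not None and prob >= cutoff
```

Sites without the field, or with a missing value, are treated as not passing here; whether that is the right choice depends on your analysis.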