REQUIRED for all errors and issues:
a) GATK version used: latest 4.3
b) Exact command used: GetPileupSummaries + CalculateContamination
c) Entire program log: no errors
----------------------------------------------------
Issue with ContaminationModel for tumor-only data: the algorithm failed to detect contamination above 15%, whereas running with the matched normal does detect it.
- What is the detection range for tumor-only mode? We saw different performance between tumor-only and matched-normal runs on a large list of known contaminated samples. Is it because CONTAMINATION_INITIAL_GUESSES = Arrays.asList(0.05, 0.1, 0.2) puts the maximum initial guess at 20%? Indeed, changing the contamination guess to 40% leads to a lower model contamination guess per sample (I added printouts to ContaminationModel).
- I fully ported the Java code to Python and get the same final results, provided I read in GATK's segments output (MAFs). One difference I encountered is in the Brent optimizer: the GATK version outputs lower MAFs than SciPy's. Could the Apache implementation have an issue (e.g. converging to a local minimum)?
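To make the local-minimum question concrete, here is a minimal sketch of how I would check for it. It does not use GATK's likelihood or either Brent implementation: `toy_objective` is a made-up two-minimum function, and a plain golden-section search stands in for a one-dimensional bracketing optimizer. All names are hypothetical. The idea is only that a single search over the full interval can settle on the shallower minimum, while a coarse grid scan followed by a local refinement finds the deeper one.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def toy_objective(x):
    """Made-up objective (NOT GATK's likelihood): a shallow minimum near
    x = 0.3 and a deeper one near x = 0.8."""
    shallow = 0.6 * math.exp(-((x - 0.3) / 0.05) ** 2)
    deep = 1.0 * math.exp(-((x - 0.8) / 0.05) ** 2)
    return -(shallow + deep)


def golden_section_min(f, lo, hi, tol=1e-8):
    """Golden-section search on [lo, hi]; a stand-in for a bracketing 1-D
    optimizer. It is only guaranteed to find a local minimum."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum is bracketed by [a, d]; reuse c as the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The minimum is bracketed by [c, b]; reuse d as the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def scan_then_refine(f, lo, hi, n_grid=51):
    """Evaluate f on a coarse grid, then refine around the best grid point."""
    xs = [lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid)]
    i_best = min(range(n_grid), key=lambda i: f(xs[i]))
    left = xs[max(i_best - 1, 0)]
    right = xs[min(i_best + 1, n_grid - 1)]
    return golden_section_min(f, left, right)


if __name__ == "__main__":
    single = golden_section_min(toy_objective, 0.0, 1.0)
    scanned = scan_then_refine(toy_objective, 0.0, 1.0)
    print("single search:", single, toy_objective(single))
    print("scan + refine:", scanned, toy_objective(scanned))
```

If the two optimizers disagree on a real segment, running the same kind of grid scan over the actual objective and comparing the objective value at each optimizer's answer would show which one, if either, stopped at a local minimum.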
David Benjamin - I believe the main issue is in the genotyping selection and not in the optimizer. I have a few suggestions that I'd like to discuss with you before opening an issue or implementing them.