CalculateContamination results differ greatly between single and paired mode
GATK version: The Genome Analysis Toolkit (GATK) v4.2.4
Results:
# without pair
calculatecontamination.table
sample        contamination        error
P2106250CASE  0.15474852529665237  0.02436622519645254
# with pair
_paired.calculatecontamination.table
sample        contamination        error
P2106250CASE  0.26597194898202964  0.030524883440651074
The results differ a lot. Is there a reason for this?
Thanks a lot.
-
Official comment
Hi linouhao,
I have an update from our developers, who were able to take a look at your questions:
I am wondering whether it can serve as a separate tool for contamination
Yes, CalculateContamination can be used as a standalone tool without Mutect2.
1. What is the cutoff for contamination?
There is no specific cutoff for contamination. It depends on your own data, analysis, and goals.
2. I combine two sample fastq, and the contamination value is lower than when calculated separately
You should not be combining multiple samples for CalculateContamination. Combining samples will break the underlying assumptions of the model and the output will not be reliable. If your fastqs are from the same sample, then the contamination should be consistent if you combine the fastqs.
3. How to interpret the following values, percentage or double, and what do they stand for?
sample contamination error
ZZ2 0.0024162826035807497 0.003194224673422314
The value is a double. The sample contamination error generally should be taken with a grain of salt because there are a lot of assumptions that can go wrong. If the number is small, you can trust the estimate. If the number is large, something wrong is occurring. The value you shared indicates that the contamination is small.
I hope this helps out!
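For readers who want to look at these numbers programmatically, here is a minimal Python sketch that parses a contamination table and prints the values as percentages. It assumes the table is tab-separated with the header `sample`, `contamination`, `error`, and that the double is a fraction between 0 and 1 (which is how the values in this thread read). The function names `read_contamination_table` and `as_percent` are made up for illustration and are not part of GATK.

```python
import csv
import io


def read_contamination_table(text):
    """Parse a tab-separated contamination table into a list of dicts.

    Assumption: the header row is 'sample', 'contamination', 'error'.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "sample": row["sample"],
            "contamination": float(row["contamination"]),
            "error": float(row["error"]),
        }
        for row in reader
    ]


def as_percent(fraction):
    """Convert a fraction (assumed to lie between 0 and 1) to a percentage."""
    return fraction * 100.0


# Values copied from the ZZ2 row quoted in this thread.
example = (
    "sample\tcontamination\terror\n"
    "ZZ2\t0.0024162826035807497\t0.003194224673422314\n"
)

for rec in read_contamination_table(example):
    print(
        f"{rec['sample']}: "
        f"{as_percent(rec['contamination']):.2f}% "
        f"+/- {as_percent(rec['error']):.2f}%"
    )
```

Read this way, the ZZ2 estimate is well under 1%, with an error of about the same size as the estimate itself.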
-
Hi linouhao,
I am going to move your post into our Community Discussions -> General Discussion topic, as the Somatic topic is for reporting bugs and issues with GATK.
You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Genevieve
-
Thanks a lot, Genevieve Brandt (she/her).
I have two samples sequenced in one batch. They are tumor-only samples.
One has this variant:
EGFR:NM_005228.4:exon21:c.2573T>G:p.L858R 3695 2858 77.35%
I used the files
af-only-gnomad.raw.sites.b37.vcf.gz
small_exac_common_3_b37.vcf.gz
to calculate the contamination. The result is:
sample contamination error
AB1 0.03449856983776939 0.046057571685247295
#####################################
The other sample:
EGFR:NM_005228.4:exon21:c.2573T>G:p.L858R 1979 6 0.30%
sample contamination error
AB2 0.023914936098781914 0.025049172427224962
Is there contamination between the two samples?
-
This is a new batch, different from the one above.
KRAS:NM_033360.3:exon2:c.34G>T:p.G12C
1211 (total depth) 7 (alt depth) 0.58% (freq)
sample contamination error
ZZ1 0.09630143157338196 0.09423682033092913
Do these 2 samples have contamination?
The other is:
sample contamination error
ZZ2 0.0024162826035807497 0.003194224673422314
KRAS:NM_033360.3:exon2:c.34G>T:p.G12C
1114 155 13.91%
-
The other important question is whether the contamination value is a percentage or just a decimal.
The source code shows a double, and I found no conversion to a percentage.
-
When I merge the fastq files of two different samples (AB1 and AB2), the final result is:
sample contamination error
AB1_AB2 0.025162044789905476 0.03591113387573044
The value is lower than the individual sample contamination, which seems strange to me.
-
Hi linouhao,
To start with your original post: if you are comparing tumor-only and matched normal calculations, it makes sense that you will get different values. Matched normal analysis is much more reliable. If you have a matched normal, I would definitely recommend using it for your somatic analysis.
Please let me know if you have further questions.
Best,
Genevieve
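To make the two modes concrete, here is a small Python sketch that builds the CalculateContamination command line with and without a matched normal. It assumes the pileup tables were already produced by GetPileupSummaries. The pileup file names are hypothetical, the output names are the ones from the original post, and `calculate_contamination_cmd` is a helper invented for this example. The sketch only prints the commands and does not run GATK.

```python
import shlex


def calculate_contamination_cmd(tumor_pileups, output_table, normal_pileups=None):
    """Build an argument list for `gatk CalculateContamination`.

    When normal_pileups is given, the matched-normal pileup table is
    passed with -matched; otherwise the tool runs on the tumor alone.
    """
    cmd = ["gatk", "CalculateContamination", "-I", tumor_pileups]
    if normal_pileups is not None:
        cmd += ["-matched", normal_pileups]
    cmd += ["-O", output_table]
    return cmd


# Hypothetical pileup file names; output names taken from the original post.
tumor_only = calculate_contamination_cmd(
    "tumor.pileups.table", "calculatecontamination.table"
)
paired = calculate_contamination_cmd(
    "tumor.pileups.table",
    "_paired.calculatecontamination.table",
    normal_pileups="normal.pileups.table",
)

print(shlex.join(tumor_only))
print(shlex.join(paired))
```

The only difference between the two runs is the extra `-matched` argument, so two different estimates for the same tumor sample come from the model having (or not having) the normal pileups to work with.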
-
Thanks a lot.
Most of the time, we cannot get matched normals.
I am eager to know the answers to my questions.
-
Can you clarify your other questions?
-
Thanks a lot.
1. What is the cutoff for contamination?
2. When I combine the fastq files of two samples, the contamination value is lower than when each is calculated separately.
-
- There isn't a specific contamination hard filter. The contamination table is used as input to FilterMutectCalls, which uses a model for filtering. You can read more about filtering in the Mutect2 tutorial and paper.
- We do not recommend combining multiple samples for analysis. Mutect2 is intended to be run on one sample or multiple samples from the same individual in multi-sample mode.
-
Thanks a lot.
My intention is to check for contamination, not to call variants, so this has nothing to do with Mutect2.
I am wondering whether it can serve as a separate tool for contamination, and I would like answers to these questions:
1. What is the cutoff for contamination?
2. When I combine the fastq files of two samples, the contamination value is lower than when each is calculated separately.
3. How should I interpret the following values: are they percentages or doubles, and what do they stand for?
sample contamination error
ZZ2 0.0024162826035807497 0.003194224673422314
-
Thanks a lot to you and the developers; the answer is helpful.
I would like to ask one more minor question. You wrote:
"
If the number is small, you can trust the estimate. If the number is large, something wrong is occurring. The value you shared indicates that the contamination is small.
"
Although you said there is no cutoff, you also said that the value I shared indicates the contamination is small. How do you assess whether a value is small or large?
-
There is no specific cutoff we can give you. Like I said, a specific cutoff is relative to your data and your experiment.
You can determine if the values are small or big by comparing the results of different samples to each other.
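One simple way to do that comparison is to rank the samples of a batch by their contamination estimate. The Python sketch below does this for the two ZZ samples quoted earlier in this thread. `rank_by_contamination` is a name made up for this example, and the ranking is only an illustration of comparing samples to each other, not a GATK-recommended procedure or threshold.

```python
def rank_by_contamination(estimates):
    """Return (sample, contamination) pairs, highest contamination first."""
    return sorted(estimates.items(), key=lambda item: item[1], reverse=True)


# Contamination estimates for one batch, copied from this thread.
batch = {
    "ZZ1": 0.09630143157338196,
    "ZZ2": 0.0024162826035807497,
}

for sample, contamination in rank_by_contamination(batch):
    print(f"{sample}\t{contamination:.4f}")
```

Samples that stand far above the rest of their batch are the ones worth a closer look.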