PostprocessGermlineCNVCalls generates a segment file with a missing GT field
Hi,
I am running the CNV workflow as recommended from this source:
I am using GATK v4.2.6.1 and I am getting the following segment VCF output:
#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | Patient1 |
chr1 | 2228507 | CNV_chr1_2228507_237832907 | G | . | 3076.53 | . | END=237832907 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:673:56:3077:137:129 |
chr2 | 21001470 | CNV_chr2_21001470_223967294 | C | . | 3076.53 | . | END=223967294 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:794:53:3077:56:112 |
chr3 | 4516232 | CNV_chr3_4516232_186744045 | T | . | 3076.53 | . | END=186744045 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:474:65:3077:99:147 |
chr4 | 52023697 | CNV_chr4_52023697_186288874 | T | . | 3076.53 | . | END=186288874 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:215:69:3077:146:100 |
chr5 | 218096 | CNV_chr5_218096_177409787 | G | . | 3076.53 | . | END=177409787 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:196:73:3077:135:113 |
chr6 | 6145359 | CNV_chr6_6145359_160611981 | T | . | 3076.53 | . | END=160611981 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:484:64:3077:103:144 |
chr6 | 160633493 | CNV_chr6_160633493_160635564 | G | <DUP> | 15.84 | . | END=160635564 | GT:CN:NP:QA:QS:QSE:QSS | ./.:5:2:14:16:18:16 |
chr6 | 160640407 | CNV_chr6_160640407_160753321 | C | . | 1562.01 | . | END=160753321 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:23:56:1562:89:56 |
chr7 | 1044286 | CNV_chr7_1044286_151876880 | T | . | 3076.53 | . | END=151876880 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:392:49:3077:94:126 |
chr8 | 11542965 | CNV_chr8_11542965_143215875 | T | . | 3076.53 | . | END=143215875 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:58:64:3077:119:153 |
chr9 | 116540 | CNV_chr9_116540_136687617 | T | . | 3076.53 | . | END=136687617 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:259:59:3077:117:98 |
chr10 | 18140477 | CNV_chr10_18140477_119677542 | T | . | 3076.53 | . | END=119677542 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:222:57:3077:108:125 |
chr11 | 532376 | CNV_chr11_532376_128916991 | G | . | 3076.53 | . | END=128916991 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:304:67:3077:86:108 |
chr12 | 2053303 | CNV_chr12_2053303_124863980 | G | . | 3076.53 | . | END=124863980 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:458:54:3077:92:91 |
chr13 | 36848416 | CNV_chr13_36848416_113149777 | A | . | 2733.06 | . | END=113149777 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:49:56:2733:95:106 |
chr14 | 23381780 | CNV_chr14_23381780_94383497 | T | . | 3076.53 | . | END=94383497 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:245:63:3077:102:118 |
chr15 | 34788279 | CNV_chr15_34788279_99713031 | G | . | 3076.53 | . | END=99713031 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:280:65:3077:104:136 |
chr16 | 854151 | CNV_chr16_854151_86513345 | A | . | 3076.53 | . | END=86513345 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:147:62:3077:135:104 |
chr17 | 1744736 | CNV_chr17_1744736_80119591 | A | . | 3076.53 | . | END=80119591 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:215:58:3077:143:112 |
chr18 | 3067002 | CNV_chr18_3067002_59359504 | T | . | 3076.53 | . | END=59359504 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:226:60:3077:130:132 |
chr19 | 4090338 | CNV_chr19_4090338_55157849 | T | . | 3076.53 | . | END=55157849 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:310:49:3077:84:126 |
chr20 | 6769867 | CNV_chr20_6769867_63498023 | C | . | 3076.53 | . | END=63498023 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:81:71:3077:123:123 |
chr21 | 34370207 | CNV_chr21_34370207_43072453 | T | . | 1040.82 | . | END=43072453 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:17:73:1041:121:100 |
chr22 | 19876845 | CNV_chr22_19876845_50524671 | C | . | 3076.53 | . | END=50524671 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:78:47:3077:99:142 |
chrX | 31121573 | CNV_chrX_31121573_154863492 | T | . | 3076.53 | . | END=154863492 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:357:56:3077:51:134 |
chrX | 154885874 | CNV_chrX_154885874_154886417 | T | <DEL> | 50.99 | . | END=154886417 | GT:CN:NP:QA:QS:QSE:QSS | 0/1:1:1:51:51:51:51 |
chrX | 154895817 | CNV_chrX_154895817_155022812 | A | . | 1549.89 | . | END=155022812 | GT:CN:NP:QA:QS:QSE:QSS | 0/0:2:25:83:1550:112:51 |
First, I noticed that this is NOT the usual VCF output we used to get from older GATK versions where 0 is NORMAL ploidy, 1 is DELETION, and 2 is DUPLICATION.
Second, I noticed that for these results (and for other samples) the GT field is missing (./.) for the duplication on chr6, while it is 0/1 for the deletion on chrX. I checked other samples and it is always the duplication that has the missing GT field. Is that normal?
Finally, I checked the INTERVAL VCF file generated by the same tool for the same sample and I didn't find the same problem, as you can see below for the two regions mentioned on chr6 and chrX:
chr6 | 160606217 | CNV_chr6_160606217_160606918 | N | <DEL>,<DUP> | . | . | END=160606918 | GT:CN:CNLP:CNQ | 0:2:100,100,0,100,100,100:100 |
chr6 | 160611302 | CNV_chr6_160611302_160611981 | N | <DEL>,<DUP> | . | . | END=160611981 | GT:CN:CNLP:CNQ | 0:2:100,100,0,87,100,100:87 |
chr6 | 160633493 | CNV_chr6_160633493_160634172 | N | <DEL>,<DUP> | . | . | END=160634172 | GT:CN:CNLP:CNQ | 2:5:99,99,72,65,15,0:15 |
chr6 | 160634863 | CNV_chr6_160634863_160635564 | N | <DEL>,<DUP> | . | . | END=160635564 | GT:CN:CNLP:CNQ | 2:5:99,99,72,43,14,0:14 |
chr6 | 160640407 | CNV_chr6_160640407_160641108 | N | <DEL>,<DUP> | . | . | END=160641108 | GT:CN:CNLP:CNQ | 0:2:100,58,0,67,60,66:58 |
chr6 | 160645954 | CNV_chr6_160645954_160646655 | N | <DEL>,<DUP> | . | . | END=160646655 | GT:CN:CNLP:CNQ | 0:2:100,89,0,90,100,100:89 |
chr6 | 160650078 | CNV_chr6_160650078_160650757 | N | <DEL>,<DUP> | . | . | END=160650757 | GT:CN:CNLP:CNQ | 0:2:100,100,0,100,100,100:100 |
chrX | 154862823 | CNV_chrX_154862823_154863492 | N | <DEL>,<DUP> | . | . | END=154863492 | GT:CN:CNLP:CNQ | 0:2:100,100,0,100,100,100:100 |
chrX | 154885874 | CNV_chrX_154885874_154886417 | N | <DEL>,<DUP> | . | . | END=154886417 | GT:CN:CNLP:CNQ | 1:1:99,0,51,100,100,100:51 |
chrX | 154895817 | CNV_chrX_154895817_154896492 | N | <DEL>,<DUP> | . | . | END=154896492 | GT:CN:CNLP:CNQ | 0:2:100,100,0,100,100,100:100 |
Could anybody please explain this difference in VCF format between the interval and segment files? And was this change expected between the old version and the new one?
Best regards
Nawar
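For reference, the GT values in the interval records above can be read as indexes into the ALT list under the standard VCF convention (0 = REF, 1 = first ALT, 2 = second ALT). The following is only a minimal sketch in plain Python that decodes three of the records shown above; the function name decode_interval_gt is made up for illustration and is not part of GATK.

```python
def decode_interval_gt(alt_field, gt_field):
    """Map a haploid GT allele index to the allele it points at.

    Standard VCF convention: index 0 is REF, index 1 is the first ALT,
    index 2 is the second ALT, and "." is a no-call.
    """
    if gt_field == ".":
        return "no-call"
    alleles = ["REF"] + alt_field.split(",")
    return alleles[int(gt_field)]


# (chrom, pos, ALT, sample column) copied from the interval VCF excerpt above.
records = [
    ("chr6", 160633493, "<DEL>,<DUP>", "2:5:99,99,72,65,15,0:15"),
    ("chr6", 160640407, "<DEL>,<DUP>", "0:2:100,58,0,67,60,66:58"),
    ("chrX", 154885874, "<DEL>,<DUP>", "1:1:99,0,51,100,100,100:51"),
]

for chrom, pos, alt, sample in records:
    gt = sample.split(":")[0]  # GT is the first FORMAT key in these records
    print(chrom, pos, decode_interval_gt(alt, gt))
```

Read this way, the 0/1/2 coding is still present in the interval file, while the segment file uses a diploid-style GT with a single ALT allele per record.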
-
Hi NawarDalila,
Thank you for writing to the GATK forum! I hope that we can help you sort this out.
The duplications are no-called on purpose because we can't phase them with the tools that we have. For a duplication, we don't know whether there is one copy on one haplotype and three copies on the other, or two copies on each haplotype, and that is the information we would need in order to assign a genotype. So we don't give it one. We recognize that this differs from SNPs and indels, and that it can make some things more complicated, but it is the correct way to represent the call.
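To make the ambiguity concrete, here is a minimal sketch in plain Python that lists the ways a total copy number could be divided between two haplotypes. It is only an illustration of the reasoning, not how PostprocessGermlineCNVCalls is implemented, and the function name haplotype_splits is made up for this example.

```python
def haplotype_splits(total_copies):
    """Return the unordered (haplotype A, haplotype B) copy counts
    that add up to total_copies."""
    return [(a, total_copies - a) for a in range(total_copies // 2 + 1)]


# CN=1 is the chrX deletion, CN=5 is the chr6 duplication from the question,
# and CN=4 matches the "one and three, or two and two" example.
for cn in (1, 4, 5):
    splits = haplotype_splits(cn)
    status = "single split" if len(splits) == 1 else "ambiguous without phasing"
    print(f"CN={cn}: {splits} -> {status}")
```

With a total of one copy there is only one possible split, which fits the 0/1 genotype on the chrX deletion. With four or five copies there are several possible splits, and unphased copy-number data cannot tell them apart.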
Please find more detailed information on using PostprocessGermlineCNVCalls here.
I hope this helps to clear things up! Please do not hesitate to reach out if you have any other questions.
Best,
Anthony
-
Hi NawarDalila,
We haven't heard from you in a while so we're going to close out this ticket. If you still require assistance, simply respond to this email and we'll be happy to pick up where we left off!
Kind regards,
Anthony
-
Regarding the same topic, I have some doubts about how to apply variant filters to DUP genotypes called as './.'. I have tried VariantFiltration (GATK 4.3.0) with different parameters (isNoCall, isCalled, ...) to filter out low-quality DUP variants. I have even tried inverted filters and the variantContext to keep only the no-call genotypes and so retrieve the DUP variants, but everything I tried ended up ignoring all the DUP variants. On the other hand, SelectVariants works fine, and I am able to keep or filter DUP variants with the same parameters that do not work for VariantFiltration. I have read all the online documentation and did not find anything about this, but maybe I am missing something. How should I proceed to filter some of the DUP genotypes?
Best,
Jesús M
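One possible workaround outside GATK is to filter the segment VCF directly on a FORMAT value. The following is a minimal sketch in plain Python, assuming a single-sample, tab-delimited segment VCF with the FORMAT keys shown earlier in this thread (GT:CN:NP:QA:QS:QSE:QSS). The function names and the QS threshold of 30 are hypothetical and would need to be adapted.

```python
def sample_value(fields, key):
    """Look up one FORMAT key (e.g. "QS") in the single sample column."""
    keys = fields[8].split(":")
    values = fields[9].split(":")
    return values[keys.index(key)]


def is_nocall_dup(fields):
    """True for records with ALT <DUP> and a no-call genotype."""
    gt = sample_value(fields, "GT")
    return fields[4] == "<DUP>" and gt in ("./.", ".")


def split_nocall_dups(lines, min_qs):
    """Split no-call DUP records into (kept, dropped) by their QS value.

    Header lines and all other records are ignored by this sketch.
    """
    kept, dropped = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if not is_nocall_dup(fields):
            continue
        if float(sample_value(fields, "QS")) >= min_qs:
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


# The chr6 duplication from the original question, rebuilt as a tab-delimited line.
example = "\t".join([
    "chr6", "160633493", "CNV_chr6_160633493_160635564", "G", "<DUP>",
    "15.84", ".", "END=160635564", "GT:CN:NP:QA:QS:QSE:QSS",
    "./.:5:2:14:16:18:16",
])

kept, dropped = split_nocall_dups([example], min_qs=30)  # 30 is a hypothetical cutoff
print(len(kept), "kept;", len(dropped), "dropped")
```

For a real file, the lines could be read with open() and the header lines written back out unchanged along with the kept records.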