Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

GATK GenotypeGVCFs miss a real Insertion

Answered
0

6 comments

  • Avatar
    touyupang

    the result in g.vcf and vcf:

    g.vcf :

    REF     314     .       CAGG    C,CAAGG,CTAAGG,CGAGG,AAGG,CGG,CAAAGG,<NON_REF>  .       .       BaseQRankSum=-2.174e+00;DP=14411;MQRankSum=-1.870e-01;RAW_MQandDP=51863089,14411;ReadPosRankSum=8.67    GT:AD:DP:GQ:PL:SB       .:64,1239,1909,0,0,4,6,4,0:3226:99:58287,35323,0,94576,94576,84581,90224,101425,94576:7,57,969,2191     .:64,1377,2214,0,0,8,0,0,0:3663:99:76512,36869,0,97343,97343,92485,97343,97343,97343:2,62,975,2615      .:512,130,2312,0,7,11,0,0,0:2972:99:58319,101206,0,97144,92432,96693,97144,97144,97144:153,358,790,1669 .:653,260,2178,6,10,11,0,0,0:3118:99:50235,91713,0,108726,89922,94763,107145,107145,107145:248,405,935,1526

    vcf:

    REF     314     .       C       CA      243337.09       .       AC=4;AF=1.00;AN=4;BaseQRankSum=-2.174e+00;DP=14411;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=59.99;MQRankSum=-1.870e-01;QD=24.56;ReadPosRankSum=8.67;SOR=0.70
    8     GT:AD:DP:GQ:PL  1:64,1909:3226:99:58287,0       1:64,2214:3663:99:76512,0       1:512,2312:2972:99:58319,0      1:653,2178:3118:99:50235,0

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    Hi touyupang,

    Thank you for writing into the forum so we can help to figure out what you are seeing here! It is a complex site. It looks to me that in the GVCF, the genotypes being heavily considered are CAGG REF -> C, CAAGG, or CTAAGG. In the final VCF, the INDEL chosen is a C -> CA at 314. 

    What makes a big difference here is what surrounds this site and what happened in the final VCF there. Could you share multiple sites around 314 for the GVCF and VCF?

    I also have some recommended materials that describe how HaplotypeCaller and GenotypeGVCFs decide on the final call to report:

    Let me know what you find.

    Best,

    Genevieve

    0
    Comment actions Permalink
  • Avatar
    touyupang
    gvcf:
    REF     305     .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3919,2:3921:99:0,1800 .:4229,0:4229:99:0,1800 .:3469,5:3474:99:0,1800 .:3619,31:3650:99:0,1800
    REF     306     .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3881,5:3886:99:0,1800 .:4199,2:4201:99:0,1800 .:3429,4:3433:99:0,1800 .:3592,31:3623:99:0,1800
    REF     307     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3831,3:3834:99:0,1800 .:4150,3:4153:99:0,1800 .:3382,9:3391:99:0,1800 .:3554,33:3587:99:0,1800
    REF     308     .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3774,12:3786:99:0,1800        .:4106,4:4110:99:0,1800 .:3340,12:3352:99:0,1800        .:3510,35:3545:99:0,1800
    REF     309     .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3740,16:3756:99:0,1800        .:4080,6:4086:99:0,1800 .:3327,7:3334:99:0,1800 .:3473,44:3517:99:0,1800
    REF     310     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3706,9:3715:99:0,1800 .:4035,10:4045:99:0,1800        .:3280,16:3296:99:0,1800        .:3449,36:3485:99:0,1800
    REF     311     .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3539,94:3633:99:0,1800        .:3874,100:3974:99:0,1800       .:3181,54:3235:99:0,1800        .:3261,152:3413:99:0,1800
    REF     312     .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3611,7:3618:99:0,1800 .:3933,9:3942:99:0,1800 .:3212,21:3233:99:0,1800        .:3350,56:3406:99:0,1800
    REF     313     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3518,25:3543:99:0,1800        .:3851,27:3878:99:0,1800        .:3122,37:3159:99:0,1800        .:3284,61:3345:99:0,1800
    REF     314     .       CAGG    C,CAAGG,CTAAGG,CGAGG,AAGG,CGG,CAAAGG,<NON_REF>  .       .       BaseQRankSum=-2.174e+00;DP=14411;MQRankSum=-1.870e-01;RAW_MQandDP=51863089,14411;ReadPosRankSum=8.67    GT:AD:DP:GQ:PL:SB       .:64,1239,1909,0,0,4,6,4,0:3226:99:58287,35323,0,94576,94576,84581,90224,101425,94576:7,57,969,2191     .:64,1377,2214,0,0,8,0,0,0:3663:99:76512,36869,0,97343,97343,92485,97343,97343,97343:2,62,975,2615      .:512,130,2312,0,7,11,0,0,0:2972:99:58319,101206,0,97144,92432,96693,97144,97144,97144:153,358,790,1669 .:653,260,2178,6,10,11,0,0,0:3118:99:50235,91713,0,108726,89922,94763,107145,107145,107145:248,405,935,1526
    REF     315     .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:2182,1274:3456:99:0,1800      .:2383,1406:3789:99:0,1800      .:2899,171:3070:99:0,1800       .:2945,317:3262:99:0,1800
    REF     316     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:2097,1268:3365:99:0,1800      .:2309,1396:3705:99:0,1800      .:2839,168:3007:99:0,1800       .:2871,312:3183:99:0,1800
    REF     317     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:2059,1273:3332:99:0,1800      .:2232,1407:3639:99:0,1800      .:2807,162:2969:99:0,1800       .:2833,314:3147:99:0,1800
    REF     318     .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3277,17:3294:99:0,1800        .:3585,16:3601:99:0,1800        .:2909,33:2942:99:0,1800        .:3068,55:3123:99:0,1800
    REF     319     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3229,25:3254:99:0,1800        .:3565,8:3573:99:0,1800 .:2887,29:2916:99:0,1800        .:3033,54:3087:99:0,1800
    REF     320     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3216,14:3230:99:0,1800        .:3546,2:3548:99:0,1800 .:2843,30:2873:99:0,1800        .:3001,57:3058:99:0,1800
    REF     321     .       G       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3160,29:3189:99:0,1800        .:3506,13:3519:99:0,1800        .:2800,44:2844:99:0,1800        .:2974,58:3032:99:0,1800
    REF     322     .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3151,9:3160:99:0,1800 .:3462,12:3474:99:0,1800        .:2765,34:2799:99:0,1800        .:2956,49:3005:99:0,1800
    REF     323     .       T       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3071,1:3072:99:0,1800 .:3420,5:3425:99:0,1800 .:2730,33:2763:99:0,1800        .:2959,38:2997:99:0,1800
    REF     324     .       C       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3031,6:3037:99:0,1800 .:3367,10:3377:99:0,1800        .:2672,34:2706:99:0,1800        .:2936,27:2963:99:0,1800
    REF     325     .       A       <NON_REF>       .       .       .       GT:AD:DP:GQ:PL  .:3020,3:3023:99:0,1800 .:3355,5:3360:99:0,1800 .:2663,30:2693:99:0,1800        .:2929,22:2951:99:0,1800

    vcf only multiple sites.

     
    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    touyupang did you mean to include any sites for the VCF? I'm not sure what you meant.

    0
    Comment actions Permalink
  • Avatar
    touyupang

    I mean vcf only have mutations

    ##source=SelectVariants
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  CrSTM-CRI-Fruit1_FDSW210283057-1r       CrSTM-CRI-Fruit2_FDSW210283058-1r       CrSTM-CRI-Fruit3_FDSW210283059-1r       CrSTM-CRI-Fruit4_FDSW210283060-1r
    REF     314     .       C       CA      243337.09       PASS    AC=4;AF=1.00;AN=4;BaseQRankSum=-2.174e+00;DP=14411;FS=0.000;MLEAC=4;MLEAF=1.00;MQ=59.99;MQRankSum=-1.870e-01;QD=24.56;ReadPosRankSum=8.67;SOR=0.708     GT:AD:DP:GQ:PL  1:64,1909:3226:99:58287,0       1:64,2214:3663:99:76512,0       1:512,2312:2972:99:58319,0      1:653,2178:3118:99:50235,0
    REF     422     .       A       AATT    162.31  PASS    AC=1;AF=1.00;AN=1;DP=283;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=59.90;QD=25.36;SOR=3.258        GT:AD:DP:GQ:PL  .:26,43:69:.:0,0        .:44,66:110:.:0,0       1:0,4:4:99:171,0        .:15,34:49:.:0,0
    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt (she/her)

    touyupang thanks for the overview. Yes, it seems like this is the expected result from GenotypeGVCFs and there are no issues with the final VCF result. The C -> CA alt has the most read support which is why it is chosen. 

    For example, sample 1 has 64,1239,1909,0,0,4,6,4,0. This is 64 for ref, 1239 for a 3bp deletion, and 1909 for C -> CA. 

    If you are looking to see all the ALTs in the final VCF even though they are not called, you could try reducing the --standard-confidence-for-calling parameter, though we wouldn't recommend this option.

    Did you have any other questions about this I didn't cover?

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk