Haplotype Caller General Question
AnsweredDear all,
I tried to perform a multisample variant calling using HaplotypeCaller(see command line below)
To this end I used HaplotypeCaller over different samples and then combine them into a final vcf. From the resulting vcf file I tried to obtain a list of SNPs with CHR and POS, sample names and their genotypes.
Meanwhile I realized that some samples had the followoing genotype:
0|1:69,30:99:99:0|1:139213_A_G:943,0,2730:139213
0|1:116,36:152:99:0|1:139213_A_G:1114,0,4603:139213
I wonder which is the reason of that genotype instead of the expected
0/0:36,0:36:99:.:.:0,103,1114.
I have read something related with the phased/unphased but I don't know how to deal with this.
Any advice will be appreciated. Thank you in advance
Regards,
------------------------------
REQUIRED for all errors and issues:
a) GATK version used: GATK v. 4.1.7
b) Exact command used:
gatk HaplotypeCaller -ERC GVCF -R Homo_sapiens_assembly.fasta -I Sample1_recal_reads.bam -O Sample1.g.vcf -L Exome_V6_hg19.bed
gatk HaplotypeCaller -ERC GVCF -R Homo_sapiens_assembly.fasta -I Samples2_recal_reads.bam -O Sample2.g.vcf -L Exome_V6_hg19.bed
gatk CombineGVCFs \
-R ${refDir}/${assembly}/Homo_sapiens_assembly.fasta \
--variant Sample1.g.vcf.gz \
--variant Sample2.g.vcf.gz \
-O ${outDir}/cohort.g.vcf.gz
-
Hi Jordi D,
This is extra information about the phasing of your variants. We have an article about phasing here: https://gatk.broadinstitute.org/hc/en-us/articles/360050354712-What-is-physical-phasing.
Please let me know if you have any further questions.
Best,
Genevieve
-
Hi Genevieve Brandt (she/her).
Thank you for the answer, I have carefully read the article.
However, now I wonder if those variants with PID and PGT information, can be succesfully filtered with 'Select Variants' and JEXL.
I found a variant (which I used as a control) with the following 4 genotypes from a 4 sample vcf file:
0|1:32,59:91:99:0|1:4511278_C_T:2360,0,1146:4511278 0|1:34,52:86:99:0|1:4511278_C_T:2081,0,1244:4511278 0|1:86,37:123:99:0|1:4511278_C_T:1295,0,3442:4511278 0|1:42,102:144:99:0|1:4511278_C_T:4117,0,1450:4511278
Then I used the following 'SelectVariants' filtering options in order to obtain those variants with at least 1 called variant and at least 1 VAR allele:
-select 'vc.getCalledChrCount() != 0 && (vc.getHetCount() >= 1 or vc.getHomVarCount() >= 1)'
Then, the variant mentioned above was filtered out. I guess that 'SelectVariants' did not work with phased genotypes, didn't?
Any advice will be appreciated
Thank you in advance
-
Jordi D What is the reason you want to filter the phased variants?
-
Hi Genevieve Brandt (she/her),
in fact I want to filter out any variant (phased or unphased) that doesn't meet the parameters. I am dealing with a multi-sample vcf (~80 samples) and I previously used a DEPTH and a GQ filter using VariantFiltration. Then, I would like to get rid of positions that become "no-called" due to those filters (./.:3,0:3:Depth;GQ:9:0,9,110) and also positions without a VAR allele (I guess that this is similar to looking for AC >=1).
I mean, after VariantFiltration I got certain position with the following genotypes:
0/0:33,0:33:PASS:99:.:.:0,99,1176 0/0:20,0:20:PASS:60:.:.:0,60,685 0|1:8,19:27:PASS:99:0|1:139213_A_G:605,0,223:139213 ./.:11,0:11:Depth:27:.:.:0,27,405
One of them was excluded due to Depth, two were HOMOREF and I got one phased variant as a hetereozygote. However, that variant was filtered out.
Hope this helps to understand the purpose!
Thank you.
-
Thanks, Jordi D. SelectVariants should work on phased variants. Can you share the complete variant lines that you are looking to filter out? It would be easier for me to see that way.
-
Hi again Genevieve Brandt (she/her),
First, I used the following code to filter the variants:
gatk VariantFiltration \
-R ${refDir}/${assembly}/Homo_sapiens_assembly.fasta \
-V ${outDir}/prova.vcf \
-O ${outDir}/prova_filtered_1.vcf \
-genotype-filter-name "Depth" -genotype-filter-expression "DP < 20" \
--set-filtered-genotype-to-no-call true \
-genotype-filter-name "GQ" -genotype-filter-expression "GQ < 20"
## Getting control genotypes (at least 1 called && at least 1 VAR allele & BIALLELIC)
gatk SelectVariants \
-R ${refDir}/${assembly}/Homo_sapiens_assembly.fasta \
-V ${outDir}/prova_filtered_1.vcf \
-L ${refDir}/${assembly}/Exome_V6.bed \
-O ${outDir}/prova_filtered_2.vcf \
-select 'vc.getCalledChrCount() != 0 && (vc.getHetCount() >= 1 or vc.getHomVarCount() >= 1)' \
--restrict-alleles-to BIALLELIC \
--exclude-filtered trueInput file prova.vcf was obtained just with example purposes, I mean, I took 6 variants for just 4 samples with several features: no-called, homoref...etc
1 17385 . G A 124.13 PASS BaseQRankSum=-2.016e+00;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.125;MQ=37.09;MQRankSum=0.142;QD=7.30;ReadPosRankSum=-2.640e-01;SOR=0.382;VQSLOD=-2.319e+00;culprit=MQ GT:AD:DP:FT:GQ:PL ./.:10,0:10:Depth;GQ:0:0,0,329 ./.:3,0:3:Depth;GQ:9:0,9,110 ./.:9,0:9:Depth;GQ:9:0,9,294 ./.:12,5:17:Depth:99:132,0,359
1 139213 . A G 553.75 VQSRTrancheSNP99.90to100.00 AC=1;AF=0.250;AN=4;BaseQRankSum=-8.660e-01;DP=82;ExcessHet=3.0103;FS=1.603;MLEAC=1;MLEAF=0.125;MQ=28.60;MQRankSum=1.63;QD=22.15;ReadPosRankSum=1.11;SOR=0.412;VQSLOD=-3.058e+00;culprit=MQ GT:AD:DP:FT:GQ:PGT:PID:PL:PS 0/0:28,0:28:PASS:78:.:.:0,78,1170 ./.:16,0:16:Depth:42:.:.:0,42,630 0|1:9,16:25:PASS:99:0|1:139213_A_G:563,0,240:139213 ./.:11,0:11:Depth:33:.:.:0,33,452
1 139233 . C A 595.75 VQSRTrancheSNP99.90to100.00 AC=1;AF=0.167;AN=6;BaseQRankSum=-1.619e+00;DP=93;ExcessHet=3.0103;FS=1.690;MLEAC=1;MLEAF=0.125;MQ=29.16;MQRankSum=1.46;QD=22.06;ReadPosRankSum=-2.230e-01;SOR=0.503;VQSLOD=-2.956e+00;culprit=MQ GT:AD:DP:FT:GQ:PGT:PID:PL:PS 0/0:33,0:33:PASS:99:.:.:0,99,1176 0/0:20,0:20:PASS:60:.:.:0,60,685 0|1:8,19:27:PASS:99:0|1:139213_A_G:605,0,223:139213 ./.:11,0:11:Depth:27:.:.:0,27,405
1 762273 . G A 4927.29 PASS AC=8;AF=1.00;AN=8;BaseQRankSum=2.16;DP=151;ExcessHet=3.0103;FS=2.746;MLEAC=8;MLEAF=1.00;MQ=40.64;MQRankSum=-2.252e+00;QD=25.36;ReadPosRankSum=-4.270e-01;SOR=1.751;VQSLOD=-1.602e+00;culprit=MQ GT:AD:DP:GQ:PL 1/1:2,33:35:55:1089,55,0 1/1:0,29:29:87:1091,87,0 1/1:0,40:40:99:1439,120,0 1/1:0,36:36:99:1316,108,0
1 865628 . G A 1055 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=-1.537e+00;DP=149;ExcessHet=3.6798;FS=0.868;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;QD=13.19;ReadPosRankSum=0.135;SOR=0.531;VQSLOD=15.66;culprit=MQRankSum GT:AD:DP:GQ:PL 0/0:31,0:31:90:0,90,930 0/1:21,19:40:99:498,0,635 0/0:36,0:36:99:0,99,1485 0/1:20,20:40:99:569,0,597
1 878314 . G C 1087 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=2.26;DP=141;ExcessHet=3.6798;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;POSITIVE_TRAIN_SITE;QD=15.75;ReadPosRankSum=-1.390e-01;SOR=0.676;VQSLOD=16.66;culprit=MQRankSum
GT:AD:DP:GQ:PL 0/1:18,17:36:99:542,0,467 0/1:15,19:34:99:557,0,418 0/0:33,0:33:99:0,99,1250 0/0:36,0:36:99:0,99,1485Then, once I ran the code above, I got just 3 variants:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 05-045 06-084 18-017 18-018
1 762273 . G A 4927.29 PASS AC=8;AF=1.00;AN=8;BaseQRankSum=2.16;DP=151;ExcessHet=3.0103;FS=2.746;MLEAC=8;MLEAF=1.00;MQ=40.64;MQRankSum=-2.252e+00;QD=25.36;ReadPosRankSum=-4.270e-01;SOR=1.751;VQSLOD=-1.602e+00;culprit=MQ GT:AD:DP:GQ:PL 1/1:2,33:35:55:1089,55,0 1/1:0,29:29:87:1091,87,0 1/1:0,40:40:99:1439,120,0 1/1:0,36:36:99:1316,108,0
1 865628 . G A 1055 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=-1.537e+00;DP=149;ExcessHet=3.6798;FS=0.868;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;QD=13.19;ReadPosRankSum=0.135;SOR=0.531;VQSLOD=15.66;culprit=MQRankSum GT:AD:DP:GQ:PL 0/0:31,0:31:90:0,90,930 0/1:21,19:40:99:498,0,635 0/0:36,0:36:99:0,99,1485 0/1:20,20:40:99:569,0,597
1 878314 . G C 1087 PASS AC=2;AF=0.250;AN=8;BaseQRankSum=2.26;DP=141;ExcessHet=3.6798;FS=0.000;MLEAC=2;MLEAF=0.250;MQ=60.00;MQRankSum=0.00;POSITIVE_TRAIN_SITE;QD=15.75;ReadPosRankSum=-1.390e-01;SOR=0.676;VQSLOD=16.66;culprit=MQRankSum
GT:AD:DP:GQ:PL 0/1:18,17:36:99:542,0,467 0/1:15,19:34:99:557,0,418 0/0:33,0:33:99:0,99,1250 0/0:36,0:36:99:0,99,1485Given that the goal of the SelectVariants (see below) was to retain variants with at least 1 call and with at least 1 VAR allele according to:
vc.getCalledChrCount() != 0 && (vc.getHetCount() >= 1 or vc.getHomVarCount() >= 1)'
What can we say about that position?
1 139233 . C A 595.75 VQSRTrancheSNP99.90to100.00 AC=1;AF=0.167;AN=6;BaseQRankSum=-1.619e+00;DP=93;ExcessHet=3.0103;FS=1.690;MLEAC=1;MLEAF=0.125;MQ=29.16;MQRankSum=1.46;QD=22.06;ReadPosRankSum=-2.230e-01;SOR=0.503;VQSLOD=-2.956e+00;culprit=MQ GT:AD:DP:FT:GQ:PGT:PID:PL:PS 0/0:33,0:33:PASS:99:.:.:0,99,1176 0/0:20,0:20:PASS:60:.:.:0,60,685 0|1:8,19:27:PASS:99:0|1:139213_A_G:605,0,223:139213 ./.:11,0:11:Depth:27:.:.:0,27,405
As far as I know , the variant meets the criteria, didn't?
Hope this helps to clarify what I am triying to do.
And thank you in advance for your help.
-
Hi all, Genevieve Brandt (she/her),
just to summarize what I am looking for:
Given the position
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 05-045 06-084 18-017 18-018
1 139213 . A G 553.75 VQSRTrancheSNP99.90to100.00 AC=1;AF=0.250;AN=4;BaseQRankSum=-8.660e-01;DP=82;ExcessHet=3.0103;FS=1.603;MLEAC=1;MLEAF=0.125;MQ=28.60;MQRankSum=1.63;QD=22.15;ReadPosRankSum=1.11;SOR=0.412;VQSLOD=-3.058e+00;culprit=MQ GT:AD:DP:FT:GQ:PGT:PID:PL:PS 0/0:28,0:28:PASS:78:.:.:0,78,1170 ./.:16,0:16:Depth:42:.:.:0,42,630 0|1:9,16:25:PASS:99:0|1:139213_A_G:563,0,240:139213 ./.:11,0:11:Depth:33:.:.:0,33,452The following code:
SelectVariants -select 'vc.getGenotype("18-017").isHet()' --exclude-filtered true
doesn't return the expected genotype
0|1:9,16:25:PASS:99:0|1:139213_A_G:563,0,240:139213
Is this the expected behaviour of the SelectVariants function?
Thanks
-
Finally I got the mistake: since the SelectVariants was ran after a VQSR process, some variants were labeled with a FILTER field:
VQSRTrancheSNP99.90to100.00
Then the parameter "exclude-filtered" should be set to false in order to include all variants.
Thank you and apologise for any inconvenience.
-
Hi Jordi D,
Yes, thank you for following up with these details and the solution you found! Glad you are seeing what you expect now.
Best,
Genevieve
Please sign in to leave a comment.
9 comments