Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

Force genotype with specific reference-allele combinations

0

8 comments

  • Avatar
    Gökalp Çelik

    Hi Joshua Motelow

    Can you try adding the parameter 

    --output-mode EMIT_ALL_ACTIVE_SITES

    to the HaplotypeCaller command? It should be able to emit INDELs properly. If you also wish to capture MNPs found within the pharmcat positions file you should set 

    --max-mnp-distance 1

    Keep in mind that this parameter may convert some of those SNPs to MNPs as well if they are in close proximity to another SNP so you need to do some post processing to eliminate unwanted MNPs from that data. 

    Finally. GVCF mode will generate additional need post processing so instead of using GVCF mode we recommend running HaplotypeCaller without emitting reference confidence blocks so you can omit -ERC parameter. 

    I hope this helps. 

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Joshua Motelow

    Thank you so much for the suggestion but it unfortunately did not work! This command excluded almost all of the non-variant positions (I'm not sure why it included some of the non-variant positions) and it excluded most of the indels. Thank you so much for your help!

     

    GATK 4.6

     

    Command

     

        $gatk_path --java-options "-Xmx4G" HaplotypeCaller -R $reference_grch38 \

        -L $pharmcat_git/pharmcat_positions.vcf \

        -I $bam_path \

        -O $vcf_single_sample \

        -ip 25 \

        --output-mode EMIT_ALL_ACTIVE_SITES

     

    Chromosome 2 positions from $pharmcat_git/pharmcat_positions.vcf

    chr2        233759924        rs887829        C        T        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760233        rs3064744        CAT        C,CATAT,CATATAT        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760498        rs4148323        G        A        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760973        rs35350960        C        A        .        PASS        PX=UGT1A1        GT        0/0

     

    Expected/desired chromosome 2 output for my sample

    chr2        233759924        rs887829        C        T        .        PASS        PX=UGT1A1        GT        1/1

    chr2        233760233        .        C        CAT        1109.03        .        AC=2;AF=1.00;AN=2;BaseQRankSum=-0.866;DP=36;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=34.66;ReadPosRankSum=1.355;SOR=0.569        GT:AD:DP:GQ:PL        1/1:1,31:32:76:1123,76,0

    chr2        233760233        rs3064744        CAT        C        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760233        rs3064744        CAT        CATAT        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760233        rs3064744        CAT        CATATAT        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760498        rs4148323        G        A        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760973        rs35350960        C        A        .        PASS        PX=UGT1A1        GT        0/0

     

     

    output from $vcf_single_sample for chromosome 2

    chr2        233759924        .        C        T        152.96        .        AC=2;AF=1.00;AN=2;DP=5;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.59;SOR=1.981        GT:AD:DP:GQ:PL        1/1:0,5:5:15:167,15,0

    chr2        233760233        .        C        CAT        1109.03        .        AC=2;AF=1.00;AN=2;BaseQRankSum=-0.866;DP=36;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=34.66;ReadPosRankSum=1.355;SOR=0.569        GT:AD:DP:GQ:PL        1/1:1,31:32:76:1123,76,0

    0
    Comment actions Permalink
  • Avatar
    Joshua Motelow

    To add a few additional options I tried which gave similar output.

     

        $gatk_path --java-options "-Xmx4G" HaplotypeCaller -R $reference_grch38 \

        -L $pharmcat_git/pharmcat_positions.vcf \

        -I $bam_path \

        -O $vcf_single_sample \

        -ip 25 \

        --output-mode EMIT_ALL_CONFIDENT_SITES

     

    To add a few additional options I tried

        $gatk_path --java-options "-Xmx4G" HaplotypeCaller -R $reference_grch38 \

        -L $pharmcat_git/pharmcat_positions.vcf \

        -I $bam_path \

        -O $vcf_single_sample \

        -ip 25 \

        --all-site-pls true \

        --output-mode EMIT_ALL_CONFIDENT_SITES

     

    $gatk_path --java-options "-Xmx4G" HaplotypeCaller -R $reference_grch38 \

        -L $pharmcat_git/pharmcat_positions.vcf \

        -I $bam_path \

        -O $vcf_single_sample \

        -ip 25 \

        --output-mode EMIT_ALL_ACTIVE_SITES

     

    $gatk_path --java-options "-Xmx4G" HaplotypeCaller -R $reference_grch38 \

        -L $pharmcat_git/pharmcat_positions.vcf \

        -I $bam_path \

        -O $vcf_single_sample \

        -ip 25 \

        --all-site-pls true \

        --output-mode EMIT_ALL_ACTIVE_SITES

     

    I should also add that the CRAM is an exome and I would happily accept ./. as the genotype if its missing. I have updated the expected/desired below

    Chromosome 2 positions from $pharmcat_git/pharmcat_positions.vcf

    chr2        233759924        rs887829        C        T        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760233        rs3064744        CAT        C,CATAT,CATATAT        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760498        rs4148323        G        A        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760973        rs35350960        C        A        .        PASS        PX=UGT1A1        GT        0/0

     

    Expected/desired chromosome 2 output for my sample

    chr2        233759924        rs887829        C        T        .        PASS        PX=UGT1A1        GT        1/1

    chr2        233760233        .        C        CAT        1109.03        .        AC=2;AF=1.00;AN=2;BaseQRankSum=-0.866;DP=36;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=34.66;ReadPosRankSum=1.355;SOR=0.569        GT:AD:DP:GQ:PL        1/1:1,31:32:76:1123,76,0

    chr2        233760233        rs3064744        CAT        C        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760233        rs3064744        CAT        CATAT        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760233        rs3064744        CAT        CATATAT        .        PASS        PX=UGT1A1        GT        0/0

    chr2        233760498        rs4148323        G        A        .        PASS        PX=UGT1A1        GT        0/0 or ./.

    chr2        233760973        rs35350960        C        A        .        PASS        PX=UGT1A1        GT        0/0 or ./.

     

     

    output from $vcf_single_sample for chromosome 2

    chr2        233759924        .        C        T        152.96        .        AC=2;AF=1.00;AN=2;DP=5;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.59;SOR=1.981        GT:AD:DP:GQ:PL        1/1:0,5:5:15:167,15,0

    chr2        233760233        .        C        CAT        1109.03        .        AC=2;AF=1.00;AN=2;BaseQRankSum=-0.866;DP=36;ExcessHet=0.0000;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=34.66;ReadPosRankSum=1.355;SOR=0.569        GT:AD:DP:GQ:PL        1/1:1,31:32:76:1123,76,0

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Hi again. 

    Can you make sure that you are providing --alleles parameter to force call the genotype status of each of those sites ?

    You need to provide the pharmcat vcf file as --alleles parameter as well. 

    0
    Comment actions Permalink
  • Avatar
    Joshua Motelow

    The following worked beautifully

    $gatk_path --java-options "-Xmx4G" HaplotypeCaller -R $reference_grch38 \
        --alleles $input_decomposed_vcf \
        -I $bam_path \
        -O $vcf_single_sample \
        -L $pharmcat_git/pharmcat_positions.vcf \
        -ip 25 \
        --output-mode EMIT_ALL_ACTIVE_SITES
    There is a slight issue with MNVs which i will now investigate. Thank you so so much for your feedback.
    0
    Comment actions Permalink
  • Avatar
    Joshua Motelow

    I should add that i used vt decompose for the pharmcat positions. I don't think it was necessary. Thanks.

    0
    Comment actions Permalink
  • Avatar
    Gökalp Çelik

    Great to hear that the problem is solved. If you wish to add MNPs to your callset then you need to adjust the MNP distance parameter I mentioned above and do a cleanup and merge once you call once with MNPs and once without. Calling with MNP will include additional non-desired ref and alt alleles for certain positions. 

    Regards. 

    0
    Comment actions Permalink
  • Avatar
    Joshua Motelow

    added the mnv option and now everything works! Thank you!!!

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk