Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CombineGVCFs splitting MNPs into individual positions

6 comments

  • Genevieve Brandt (she/her)

    Hi Gabriel Margarido,

    From what you have shown us, this output does not appear to have any issues; it is okay to have multiple ref call lines.

    You said:

    When I run GenomicsDBImport these sites cause problems

    Could you share what is going on with GenomicsDBImport so that we can have a better idea of the issue? Please share the stack trace.

    Genevieve

  • Gabriel Margarido

    Hi Genevieve,

    Thank you for your reply.

    This is my GenomicsDBImport command:

    ~/software/gatk-4.1.9.0/gatk \
    --java-options "-Xmx500G" \
    GenomicsDBImport \
    --genomicsdb-workspace-path ./all_samples_db \
    -R myref.fa \
    -V sample1.g.vcf \
    -V sample2.g.vcf \
    -L ./all_contigs.list

    and the trace:

    terminate called after throwing an instance of 'VCF2BinaryException'
    what(): VCF2BinaryException : Mismatch in field length and field length descriptor:
    Length descriptor in vid/VCF header specifies that field "AF" should contain A element(s).
    In file/stream "sample1_stream", at contig "Chr01", position 1930, for sample "sample1", the field AF has 3 elements; expected 2

     

    For positions 1930 through 1933, the AF field is identical to the original values for position 1929, even though the ALT field is different.
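
The mismatch reported in the trace can be checked before import. The following is a minimal sketch, not part of GATK: `check_af_length` is a hypothetical helper that reads uncompressed VCF text on stdin and reports records where the number of AF values differs from the number of ALT alleles. It assumes AF is stored in the INFO column and that every comma-separated ALT entry (including `<NON_REF>`) counts toward the expected length, which is consistent with the "expected 2" in the trace but is still an assumption.

```shell
# Hypothetical helper: flag records whose INFO/AF value count differs
# from the number of ALT alleles (AF is declared with Number=A).
check_af_length() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      n_alt = split($5, alts, ",")         # count ALT alleles
      n_info = split($8, info, ";")        # split INFO into key=value entries
      for (i = 1; i <= n_info; i++) {
        if (info[i] ~ /^AF=/) {            # anchored, so MLEAF= is not matched
          n_af = split(substr(info[i], 4), afs, ",")
          if (n_af != n_alt)
            printf "%s:%s AF has %d elements; expected %d\n", $1, $2, n_af, n_alt
        }
      }
    }'
}
```

It could be run as `check_af_length < sample1.g.vcf`; any line it prints points to a record that GenomicsDBImport would likely reject for the same reason.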

     

    Thank you again for your attention.

    Best regards,

    Gabriel

  • Genevieve Brandt (she/her)

    Hi Gabriel,

    Unfortunately, we are not able to look into this use case in depth because this is not how we recommend using CombineGVCFs. CombineGVCFs is for merging single-sample GVCFs into multi-sample GVCFs.

    Our support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For context, check out our support policy.

    We will add this question to our backlog and get to it if our capacity allows. For now, please see this document on how to find solutions on our forum. We also encourage other users to help out if they know the answer.

    Genevieve

  • Gabriel Margarido

    Hello Genevieve,

    I certainly do understand and appreciate your attention.

    Hopefully, this will be solved in the future.

    Gabriel

  • Genevieve Brandt (she/her)

    Hi Gabriel Margarido,

    I looked this over with my team, and we noticed that you are doing two things that GATK does not support. You may want to revise your pipeline to follow our Best Practices.

    1. Using both CombineGVCFs and GenomicsDBImport. Both tools provide the same functionality, so you should not use both in the same pipeline.
    2. Using MNPs with GenomicsDBImport and joint genotyping, which is not supported. If you run HaplotypeCaller with MNPs enabled, the GVCFs have to be genotyped individually. There should have been a warning about this when you ran HaplotypeCaller: "Generated GVCFs that contain MNPs can only be genotyped individually."
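
To find out whether an existing GVCF contains MNP records before attempting joint genotyping, a quick scan can help. The following is a minimal sketch under stated assumptions: `list_mnps` is a hypothetical helper, the input is uncompressed VCF text on stdin, and an MNP is taken to be any record whose REF is longer than one base and has at least one plain-nucleotide ALT allele of the same length. Symbolic alleles such as `<NON_REF>` and `*` are ignored.

```shell
# Hypothetical helper: list records that look like MNPs
# (REF longer than 1 base with an equal-length, non-symbolic ALT).
list_mnps() {
  awk -F'\t' '
    /^#/ { next }                              # skip header lines
    length($4) > 1 {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        # skip symbolic alleles (<NON_REF>) and spanning deletions (*)
        if (alts[i] !~ /^[ACGTNacgtn]+$/) continue
        if (length(alts[i]) == length($4)) {
          print $1 "\t" $2 "\t" $4 "\t" alts[i]   # CHROM, POS, REF, matching ALT
          break
        }
      }
    }'
}
```

Running `list_mnps < sample1.g.vcf` prints one line per suspected MNP record; empty output suggests the file has none. In GATK4, MNP merging in HaplotypeCaller is controlled by `--max-mnp-distance`, which is 0 (no MNPs) by default, so GVCFs produced with the default setting should pass this scan.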

    Sorry we do not have more feedback about this, but hopefully the advice can help you successfully use GATK.

    Best,

    Genevieve

  • Gabriel Margarido

    Hi Genevieve,

    I forgot to respond with the final approach I took, which may be useful to others reading this.

    Ultimately, I ran HaplotypeCaller separately for each chromosome and each sample, then concatenated the chromosome-specific results for each sample with GatherVcfs, and finally ran GenomicsDBImport. Everything worked as expected.
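
The approach described above can be sketched as a dry run that prints the commands rather than executing them. This is an illustration only, not the exact commands used in this thread: `plan_per_contig_calling` is a hypothetical helper, and the BAM names, per-contig output names, and sample and contig lists are placeholders. The workspace path and interval list are taken from the GenomicsDBImport command earlier in the thread.

```shell
# Dry-run sketch: prints the planned commands instead of running them.
# Arguments: reference FASTA, space-separated sample names, space-separated contigs.
# All file names derived from them are hypothetical placeholders.
plan_per_contig_calling() {
  ref=$1; samples=$2; contigs=$3
  import_inputs=""
  for s in $samples; do
    gather_inputs=""
    for c in $contigs; do
      # One HaplotypeCaller job per sample and contig; these can run in parallel.
      echo "gatk HaplotypeCaller -R $ref -I $s.bam -L $c -ERC GVCF -O $s.$c.g.vcf"
      gather_inputs="$gather_inputs -I $s.$c.g.vcf"
    done
    # GatherVcfs needs its inputs in genomic order and without overlaps.
    echo "gatk GatherVcfs$gather_inputs -O $s.g.vcf"
    import_inputs="$import_inputs -V $s.g.vcf"
  done
  # A single import over the per-sample GVCFs.
  echo "gatk GenomicsDBImport --genomicsdb-workspace-path ./all_samples_db -R $ref$import_inputs -L ./all_contigs.list"
}
```

For example, `plan_per_contig_calling myref.fa "sample1 sample2" "Chr01 Chr02"` prints four HaplotypeCaller commands, two GatherVcfs commands, and one GenomicsDBImport command. Depending on the setup, the gathered GVCFs may also need to be indexed before the import step.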

     

    Because my sequencing depth was very high, it would be useful if CombineGVCFs handled multiple files for a single sample, as this would make it easier to parallelize HaplotypeCaller (GatherVcfs does not accept overlapping positions).

    In any case, I understand this may be out of the scope of the tool.

    Thank you very much for your time and help.

    Best regards,

    Gabriel
