Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

FilterAlignmentArtifacts error

4 comments

  • David Benjamin

    Nicola Dick, that looks like a bug in a native C library that we depend on. The only quick solution is also a dirty one: exclude the variant where the output VCF stops, re-run, and paste the variant back into the output. That probably already occurred to you, but if it means anything, I am granting my blessing for this hack.

    Since it's native code, I would not be shocked if it's hardware-specific. In fact, I know it is: this is Intel code, and it is optimized to run on their processors. Are TA and TB running on different machines, or on a heterogeneous cluster? Just switching computers might work.

    By the way, you should use a BWA-MEM index image generated from the hg38 reference, regardless of the reference to which the original BAM is aligned. The idea is to realign reads to the best possible reference. If you're lucky, this change alone will happen to avoid the error.
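
The exclude-and-paste-back workaround described above could be sketched roughly as follows. This is a minimal illustration, not part of GATK: the helper names `split_out_variant` and `paste_back` are hypothetical, the VCF is assumed to be uncompressed and already read into a list of lines, and the contig order is assumed to come from the reference dictionary.

```python
def split_out_variant(vcf_lines, chrom, pos):
    """Return (kept_lines, removed_lines), dropping records at chrom:pos.

    Header lines (starting with '#') are always kept.
    """
    kept, removed = [], []
    for line in vcf_lines:
        if not line.startswith("#"):
            fields = line.split("\t")
            if fields[0] == chrom and int(fields[1]) == pos:
                removed.append(line)
                continue
        kept.append(line)
    return kept, removed


def paste_back(filtered_lines, removed_lines, contig_order):
    """Re-insert removed records, sorted by contig order and then position.

    contig_order is an assumption: the contig names in reference order.
    """
    header = [l for l in filtered_lines if l.startswith("#")]
    records = [l for l in filtered_lines if not l.startswith("#")]
    rank = {name: i for i, name in enumerate(contig_order)}

    def sort_key(line):
        fields = line.split("\t")
        return (rank[fields[0]], int(fields[1]))

    return header + sorted(records + removed_lines, key=sort_key)
```

Note that a record pasted back this way was never evaluated by FilterAlignmentArtifacts, so its FILTER column reflects only whatever filtering happened before that step.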

  • Nicola Dick

    Hello David Benjamin, and thank you for your answer.

    Interestingly, leaving out the line didn't work. I then tried the hg38 reference from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/, but now the contigs of the reads and the reference don't overlap. Do you have a specific suggestion for which hg38 reference I should use? I am not sure I got the correct one, because the mismatch is not only in the naming ([chr1, chr2, ...] versus [1, 2, ...]); the reference also contains a lot of unfamiliar contig names in between:

     

    reference contigs = [chr1, chr10, chr11, chr11_KI270721v1_random, chr12, chr13, chr14, chr14_GL000009v2_random, chr14_GL000225v1_random, chr14_KI270722v1_random, chr14_GL000194v1_random, chr14_KI270723v1_random, chr14_KI270724v1_random, chr14_KI270725v1_random, chr14_KI270726v1_random, chr15, chr15_KI270727v1_random, chr16, chr16_KI270728v1_random, chr17, chr17_GL000205v2_random, chr17_KI270729v1_random, chr17_KI270730v1_random, chr18, chr19, chr1_KI270706v1_random, chr1_KI270707v1_random, chr1_KI270708v1_random, chr1_KI270709v1_random, chr1_KI270710v1_random, chr1_KI270711v1_random, chr1_KI270712v1_random, chr1_KI270713v1_random, chr1_KI270714v1_random, chr2, chr20, chr21, chr22, chr22_KI270731v1_random, chr22_KI270732v1_random, chr22_KI270733v1_random, chr22_KI270734v1_random, chr22_KI270735v1_random, chr22_KI270736v1_random, chr22_KI270737v1_random, chr22_KI270738v1_random, chr22_KI270739v1_random, chr2_KI270715v1_random, chr2_KI270716v1_random, chr3, chr3_GL000221v1_random, chr4, chr4_GL000008v2_random, chr5, chr5_GL000208v1_random, chr6, chr7, chr8, chr9, chr9_KI270717v1_random, chr9_KI270718v1_random, chr9_KI270719v1_random, chr9_KI270720v1_random, chr1_KI270762v1_alt, chr1_KI270766v1_alt, chr1_KI270760v1_alt, chr1_KI270765v1_alt, chr1_GL383518v1_alt, chr1_GL383519v1_alt, chr1_GL383520v2_alt, chr1_KI270764v1_alt, chr1_KI270763v1_alt, chr1_KI270759v1_alt, chr1_KI270761v1_alt, chr2_KI270770v1_alt, chr2_KI270773v1_alt, chr2_KI270774v1_alt, chr2_KI270769v1_alt, chr2_GL383521v1_alt, chr2_KI270772v1_alt, chr2_KI270775v1_alt, chr2_KI270771v1_alt, chr2_KI270768v1_alt, chr2_GL582966v2_alt, chr2_GL383522v1_alt, chr2_KI270776v1_alt, chr2_KI270767v1_alt, chr3_JH636055v2_alt, chr3_KI270783v1_alt, chr3_KI270780v1_alt, chr3_GL383526v1_alt, chr3_KI270777v1_alt, chr3_KI270778v1_alt, chr3_KI270781v1_alt, chr3_KI270779v1_alt, chr3_KI270782v1_alt, chr3_KI270784v1_alt, chr4_KI270790v1_alt, chr4_GL383528v1_alt, chr4_KI270787v1_alt, chr4_GL000257v2_alt, chr4_KI270788v1_alt, 
    chr4_GL383527v1_alt, chr4_KI270785v1_alt, chr4_KI270789v1_alt, chr4_KI270786v1_alt, chr5_KI270793v1_alt, chr5_KI270792v1_alt, chr5_KI270791v1_alt, chr5_GL383532v1_alt, chr5_GL949742v1_alt, chr5_KI270794v1_alt, chr5_GL339449v2_alt, chr5_GL383530v1_alt, chr5_KI270796v1_alt, chr5_GL383531v1_alt, chr5_KI270795v1_alt, chr6_GL000250v2_alt, chr6_KI270800v1_alt, chr6_KI270799v1_alt, chr6_GL383533v1_alt, chr6_KI270801v1_alt, chr6_KI270802v1_alt, chr6_KB021644v2_alt, chr6_KI270797v1_alt, chr6_KI270798v1_alt, chr7_KI270804v1_alt, chr7_KI270809v1_alt, chr7_KI270806v1_alt, chr7_GL383534v2_alt, chr7_KI270803v1_alt, chr7_KI270808v1_alt, chr7_KI270807v1_alt, chr7_KI270805v1_alt,.....]

     

    Many Greetings,

    Nicola

  • David Benjamin

    What you are seeing are the alternate haplotype contigs (names ending in "_alt") and unlocalized contigs (names ending in "_random") that are a standard part of hg38. As long as the contigs your reads are aligned to are a subset of the reference contigs, there will be no error. That is, as long as your reads and reference use the same naming convention (both "1" or both "chr1", and so on), there should be no problem even if the reference has contigs not present in your BAM.
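
The subset condition above can be checked before running the tool. The following is a rough sketch under stated assumptions: `check_contigs` is a hypothetical helper, and the two contig lists are assumed to have been extracted beforehand (for example from the BAM header and the reference dictionary).

```python
def check_contigs(read_contigs, reference_contigs):
    """Return (missing, prefix_only).

    missing: read contigs that are absent from the reference.
    prefix_only: True if every missing contig would be found after
    adding or removing a leading "chr", i.e. the mismatch looks like
    a pure naming-convention difference.
    """
    ref = set(reference_contigs)
    missing = [c for c in read_contigs if c not in ref]

    def toggle_prefix(name):
        # "chr1" -> "1", "1" -> "chr1"
        return name[3:] if name.startswith("chr") else "chr" + name

    prefix_only = bool(missing) and all(toggle_prefix(c) in ref for c in missing)
    return missing, prefix_only
```

An empty `missing` list means the reads' contigs are a subset of the reference contigs, which is the situation described above as safe; extra `_alt` or `_random` contigs in the reference do not affect the result.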

    Just to be clear, because it really is a bit messy: if your reads are aligned to hg19, use that same hg19 FASTA for the -R argument, but use any hg38 BWA-MEM index image for the -bwa-mem-index-image argument. The index image's contigs do not have to match anything else.
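
To make the argument layout concrete, here is a small sketch that only assembles the two command lines as lists and does not run anything. All file names are hypothetical placeholders, and the flags shown should be checked against the `--help` output of the installed GATK version before use.

```python
def index_image_command(hg38_fasta, image_path):
    """Command to build a BWA-MEM index image from an hg38 FASTA."""
    return ["gatk", "BwaMemIndexImageCreator", "-I", hg38_fasta, "-O", image_path]


def filter_artifacts_command(reference_fasta, bam, vcf, index_image, output_vcf):
    """Command for FilterAlignmentArtifacts.

    reference_fasta must be the reference the BAM is aligned to
    (hg19 in the case discussed here); index_image is built from hg38
    and does not need to match the BAM's contigs.
    """
    return [
        "gatk", "FilterAlignmentArtifacts",
        "-R", reference_fasta,
        "-I", bam,
        "-V", vcf,
        "--bwa-mem-index-image", index_image,
        "-O", output_vcf,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(index_image_command("hg38.fasta", "hg38.fasta.img")))
    print(" ".join(filter_artifacts_command(
        "hg19.fasta", "tumor.bam", "somatic.vcf.gz",
        "hg38.fasta.img", "filtered.vcf.gz")))
```

The lists can be passed to `subprocess.run` once the paths are real; keeping -R on hg19 while the index image comes from hg38 is the point of the separation.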

  • woodword

    Try downgrading GATK to 4.1.3.0.

