Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

FastaAlternateReferenceMaker creating chimera

1 comment

  • Louis Bergelson

    Hi Puechmaille,

    I'm sorry to say that the tool just doesn't do what you're hoping it does. It sounds like what you want is to output only the reference sequence that is directly supported by your sequencing data, and to substitute N everywhere else. The tool doesn't do that. It just substitutes the differences listed in your VCF into the existing reference and leaves anything unspecified as is.

    It would be possible if you had a GVCF (g.vcf), which includes information about the reference confidence at non-variant sites. I don't think a regular VCF has enough information to decide where you would want Ns. If this is a feature you would like supported, you could open a feature request ticket on the GATK GitHub. We're stretched a bit thin right now, so I can't promise it will happen, but it's probably not a huge lift to implement, so maybe someone will have a chance to do so.
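
To make the distinction above concrete, here is a rough Python sketch. It is not GATK code and not an existing GATK feature. The function names (`apply_snps`, `confident_intervals`, `mask_unsupported`), the `min_gq` threshold, and the toy data are all hypothetical. It assumes a single contig, SNPs only, and a single-sample GVCF in which reference blocks carry an `END=` INFO key and a `GQ` FORMAT field.

```python
def apply_snps(reference, snps):
    """Substitute ALT bases into the reference and leave everything else as is.

    `snps` maps 1-based positions to single ALT bases (toy model, SNPs only).
    This mirrors the behaviour described above: no N is ever introduced.
    """
    bases = list(reference)
    for pos, alt in snps.items():
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} is outside the reference")
        bases[pos - 1] = alt
    return "".join(bases)


def confident_intervals(gvcf_lines, min_gq=20):
    """Yield 1-based inclusive (start, end) spans with GQ >= min_gq.

    `min_gq` is an arbitrary illustrative threshold, not a GATK default.
    The CHROM column is ignored (single-contig assumption).
    """
    for line in gvcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos, ref, info = int(fields[1]), fields[3], fields[7]
        # Reference blocks carry END=...; other records span len(REF) bases.
        end = pos + len(ref) - 1
        for entry in info.split(";"):
            if entry.startswith("END="):
                end = int(entry[4:])
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gq = sample.get("GQ", ".")
        if gq.isdigit() and int(gq) >= min_gq:
            yield pos, end


def mask_unsupported(sequence, intervals):
    """Replace every base that falls outside the given intervals with N."""
    keep = [False] * len(sequence)
    for start, end in intervals:
        for i in range(max(start, 1) - 1, min(end, len(sequence))):
            keep[i] = True
    return "".join(base if k else "N" for base, k in zip(sequence, keep))


# Toy data: a 10 bp reference, two SNPs, and a three-record GVCF.
reference = "ACGTACGTAC"
substituted = apply_snps(reference, {3: "T", 8: "A"})

gvcf = [
    "##fileformat=VCFv4.2",
    "chr1\t1\t.\tA\t<NON_REF>\t.\t.\tEND=4\tGT:DP:GQ:MIN_DP:PL\t0/0:30:60:28:0,60,900",
    "chr1\t5\t.\tA\t<NON_REF>\t.\t.\tEND=7\tGT:DP:GQ:MIN_DP:PL\t0/0:2:3:1:0,3,30",
    "chr1\t8\t.\tT\tA,<NON_REF>\t200\t.\t.\tGT:AD:DP:GQ:PL\t1/1:0,25,0:25:75:900,75,0,900,75,900",
]
masked = mask_unsupported(substituted, confident_intervals(gvcf))

print(substituted)  # ACTTACGAAC
print(masked)       # ACTTNNNANN
```

The first output is what plain substitution gives you: untouched reference bases remain even where there is no data. The second masks positions 5-7 (low GQ) and 9-10 (no record at all) with N, which is only possible because the GVCF says something about non-variant sites.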

