Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

GenomicsDBImport error: USER ERROR has occurred: Duplicate sample: 1. Sample was found in both file:

7 comments

  • Genevieve Brandt (she/her)

    Hi Yuna Son, you may have a problem with your read groups: it looks like different samples have all been named "Sample". This does not work in GATK, because each sample needs its own distinct name. Please see this document on read groups: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups

  • Yuna Son

    Hi Genevieve Brandt,

    Thank you for your reply. I am changing the read groups and will re-run the process. I assume I used the same sample name in the read groups of different samples, which causes the problem, as you suggested. I will post an update if I run into more issues. Thank you!

  • Genevieve Brandt (she/her)

    Hi Yuna Son, thank you for the update and glad I could help!

  • Yangyxt

    Dear Brandt,

    Can I ask why the GenomicsDBImport error is related to read group information? That information is stored in BAM files, whereas we only pass VCF or g.vcf files to GenomicsDBImport.

  • Genevieve Brandt (she/her)

    Yangyxt, the read group in the BAM file determines the sample column in the VCF file. Please see this for more info: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups

  • Yangyxt

    Dear Brandt,

    Thank you for your response. I understand that the RG info determines the sample name in the VCF column after the FORMAT field. I have run into an issue where the RG ID is not correct, whereas the RG SM field is the sample name we want. The VCF sample column has the right sample name (from which I guess it is the SM field in the @RG line, rather than the ID field, that determines the VCF sample column label). Even in this case, I still receive a Duplicate Name Error when running GenomicsDBImport.

  • Genevieve Brandt (she/her)

    Hi Yangyxt, could you post your command and stack trace so we can evaluate whether you are facing the same issue?

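For anyone landing on this thread with the same error, the check discussed above (which sample names appear in which input files) can be sketched in a few lines of Python before running GenomicsDBImport. This is only a rough sketch, not part of GATK: the function names are made up for illustration, and the file names in the example are hypothetical. It relies on two format facts: in a VCF, sample names are the columns after FORMAT on the `#CHROM` header line, and in a SAM/BAM header the sample name is the `SM` tag of each `@RG` line.

```python
import gzip
from collections import defaultdict


def read_vcf_header(path):
    """Read only the header lines (those starting with '#') of a VCF or .vcf.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    header = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break
            header.append(line)
    return header


def vcf_sample_names(header_lines):
    """Return the sample names listed on the #CHROM line of a VCF header."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first 9 columns (CHROM through FORMAT) are fixed; samples follow.
            return fields[9:]
    return []


def rg_sample_names(sam_header_lines):
    """Return the set of SM values found on the @RG lines of a SAM/BAM header."""
    samples = set()
    for line in sam_header_lines:
        if line.startswith("@RG"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SM:"):
                    samples.add(field[3:])
    return samples


def find_duplicate_samples(samples_by_file):
    """Map each sample name that occurs in more than one file to those files."""
    files_by_sample = defaultdict(list)
    for path, samples in samples_by_file.items():
        for sample in samples:
            files_by_sample[sample].append(path)
    return {s: files for s, files in files_by_sample.items() if len(files) > 1}


if __name__ == "__main__":
    # Hypothetical inputs: two GVCFs whose sample column is literally "Sample".
    chrom_line = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample\n"
    samples_by_file = {
        "a.g.vcf": vcf_sample_names(["##fileformat=VCFv4.2\n", chrom_line]),
        "b.g.vcf": vcf_sample_names(["##fileformat=VCFv4.2\n", chrom_line]),
    }
    for sample, files in find_duplicate_samples(samples_by_file).items():
        print(f"Sample {sample!r} appears in: {', '.join(files)}")
```

With real files, `read_vcf_header(path)` can supply the header lines for each input. Any name reported by `find_duplicate_samples` is one that would need a distinct `SM` value in the corresponding BAM read groups (or a renamed sample column) before the import.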
