Genomics DB Datastore
REQUIRED for all errors and issues:
a) GATK version used: GATK v4.2.2.0
b) Exact command used: Trying to update the genomics DB datastore I have with additional samples
c) Entire program log: There is no program log as I am not having any issue running the command.
The issue is that when I update a genomics DB datastore by incrementally adding more samples, the vcfheader.vcf file within the datastore is not updated with the additional samples; it only stores my original command. How can I access the list of sample names that are currently in the datastore? Is it in the callset.json file?
-
Hi Arman Seuylemezian,
Yes, the callset.json file contains the list of samples currently in the datastore.
When incrementally adding more samples to an existing GenomicsDB, it's important to keep the intervals the same, otherwise you'll run into problems. Can you please confirm that the intervals for the new samples match the intervals for the old samples?
Regards,
David
-
Got it. I was able to confirm that the callset.json file in my instance does accurately contain all the samples that are in the datastore. However, there is a discrepancy with the vcfheader.vcf file, as it is not being updated with the new samples.
I can also confirm that the exact same intervals were used when adding new samples; in fact, the exact same BED file was supplied.
-
The vcfheader.vcf file is just used internally for bookkeeping purposes by GenomicsDB to store VCF header metadata, and does not contain any sample information.
When you use a tool like GATK's SelectVariants to extract sites/samples from the GenomicsDB to a VCF, you should see all of your samples appear in the final VCF. Is this not the case?
Regards,
David
-
Got it. Yes, when I select variants to extract sites/samples, the resulting VCF does contain all of my samples. For pipelining purposes, though, I would like to extract the list of samples that are currently in my datastore so that I know which new samples to add. For this purpose, working with the callset.json file seems like the best path forward, as I don't want to extract sites/samples to a VCF just to figure out which samples are currently in the datastore unless it is absolutely necessary.
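For anyone with the same pipelining need, here is a minimal Python sketch of reading the sample names from callset.json and working out which candidate samples are not yet in the datastore. It is not an official GATK or GenomicsDB API. It assumes callset.json has a top-level "callsets" key that is either a list of objects with a "sample_name" field or a mapping keyed by sample name; the layout may differ between versions, so check your own file first. The function names and the workspace path are hypothetical.

```python
import json
from pathlib import Path


def samples_in_workspace(workspace):
    """Return the sample names recorded in <workspace>/callset.json."""
    with open(Path(workspace) / "callset.json") as handle:
        callsets = json.load(handle)["callsets"]
    if isinstance(callsets, dict):
        # Assumed layout A: {"callsets": {"sampleA": {...}, ...}}
        return list(callsets)
    # Assumed layout B: {"callsets": [{"sample_name": "sampleA", ...}, ...]}
    return [entry["sample_name"] for entry in callsets]


def samples_to_add(workspace, candidates):
    """Return the candidate names not yet in the datastore, in input order."""
    existing = set(samples_in_workspace(workspace))
    return [name for name in candidates if name not in existing]
```

Calling samples_to_add("my_workspace", names_from_sample_sheet) would then give the samples still to be imported, where both arguments stand in for your own workspace path and sample list.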