How to run GenotypeConcordance
If you are seeing an error, please provide(REQUIRED) :
a) GATK version used: picard-2.23.4
b) Exact command used:
c) Entire error log:
10:36:10.170 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/projects/com_grassim/anamaria/anamaria/anamaria/gatk-4.1.8.1/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Sep 11 10:36:10 CDT 2020] GenotypeConcordance --TRUTH_VCF old.vcf --CALL_VCF new.vcf --OUTPUT gc_concordance.vcf --TRUTH_SAMPLE 0_fam0110_G110 --CALL_SAMPLE 0_fam0110_G110 --OUTPUT_VCF false --INTERSECT_INTERVALS true --MIN_GQ 0 --MIN_DP 0 --OUTPUT_ALL_ROWS false --USE_VCF_INDEX false --MISSING_SITES_HOM_REF false --IGNORE_FILTER_STATUS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Fri Sep 11 10:36:10 CDT 2020] Executing as anamaria@login-1.saber.acer.uic.edu on Linux 3.10.0-693.21.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.23.4-SNAPSHOT
[Fri Sep 11 10:36:10 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2020081664
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249218993/1 0/249139234/1
at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
If not an error, choose a category for your question(REQUIRED):
a)How do I run this command?
java -jar picard.jar GenotypeConcordance -CALL_VCF new.vcf -CALL_SAMPLE 0_fam0110_G110 -O gc_concordance.vcf -TRUTH_VCF old.vcf -TRUTH_SAMPLE 0_fam0110_G110
I also tried running it with:
gatk GenotypeConcordance -TRUTH_VCF new.vcf -TRUTH_SAMPLE 0_fam0110_G110 -CALL_SAMPLE 0_fam6106_G6106 -CALL_VCF old.vcf -O concordance.vcf -OUTPUT_VCF true
And I got this:
gatk GenotypeConcordance -TRUTH_VCF new.vcf -TRUTH_SAMPLE 0_fam0110_G110 -CALL_SAMPLE 0_fam6106_G6106 -CALL_VCF old.vcf -O concordance.vcf -OUTPUT_VCF true
Using GATK jar /software/linux-el7-x86_64/apps/gatk-4.1.8.1/gatk/build/gatk-4.1.8.1-32-g6018c7b-SNAPSHOT/gatk-package-4.1.8.1-32-g6018c7b-SNAPSHOT-local.jar defined in environment variable GATK_LOCAL_JAR
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/linux-el7-x86_64/apps/gatk-4.1.8.1/gatk/build/gatk-4.1.8.1-32-g6018c7b-SNAPSHOT/gatk-package-4.1.8.1-32-g6018c7b-SNAPSHOT-local.jar GenotypeConcordance -TRUTH_VCF new.vcf -TRUTH_SAMPLE 0_fam0110_G110 -CALL_SAMPLE 0_fam6106_G6106 -CALL_VCF old.vcf -O concordance.vcf -OUTPUT_VCF true
19:53:52.666 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/linux-el7-x86_64/apps/gatk-4.1.8.1/gatk/build/gatk-4.1.8.1-32-g6018c7b-SNAPSHOT/gatk-package-4.1.8.1-32-g6018c7b-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Sep 11 19:53:52 CDT 2020] GenotypeConcordance --TRUTH_VCF new.vcf --CALL_VCF old.vcf --OUTPUT concordance.vcf --OUTPUT_VCF true --TRUTH_SAMPLE 0_fam0110_G110 --CALL_SAMPLE 0_fam6106_G6106 --INTERSECT_INTERVALS true --MIN_GQ 0 --MIN_DP 0 --OUTPUT_ALL_ROWS false --USE_VCF_INDEX false --MISSING_SITES_HOM_REF false --IGNORE_FILTER_STATUS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Sep 11, 2020 7:53:52 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Fri Sep 11 19:53:52 CDT 2020] Executing as anamaria@login-1.saber.acer.uic.edu on Linux 3.10.0-693.21.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1-32-g6018c7b-SNAPSHOT
[Fri Sep 11 19:53:53 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2225602560
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249139234/1 0/249218993/1
at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Sample 0_fam0110_G110 is in both new.vcf and old.vcf and they contain all samples
b) What does (......) mean?
c) Why do I see (......)?
d) Where do I find (......)?
e) Will (......) be in future releases?
-
Are your files properly indexed? And from the same reference? Make sure your indexes are updated with IndexFeatureFile: https://gatk.broadinstitute.org/hc/en-us/articles/360036899892-IndexFeatureFile
-
Hi
I did index my files with:
gatk IndexFeatureFile --input new.vcf.gz
gatk IndexFeatureFile --input old.vcf.gzbut when I run:
gatk GenotypeConcordance -TRUTH_VCF new.vcf.gz -TRUTH_SAMPLE 0_fam0110_G110 -CALL_SAMPLE 0_fam0110_G110 -CALL_VCF old.vcf.gz -O concordance.vcf -OUTPUT_VCF true
I am getting:
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/linux-el7-x86_64/apps/gatk-4.1.8.1/gatk/build/gatk-4.1.8.1-32-g6018c7b-SNAPSHOT/gatk-package-4.1.8.1-32-g6018c7b-SNAPSHOT-local.jar GenotypeConcordance -TRUTH_VCF new.vcf.gz -TRUTH_SAMPLE 0_fam0110_G110 -CALL_SAMPLE 0_fam0110_G110 -CALL_VCF old.vcf.gz -O concordance.vcf -OUTPUT_VCF true
12:03:02.832 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/linux-el7-x86_64/apps/gatk-4.1.8.1/gatk/build/gatk-4.1.8.1-32-g6018c7b-SNAPSHOT/gatk-package-4.1.8.1-32-g6018c7b-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Sep 16 12:03:02 CDT 2020] GenotypeConcordance --TRUTH_VCF new.vcf.gz --CALL_VCF old.vcf.gz --OUTPUT concordance.vcf --OUTPUT_VCF true --TRUTH_SAMPLE 0_fam0110_G110 --CALL_SAMPLE 0_fam0110_G110 --INTERSECT_INTERVALS true --MIN_GQ 0 --MIN_DP 0 --OUTPUT_ALL_ROWS false --USE_VCF_INDEX false --MISSING_SITES_HOM_REF false --IGNORE_FILTER_STATUS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Sep 16, 2020 12:03:03 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Wed Sep 16 12:03:03 CDT 2020] Executing as anamaria@login-1.saber.acer.uic.edu on Linux 3.10.0-693.21.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1-32-g6018c7b-SNAPSHOT
[Wed Sep 16 12:03:03 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2209873920
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249139234/1 0/249218993/1
at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)Note I am able to run this command on the same files without errors:
gatk Concordance -R human_g1k_v37.fasta -eval new.vcf.gz --truth old.vcf.gz --summary summary.tsv
-
Hi Ana Marija, this issue may be because you indexed a zipped version of those files but you are running a command without those zipped versions:
java -jar picard.jar GenotypeConcordance -CALL_VCF new.vcf -CALL_SAMPLE 0_fam0110_G110 -O gc_concordance.vcf -TRUTH_VCF old.vcf -TRUTH_SAMPLE 0_fam0110_G110
There is a similar issue on Biostars: https://www.biostars.org/p/344620/
-
Hi Genevieve,
actually I did index .gz files:
so this is what I did:
bgzip new.vcf
bcftools index new.vcf.gzbgzip old.vcf
bcftools index old.vcf.gz
#SNP concordancemodule load apps/gatk-4.1.8.1
gatk IndexFeatureFile --input new.vcf.gzgatk IndexFeatureFile --input old.vcf.gz
java -jar picard.jar GenotypeConcordance CALL_VCF=new.vcf.gz CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf.gz TRUTH_SAMPLE=0_fam0110_G110
So as you can see I indexed .vcf.gz files and I used those to run java -jar picard.jar GenotypeConcordance. Please let me know how to do this properly.
Is it that java -jar picard.jar GenotypeConcordance can only be used with non zipped .vcf files?
-
Ana Marija GenotypeConcordance can run with zipped files. Please note that this tool is meant to be two VCFs from the same sample. They must both be from the same reference and the indexes must match. See the description here.
From the commands you posted above, you are indexing first with bcftools and then again with gatk. There may be issues with overwriting. Can you confirm that there is only one version of each index when you run the GenotypeConcordance command?
Please check your index files to make sure they match.
Another note, the best way to submit commands is with the wrapper script. Sometimes weird errors pop up when you submit commands with java -jar. Could you try gatk GenotypeConcordance?
-
Thank you for getting back to me, so this is what I did now. I removed all indexed files via:
rm *tbi
rm *csi
Then I indexed it again:
gatk IndexFeatureFile --input new.vcf.gz
gatk IndexFeatureFile --input old.vcf.gz
Then I run:
java -jar picard.jar GenotypeConcordance CALL_VCF=new.vcf.gz CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf.gz TRUTH_SAMPLE=0_fam0110_G110
[Thu Sep 17 17:39:15 CDT 2020] Executing as anamaria@login-1.saber.acer.uic.edu on Linux 3.10.0-693.21.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.4-SNAPSHOT
[Thu Sep 17 17:39:15 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2020081664
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249218993/1 0/249167691/1
at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)I also did:
gatk GenotypeConcordance CALL_VCF=new.vcf.gz CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf.gz TRUTH_SAMPLE=0_fam0110_G110
[Thu Sep 17 17:40:51 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2245001216
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249218993/1 0/249167691/1
at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)And I am confirming that old.vcf.gz and new.vcf.gz have the same 1565 subjects and along that they both save this subject: 0_fam0110_G110. Both files are from the same build 37.
Can you please tell me how do I further check new.vcf.gz.tbi and old.vcf.gz.tbi ?
-
Ana Marija Please post the first few lines of each index file.
-
They are binary files so here it is:
head old.vcf.gz.tbi
?BCaT=?w\?????Uf?????Q??#"{%{?&{??$??=BrPf6?-??MY?{?w?~?????:???8n?????ݱq?t,Kz?Œ?k?X,????b?e?X?,?K5Ku????????ũ??????lqr?8U?8U?8?Z?jX?jZ?jY??Z??,?Ζ}.??<?7???Œn~?X???]??X?????????L????HW??
??;?<???~?`v]????F??a??MtuU??3????Q?̆??-?E÷??9??1;???W/?1????9?!???????>????]????U?G????c?*\????Ǚ???}??=DW?zE<???gm6{???6$?G?Q?:??ctu???O?Փ?????['?=CWW̱8??~??QQ?Y??a?????j?t?ۡ??Ձ
^?IWǏ0head new.vcf.gz.tbi
?BCP=?eT?????1?1??T?[Q?
D??;???????D???N?w????|?=?߇Y??Yk^??s7?????p?s8N\???ȉ?8?R?????(?(?(?(?(?(?(??.???vx?8??8??:??9??;?K8?K:?K9|
;|?>>???????ܵ=?q?c??p
?????????????c'????w????걮w<?7+??L?lW??~MmW??w??n???L??JW/>h???>??v']?ݣ]?n?zj -
Ana Marija did you try running this with unzipped files? Do your index files match there?
-
are you advising me to do the bellow?
gunzip old.vcf.gz
gunzip new.vcf.gz
gatk IndexFeatureFile --input new.vcf
gatk IndexFeatureFile --input old.vcf
and then run:
gatk GenotypeConcordance CALL_VCF=new.vcf CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf TRUTH_SAMPLE=0_fam0110_G110
-
Can you provide these details:
- How were the old and new VCFs generated?
- Are the old and new VCF files from the same reference?
-
They are both listed to Build 37 using this same file: hg18ToHg19.over.chain.gz downloaded from here: http://hgdownload.cse.ucsc.edu/goldenpath/hg18/liftOver/
old is imputed genotypes done on http://csg.sph.umich.edu/abecasis/MACH/tour/imputation.html
prior imputation "old" was on build 36 then I lifted it to build 37 to prepare it for gatk procedure (like the above)
new is imputed on https://imputationserver.sph.umich.edu/index.html#! using minimac4 using Reference panel: HRC r1.1 2016 (GRCh37/hg19) and downloaded files from there were already on build 37.
Please let me know if you need more details
-
Ana Marija I am wondering if there are differences in the references.
I am confused regarding your wording, did you liftover both files using the same chain file? You also said your new.vcf was already on build 37 (GRCh37/hg19).
If there are small differences in the references, it would cause this error message.
-
sorry for confusion. Let me please explain again:
-the old genotype files prior imputation were build 36 and than after the imputation done in 2011 (using Mach) was still on build 36.
I lifted those old files to build 37 using hg18ToHg19.over.chain.gz (before I used it with gtak)
-the new genotype files were also build 36 prior imputation. in order to prepare it for imputation on minimac4 I had to lift it to build 37 using hg18ToHg19.over.chain.gz. After the imputation the new files stayed on build 37.
When doing imputation of new files I had these parameters selected:
Reference panel: HRC r1.1 2016 (GRCh37/hg19)
Build: hg19
Reference Panel: apps@hrc-r1.1 (hg19)
Population: eur
Phasing: eagle
rsq=0.3so the new files stayed on the build 37.
Do you think that RS names changed largely from 2011 to 2020? Because when I try to find the common SNP between old and new files there is only ~850 000 in common. While new has 2.7 million SNPs and old has 2.5 million of SNPs.
Thank you for that link!
-
Ana Marija could you upload your files following this description so we can take a look?
-
Hi Genevieve,
I did send my files via ftp server. Can you please check if you received them?
One should be named new.vcf.tar.gz and another Nold.vcf.tar.gz
I am not sure if Nold.vcf.tar.gz transferred completely, because I was getting
550 Nold.vcf.tar.gz: Overwrite permission denied
in FileZilla
-
Ana Marija please follow all of the instructions here, including uploading all your files into one folder. Please also name the files something unique so we can easily find them.
-
Hello,
I did manage to transfer: new.anamaria.vcf.tar.gz
but when I try to transfer to 2nd file: again.old.vcf.tar.gz
I am getting the error in attach. Can you please let me know why does the server doesn't allow me to transfer the 2nd file?
-
From my previous post, please upload all your files in one zipped folder. I will only be able to look into this when you have done that.
Please sign in to leave a comment.
19 comments