Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

How to run GenotypeConcordance

0

19 comments

  • Avatar
    Genevieve Brandt

    Are your files properly indexed? And from the same reference? Make sure your indexes are updated with IndexFeatureFile: https://gatk.broadinstitute.org/hc/en-us/articles/360036899892-IndexFeatureFile

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    Hi

     

    I did index my files with:

    gatk IndexFeatureFile --input new.vcf.gz
    gatk IndexFeatureFile --input old.vcf.gz

     

    but when I run:

    gatk GenotypeConcordance -TRUTH_VCF new.vcf.gz -TRUTH_SAMPLE 0_fam0110_G110 -CALL_SAMPLE 0_fam0110_G110 -CALL_VCF old.vcf.gz -O concordance.vcf -OUTPUT_VCF true

     

    I am getting:

     

    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/linux-el7-x86_64/apps/gatk-4.1.8.1/gatk/build/gatk-4.1.8.1-32-g6018c7b-SNAPSHOT/gatk-package-4.1.8.1-32-g6018c7b-SNAPSHOT-local.jar GenotypeConcordance -TRUTH_VCF new.vcf.gz -TRUTH_SAMPLE 0_fam0110_G110 -CALL_SAMPLE 0_fam0110_G110 -CALL_VCF old.vcf.gz -O concordance.vcf -OUTPUT_VCF true
    12:03:02.832 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/linux-el7-x86_64/apps/gatk-4.1.8.1/gatk/build/gatk-4.1.8.1-32-g6018c7b-SNAPSHOT/gatk-package-4.1.8.1-32-g6018c7b-SNAPSHOT-local.jar!/com/intel/gkl/native/libgkl_compression.so
    [Wed Sep 16 12:03:02 CDT 2020] GenotypeConcordance --TRUTH_VCF new.vcf.gz --CALL_VCF old.vcf.gz --OUTPUT concordance.vcf --OUTPUT_VCF true --TRUTH_SAMPLE 0_fam0110_G110 --CALL_SAMPLE 0_fam0110_G110 --INTERSECT_INTERVALS true --MIN_GQ 0 --MIN_DP 0 --OUTPUT_ALL_ROWS false --USE_VCF_INDEX false --MISSING_SITES_HOM_REF false --IGNORE_FILTER_STATUS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
    Sep 16, 2020 12:03:03 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    [Wed Sep 16 12:03:03 CDT 2020] Executing as anamaria@login-1.saber.acer.uic.edu on Linux 3.10.0-693.21.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.1-32-g6018c7b-SNAPSHOT
    [Wed Sep 16 12:03:03 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=2209873920
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249139234/1 0/249218993/1
    at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
    at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
    at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
    at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

     

    Note I am able to run this command on the same files without errors:

    gatk Concordance -R human_g1k_v37.fasta -eval new.vcf.gz --truth old.vcf.gz --summary summary.tsv

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Hi Ana Marija, this issue may be because you indexed a zipped version of those files but you are running a command without those zipped versions:

    java -jar picard.jar GenotypeConcordance -CALL_VCF new.vcf -CALL_SAMPLE 0_fam0110_G110 -O gc_concordance.vcf -TRUTH_VCF old.vcf -TRUTH_SAMPLE 0_fam0110_G110

    There is a similar issue on Biostars: https://www.biostars.org/p/344620/

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    Hi Genevieve,

     

    actually I did index .gz files:

    so this is what I did:

    bgzip new.vcf
    bcftools index new.vcf.gz

    bgzip old.vcf
    bcftools index old.vcf.gz


    #SNP concordance

    module load apps/gatk-4.1.8.1
    gatk IndexFeatureFile --input new.vcf.gz

    gatk IndexFeatureFile --input old.vcf.gz

    java -jar picard.jar GenotypeConcordance CALL_VCF=new.vcf.gz CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf.gz TRUTH_SAMPLE=0_fam0110_G110

     

    So as you can see I indexed .vcf.gz files and I used those to run java -jar picard.jar GenotypeConcordance. Please let me know how to do this properly.

     

    Is it that java -jar picard.jar GenotypeConcordance can only be used with non zipped .vcf files?

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Ana Marija GenotypeConcordance can run with zipped files. Please note that this tool is meant to be two VCFs from the same sample. They must both be from the same reference and the indexes must match. See the description here.

    From the commands you posted above, you are indexing first with bcftools and then again with gatk. There may be issues with overwriting. Can you confirm that there is only one version of each index when you run the GenotypeConcordance command?

    Please check your index files to make sure they match.

    Another note, the best way to submit commands is with the wrapper script. Sometimes weird errors pop up when you submit commands with java -jar. Could you try gatk GenotypeConcordance?

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    Thank you for getting back to me, so this is what I did now. I removed all indexed files via:

    rm *tbi

    rm *csi

    Then I indexed it again:

    gatk IndexFeatureFile --input new.vcf.gz

    gatk IndexFeatureFile --input old.vcf.gz

     

    Then I run:

    java -jar picard.jar GenotypeConcordance CALL_VCF=new.vcf.gz CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf.gz TRUTH_SAMPLE=0_fam0110_G110

    [Thu Sep 17 17:39:15 CDT 2020] Executing as anamaria@login-1.saber.acer.uic.edu on Linux 3.10.0-693.21.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.4-SNAPSHOT
    [Thu Sep 17 17:39:15 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.00 minutes.
    Runtime.totalMemory()=2020081664
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249218993/1 0/249167691/1
    at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
    at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
    at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
    at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

     

    I also did:

    gatk GenotypeConcordance CALL_VCF=new.vcf.gz CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf.gz TRUTH_SAMPLE=0_fam0110_G110

     

    [Thu Sep 17 17:40:51 CDT 2020] picard.vcf.GenotypeConcordance done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=2245001216
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.util.SequenceUtil$SequenceListsDifferException: Sequences at index 0 don't match: 0/249218993/1 0/249167691/1
    at htsjdk.samtools.util.SequenceUtil.assertSequenceListsEqual(SequenceUtil.java:272)
    at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:334)
    at htsjdk.samtools.util.SequenceUtil.assertSequenceDictionariesEqual(SequenceUtil.java:320)
    at picard.vcf.GenotypeConcordance.doWork(GenotypeConcordance.java:344)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

     

    And I am confirming that old.vcf.gz and new.vcf.gz have the same 1565 subjects and along that they both save this subject: 0_fam0110_G110. Both files are from the same build 37.

     

    Can you please tell me how do I further check new.vcf.gz.tbi  and old.vcf.gz.tbi ?

     

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Ana Marija Please post the first few lines of each index file.

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    They are binary files so here it is:

    head old.vcf.gz.tbi
    ?BCaT=?w\?????Uf?????Q??#"{%{?&{??$??=BrPf6?-??MY?{?w?~?????:???8n?????ݱq?t,Kz?Œ?k?X,????b?e?X?,?K5Ku????????ũ??????lqr?8U?8U?8?Z?jX?jZ?jY??Z??,?Ζ}.??<?7???Œn~?X???]??X?????????L????HW??
    ??;?<???~?`v]????F??a??MtuU??3????Q?̆??-?E÷??9??1;???W/?1????9?!???????>????]????U?G????c?*\????Ǚ???}??=DW?zE<???gm6{???6$?G?Q?:??ctu???O?Փ?????['?=CWW̱8??~??QQ?Y??a?????j?t?ۡ??Ձ
    ^?IWǏ0

     

    head new.vcf.gz.tbi
    ?BCP=?eT?????1?1??T?[Q?
    D??;???????D???N?w????|?=?߇Y??Yk^??s7?????p?s8N\???ȉ?8?R?????(?(?(?(?(?(?(??.???vx?8??8??:??9??;?K8?K:?K9|
    ;|?>>???????ܵ=?q?c??p
    ?????????????c'????w????걮w<?7+??L?lW??~MmW??w??n???L??JW/>h???>??v']?ݣ]?n?zj

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Ana Marija did you try running this with unzipped files? Do your index files match there?

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    are you advising me to do the bellow?

    gunzip old.vcf.gz

    gunzip new.vcf.gz

    gatk IndexFeatureFile --input new.vcf

    gatk IndexFeatureFile --input old.vcf

     

    and then run:

    gatk GenotypeConcordance CALL_VCF=new.vcf CALL_SAMPLE=0_fam0110_G110 O=gc_concordance.vcf TRUTH_VCF=old.vcf TRUTH_SAMPLE=0_fam0110_G110

     

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Ana Marija

    Can you provide these details: 

    • How were the old and new VCFs generated?
    • Are the old and new VCF files from the same reference?
    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    They are both listed to Build 37 using this same file: hg18ToHg19.over.chain.gz downloaded from here: http://hgdownload.cse.ucsc.edu/goldenpath/hg18/liftOver/

    old is imputed genotypes done on http://csg.sph.umich.edu/abecasis/MACH/tour/imputation.html

    prior imputation "old" was on build 36 then I lifted it to build 37 to prepare it for gatk procedure (like the above)

    new is imputed on https://imputationserver.sph.umich.edu/index.html#! using minimac4 using Reference panel: HRC r1.1 2016 (GRCh37/hg19) and downloaded files from there were already on build 37.

     

    Please let me know if you need more details

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Ana Marija I am wondering if there are differences in the references

    I am confused regarding your wording, did you liftover both files using the same chain file? You also said your new.vcf was already on build 37 (GRCh37/hg19).

    If there are small differences in the references, it would cause this error message.

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    sorry for confusion. Let me please explain again:

    -the old genotype files prior imputation were build 36 and than after the imputation done in 2011 (using Mach) was still on build 36.

    I lifted those old files to build 37 using hg18ToHg19.over.chain.gz (before I used it with gtak)

     

    -the new genotype files were also build 36 prior imputation. in order to prepare it for imputation on minimac4 I had to lift it to build 37 using hg18ToHg19.over.chain.gz. After the imputation the new files stayed on build 37.

    When doing imputation of new files I had these parameters selected:

    Reference panel: HRC r1.1 2016 (GRCh37/hg19)
    Build: hg19
    Reference Panel: apps@hrc-r1.1 (hg19)
    Population: eur
    Phasing: eagle
    rsq=0.3

     

    so the new files stayed on the build 37.

     

    Do you think that RS names changed largely from 2011 to 2020? Because when I try to find the common SNP between old and new files there is only ~850 000 in common. While new has 2.7 million SNPs and old has 2.5 million of SNPs.

     

    Thank you for that link!

     

     

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Ana Marija could you upload your files following this description so we can take a look?

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    Hi Genevieve,

     

    I did send my files via ftp server. Can you please check if you received them?

    One should be named new.vcf.tar.gz and another Nold.vcf.tar.gz

     

    I am not sure if Nold.vcf.tar.gz transferred completely, because I was getting

    550 Nold.vcf.tar.gz: Overwrite permission denied

     

    in FileZilla

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    Ana Marija please follow all of the instructions here, including uploading all your files into one folder. Please also name the files something unique so we can easily find them.

    0
    Comment actions Permalink
  • Avatar
    Ana Marija

    Hello,

     

    I did manage to transfer: new.anamaria.vcf.tar.gz

     

    but when I try to transfer to 2nd file: again.old.vcf.tar.gz

     

    I am getting the error in attach. Can you please let me know why does the server doesn't allow me to transfer the 2nd file?

    0
    Comment actions Permalink
  • Avatar
    Genevieve Brandt

    From my previous post, please upload all your files in one zipped folder. I will only be able to look into this when you have done that. 

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk