Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Community Forum

Picard liftover error: Output vcf and Reject vcf are empty

18 comments

  • Genevieve Brandt

    Hi SSH123, the UCSC website has more information on chain files here: https://genome.ucsc.edu/goldenPath/help/chain.html. You can also download chain files from their website here: http://hgdownload.soe.ucsc.edu/downloads.html#terms

    You can use the chain file from the Broad GitHub that you linked, but I would recommend downloading it rather than copying and pasting, because copy/paste can introduce formatting issues that cause problems with the LiftOver command.
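A quick way to catch a badly saved chain file before running LiftoverVcf is to inspect its first lines. The sketch below assumes the header layout given in the UCSC chain format description linked above (the word `chain` followed by 12 more whitespace-separated fields). `check_chain_file` is a hypothetical helper name, not part of Picard or GATK.

```shell
# Sanity-check a chain file before handing it to LiftoverVcf.
# Prints "ok" and returns 0 when the first line looks like a chain header;
# otherwise prints a short reason and returns 1.
check_chain_file() {
  chain_file="$1"
  # A web page saved by mistake usually starts with a doctype or <html> tag.
  if head -n 5 "$chain_file" | grep -qiE '<html|<!doctype'; then
    echo "looks like HTML, not a chain file"
    return 1
  fi
  # Assumed header layout: "chain" plus 12 further fields (13 in total).
  head -n 1 "$chain_file" | awk '
    $1 == "chain" && NF == 13 { print "ok"; ok = 1 }
    END { if (!ok) { print "first line is not a 13-field chain header"; exit 1 } }
  '
}
```

Running `check_chain_file reference/b37tohg19.chain` before the liftover gives an early hint when the file was mangled by copy/paste or saved as a web page.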

  • SSH123

    Hi Genevieve, I checked the UCSC website (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/) before I posted my question and couldn't find a b37tohg19.chain file. I need to convert Mutect2-WGS-panel-b37 to Mutect2-WGS-panel-hg19 and use it as an unbiased Mutect2 PON, because my normal and tumor BAM files were aligned to hg19.fa; otherwise I get a contig conflict. I saw that Funcotator has the flags --force-b37-to-hg19-reference-contig-conversion and --allow-hg19-gencode-b37-contig-matching, but they don't work for Mutect2. Do you have something similar? Or do you have a Mutect2-WGS-panel-hg19 somewhere that I can access? Thanks.

  • Genevieve Brandt

    SSH123

    Is your PON one that you created or one you downloaded? There is a PON we have available for hg19/b37, more information can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON-

    The --force-b37-to-hg19-reference-contig-conversion option is specific for Funcotator because it makes the pre-made data sources available for b37 references.

  • SSH123

    I don't have enough 'normals' to create my own, so I downloaded the PON you pointed to. However, my BAM files were aligned to hg19.fa, while your PON is based on the Broad b37 reference. That's why I'd like to lift it over; otherwise the contigs differ. So, is there an hg19 PON file or a b37tohg19.chain file I can download? Thanks.

  • Genevieve Brandt

    SSH123 It looks like the file in the resource I pointed to could work for either hg19 or b37.

    As for the chain file, the one you linked to looks like it could work, but I would recommend downloading it from the link you posted instead of copying and pasting, to avoid format issues.

  • SSH123

    Thanks for your patience, Genevieve. However, it looks like the file in the resource you pointed to does not work for hg19. See the code and error below; it is clearly a b37-versus-hg19 contig mismatch.

    Code executed:

    # Load required modules
    module load gatk/4.1.4.1-python-3.7.4

    # Run Mutect2 on the tumor-normal pair
    gatk Mutect2 \
    -R reference/somatic-b37-Homo_sapiens_assembly19.fasta \
    -I normal.bam \
    -I tumor.bam \
    -normal normal \
    --germline-resource reference/somatic-b37-af-only-gnomad.raw.sites.vcf \
    --panel-of-normals reference/somatic-b37-Mutect2-WGS-panel-b37.vcf \
    -O somatic.vcf.gz

    Error: A USER ERROR has occurred: Input files reference and reads have incompatible contigs: No overlapping contigs found.
    reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1, NC_007605]
    reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]

     

     

     

  • Genevieve Brandt

    Hi SSH123, this looks like an issue not with the panel of normals, but because the read and reference contigs are named differently even though they are from the same reference version. You will need these to have the same naming convention for GATK to work.
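To see which naming convention each input uses, it can help to list the contig names and classify them. This is a minimal sketch: `sq_names` and `contig_style` are hypothetical helpers, and the classification only looks at primary chromosome names (`chr1` versus `1`), an assumption that suits the hg19/b37 case in this thread.

```shell
# Extract contig names from SAM/BAM header text on stdin (the SN: tag of @SQ lines).
sq_names() {
  awk -F '\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
  }'
}

# Classify contig names (one per line on stdin) by naming style.
# Prints "ucsc" for chr-prefixed names, "b37" for bare names,
# "mixed" if both occur, and "unknown" if no primary chromosome is seen.
contig_style() {
  awk '
    /^chr([0-9]+|X|Y|M)$/ { ucsc++ }
    /^([0-9]+|X|Y|MT)$/   { b37++ }
    END {
      if (ucsc > 0 && b37 == 0) print "ucsc"
      else if (b37 > 0 && ucsc == 0) print "b37"
      else if (ucsc > 0 && b37 > 0) print "mixed"
      else print "unknown"
    }
  '
}
```

For example, `samtools view -H normal.bam | sq_names | contig_style` would report the style of the reads, and `cut -f1 reference/ucsc.hg19.fasta.fai | contig_style` that of the reference, assuming samtools is installed and the FASTA index exists. Every input to one GATK command should report the same style.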

  • SSH123

    Hi Genevieve, I changed the reference to ucsc.hg19.fasta and got a similar error, except this time reference contigs = [chrM, chr1...] and features contigs = [1, 2...]. So it looks like ucsc.hg19.fasta and my BAM files use hg19 contigs, whereas Mutect2-WGS-panel-b37.vcf uses b37 contigs. I need to either realign the BAM files to b37.fa or lift Mutect2-WGS-panel-b37.vcf over to hg19. I chose the latter. Can you please tell me how to download this b37tohg19.chain file? I tried wget on the https link and got a file containing HTML mixed with the chain data. Also, git clone doesn't work, and I can't find a Download or Clone button either. Thanks very much!
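A file containing HTML mixed with the chain data is what one would expect if wget fetched GitHub's page view of the file rather than the file itself. Assuming the link was a GitHub `blob` URL, the raw file is served from `raw.githubusercontent.com` instead. The sketch below rewrites one form into the other; `github_raw_url` is a hypothetical helper and the example URL is a placeholder, not the actual link from this thread.

```shell
# Convert a GitHub "blob" page URL into the corresponding raw-file URL.
# URLs that do not match the blob pattern are printed unchanged.
github_raw_url() {
  printf '%s\n' "$1" |
    sed 's#^https://github\.com/\([^/]*\)/\([^/]*\)/blob/#https://raw.githubusercontent.com/\1/\2/#'
}

# Placeholder URL for illustration only.
github_raw_url "https://github.com/example-org/example-repo/blob/master/data/b37tohg19.chain"
```

The printed URL can then be passed to `wget`. Whether a given chain file is suitable for the job is a separate question from whether it downloaded cleanly.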

  • Genevieve Brandt

    SSH123 there are two problems going on here that are separate. Please let me know if these don't make sense so I can clarify.

    1. Your error message with reference contigs = [chrM, chr1...] and features contigs = [1, 2...] is occurring because your feature contigs do not have the same naming convention as the contigs in the reference you are using. Please post your complete GATK command with the complete error message so we can identify what is going on in the error.
    2. I have confirmed with my team that the b37tohg19 chain file you found is very old and should not be used. We are looking into whether we can make one, and I will let you know if I have any updates.

  • SSH123

    Hi, Genevieve,

    It'd be nice if you could update the chain file. This particular chain is not listed in the UCSC goldenPath repository. Downloading it directly over https didn't work for me, but copying and pasting did (see RE 2.). Then I ran into the second problem, which you might already know about: the feature contigs didn't match. I attached the complete error message below (see RE 1.). I guess there are two culprits: 1) I didn't copy a dict file for gnomAD, judging by "WARN IndexUtils - Feature file "/reference/somatic-b37-af-only-gnomad.raw.sites.vcf" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file", followed by "17:45:40.215 INFO Mutect2 - Shutting down engine"; 2) gnomAD is also a feature file, and it uses b37 contigs. So I'll need this chain file to lift over both the PON and gnomAD, and then create a sequence dictionary for gnomAD. Does a VCF need a dictionary? It looks like Picard and samtools only create one for a FASTA. Thanks very much for your time.
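On the dictionary question: a VCF normally carries its contig list in `##contig` lines in its own header rather than in a separate .dict file, and the warning quoted above suggests this gnomAD file has none. A minimal sketch for checking that, with `vcf_header_contigs` as a hypothetical helper that reads plain-text VCF on stdin:

```shell
# Print the contig IDs declared in a VCF header (##contig=<ID=...> lines).
# Reads plain-text VCF on stdin and stops at the first data record.
vcf_header_contigs() {
  awk '
    !/^#/ { exit }    # the header is over at the first record
    /^##contig=/ && match($0, /[<,]ID=[^,>]*/) {
      # skip the 4 characters of "<ID=" (or ",ID=") and print the value
      print substr($0, RSTART + 4, RLENGTH - 4)
    }
  '
}
```

An empty result from `vcf_header_contigs < reference/somatic-b37-af-only-gnomad.raw.sites.vcf` would be consistent with the warning. For a compressed file, pipe it through `gzip -dc` first.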

    Replies, in reverse order:

    2. If I download the chain file and run this command:
    java -jar picard.jar LiftoverVcf \
    I=reference/somatic-b37-Mutect2-WGS-panel-b37.vcf \
    O=reference/Mutect2-WGS-panel-hg19-1.vcf \
    CHAIN=reference/b37tohg19.chain \
    REJECT=reference/rejected_variants-1.vcf \
    R=reference/ucsc.hg19.fasta

    Error message: Exception in thread "main" htsjdk.samtools.SAMException: chain line has wrong number of fields in chain file reference/b37tohg19.chain at line 1

    If I instead copy and paste the chain file and run the same command, it does produce a VCF and an index.

    Log output: Executing as on Linux 3.10.0-1127.10.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.2+9; Picard version: 2.6.0-SNAPSHOT
    INFO 2020-11-09 17:22:00 LiftoverVcf Loading up the target reference genome.
    INFO 2020-11-09 17:22:10 LiftoverVcf Lifting variants over and sorting.
    INFO 2020-11-09 17:22:17 LiftoverVcf read 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: 2:30,458,643

    1. If I use the lifted PON file to call somatic variants with this command:
    gatk Mutect2 \
    -R reference/ucsc.hg19.fasta \
    -I bam/normal.bam \
    -I bam/tumor.bam \
    -normal Normal \
    --germline-resource reference/somatic-b37-af-only-gnomad.raw.sites.vcf \
    --panel-of-normals reference/Mutect2-WGS-panel-hg19.vcf \
    -O mutect/tumor.vcf.gz

    Then, error message:

    17:45:36.938 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/gatk/4.1.4.1-python-3.7.4/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Nov 09, 2020 5:45:37 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    17:45:37.420 INFO Mutect2 - ------------------------------------------------------------
    17:45:37.420 INFO Mutect2 - The Genome Analysis Toolkit (GATK) v4.1.4.1
    17:45:37.420 INFO Mutect2 - For support and documentation go to https://software.broadinstitute.org/gatk/
    17:45:37.420 INFO Mutect2 - Executing as on Linux v3.10.0-1127.10.1.el7.x86_64 amd64
    17:45:37.420 INFO Mutect2 - Java runtime: OpenJDK 64-Bit Server VM v11.0.2+9
    17:45:37.420 INFO Mutect2 - Start Date/Time: 9 November 2020 at 5:45:36 pm AEDT
    17:45:37.420 INFO Mutect2 - ------------------------------------------------------------
    17:45:37.420 INFO Mutect2 - ------------------------------------------------------------
    17:45:37.421 INFO Mutect2 - HTSJDK Version: 2.21.0
    17:45:37.421 INFO Mutect2 - Picard Version: 2.21.2
    17:45:37.421 INFO Mutect2 - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    17:45:37.421 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    17:45:37.421 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    17:45:37.421 INFO Mutect2 - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    17:45:37.421 INFO Mutect2 - Deflater: IntelDeflater
    17:45:37.421 INFO Mutect2 - Inflater: IntelInflater
    17:45:37.421 INFO Mutect2 - GCS max retries/reopens: 20
    17:45:37.421 INFO Mutect2 - Requester pays: disabled
    17:45:37.421 INFO Mutect2 - Initializing engine
    17:45:37.644 INFO FeatureManager - Using codec VCFCodec to read file file://reference/somatic-b37-Mutect2-WGS-panel-hg19.vcf
    17:45:37.780 INFO FeatureManager - Using codec VCFCodec to read file file://reference/somatic-b37-af-only-gnomad.raw.sites.vcf
    17:45:39.080 WARN IndexUtils - Feature file "/reference/somatic-b37-af-only-gnomad.raw.sites.vcf" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
    17:45:40.215 INFO Mutect2 - Shutting down engine
    [9 November 2020 at 5:45:40 pm AEDT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.06 minutes.
    Runtime.totalMemory()=1291845632
    ***********************************************************************

    A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
    reference contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chr1_gl000191_random, chr1_gl000192_random, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7_gl000195_random, chr8_gl000196_random, chr8_gl000197_random, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chr11_gl000202_random, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18_gl000207_random, chr19_gl000208_random, chr19_gl000209_random, chr21_gl000210_random, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249]
    features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X]

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    Using GATK jar /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/gatk/4.1.4.1-python-3.7.4/gatk-package-4.1.4.1-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/gatk/4.1.4.1-python-3.7.4/gatk-package-4.1.4.1-local.jar Mutect2 -R reference/ucsc.hg19.fasta -I bam/Normal.bam -I bam/Tumor.bam -normal Normal --germline-resource reference/somatic-b37-af-only-gnomad.raw.sites.vcf --panel-of-normals reference/Mutect2-WGS-panel-hg19.vcf -O somatic.vcf.gz

     

  • Genevieve Brandt

    Hi SSH123,

    Could you also lift over the file somatic-b37-af-only-gnomad.raw.sites.vcf? If it is still b37, it could be causing this issue.

  • SSH123

    Hi, Genevieve,

    It failed after 42 minutes. The output VCF file is empty and no index file was made. Please see the code and slurm.out below. Thanks.

    Code:
    java -jar picard.jar LiftoverVcf \
    I=reference/somatic-b37-af-only-gnomad.raw.sites.vcf \
    O=reference/somatic-hg19-af-only-gnomad.raw.sites.vcf \
    CHAIN=reference/37to19.chain \
    REJECT=reference/rejected_variants-gnomad.vcf \
    R=reference/ucsc.hg19.fasta

    Slurm.out: To execute picard run: java -jar picard.jar
    [Tue Nov 10 11:59:45 AEDT 2020] picard.vcf.LiftoverVcf INPUT=reference/somatic-b37-af-only-gnomad.raw.sites.vcf OUTPUT=reference/somatic-hg19-af-only-gnomad.raw.sites.vcf CHAIN=reference/37to19.chain REJECT=reference/rejected_variants-gnomad.vcf REFERENCE_SEQUENCE=reference/ucsc.hg19.fasta WARN_ON_MISSING_CONTIG=false WRITE_ORIGINAL_POSITION=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
    [Tue Nov 10 11:59:45 AEDT 2020] Executing as on Linux 3.10.0-1127.10.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.2+9; Picard version: 2.6.0-SNAPSHOT
    INFO 2020-11-10 11:59:45 LiftoverVcf Loading up the target reference genome.
    INFO 2020-11-10 11:59:55 LiftoverVcf Lifting variants over and sorting.
    INFO 2020-11-10 12:00:02 LiftoverVcf read 1,000,000 records. Elapsed time: 00:00:06s. Time for last 1,000,000: 6s. Last read position: 1:9,514,996
    INFO 2020-11-10 12:00:08 LiftoverVcf read 2,000,000 records. Elapsed time: 00:00:13s. Time for last 1,000,000: 6s. Last read position: 1:19,801,042
    ...(#Omitted many lines here)
    INFO 2020-11-10 12:39:52 LiftoverVcf read 212,000,000 records. Elapsed time: 00:39:57s. Time for last 1,000,000: 55s. Last read position: 15:62,176,239
    INFO 2020-11-10 12:41:08 LiftoverVcf read 213,000,000 records. Elapsed time: 00:41:13s. Time for last 1,000,000: 76s. Last read position: 15:73,021,527
    [Tue Nov 10 12:42:39 AEDT 2020] picard.vcf.LiftoverVcf done. Elapsed time: 42.90 minutes.
    Runtime.totalMemory()=20971520
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Collections.unmodifiableSet(Collections.java:1120)
    at htsjdk.variant.variantcontext.CommonInfo.getFilters(CommonInfo.java:98)
    at htsjdk.variant.variantcontext.VariantContext.getFilters(VariantContext.java:726)
    at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:259)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)

  • Genevieve Brandt

    Hi SSH123, this is a memory error [Exception in thread "main" java.lang.OutOfMemoryError: Java heap space] and has been discussed on the forum in other threads. Please see those for troubleshooting information.
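A `java.lang.OutOfMemoryError: Java heap space` generally means the JVM's maximum heap is too small for the job, and the standard `-Xmx` option raises it. The sketch below only prints the command so it can be reviewed before running. `liftover_cmd` is a hypothetical wrapper, the file paths are taken from the failing command above, and the `16g` value is an assumption that should be sized to the memory actually available on the node.

```shell
# Print (rather than run) a Picard LiftoverVcf command with an explicit JVM heap cap.
# Arguments: heap size, input VCF, output VCF, chain file, reject VCF, target reference.
liftover_cmd() {
  printf 'java -Xmx%s -jar picard.jar LiftoverVcf I=%s O=%s CHAIN=%s REJECT=%s R=%s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# 16g is an assumed value, not a recommendation from this thread.
liftover_cmd 16g \
  reference/somatic-b37-af-only-gnomad.raw.sites.vcf \
  reference/somatic-hg19-af-only-gnomad.raw.sites.vcf \
  reference/37to19.chain \
  reference/rejected_variants-gnomad.vcf \
  reference/ucsc.hg19.fasta
```

The log above also shows `MAX_RECORDS_IN_RAM=500000`; lowering that Picard option is another possible lever, though it was not tried in this thread.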

  • SSH123

    Hi Genevieve, you were right: Picard successfully lifted over gnomAD after I increased the Java max heap, and Mutect2 also worked. Do you think somebody is going to update the chain file soon? I ask because there are 65K mismatches in the WGS PON rejected.variants.vcf (see examples below). Surprisingly, no mismatches are found in the gnomAD rejected variants. The mismatches are on MT and on GL000207 through GL000192. Thanks.

    #CHROM POS ID REF ALT QUAL FILTER INFO
    MT 73 . A G . MismatchedRefAllele AC=34;AF=0.500;AN=68;DP=0;set=variant2
    MT 150 . C T . MismatchedRefAllele AC=5;AF=0.50;AN=10;DP=0;set=variant2

    ...

    GL000207.1 137 . A C . NoTarget AC=39;AF=0.500;AN=78;DP=0;set=variant2
    GL000207.1 149 . A G . NoTarget AC=8;AF=0.500;AN=16;DP=0;set=variant2

    ...

    GL000192.1 547182 . G C . NoTarget AC=2;AF=0.50;AN=4;DP=0;set=variant2
    GL000192.1 547218 . C T . NoTarget AC=11;AF=0.500;AN=22;DP=0;set=variant2
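To see at a glance which contigs and rejection reasons dominate a reject file, the records can be tallied by CHROM and FILTER. A minimal sketch, assuming a plain-text, tab-separated VCF on stdin; `summarize_rejects` is a hypothetical helper:

```shell
# Count rejected records per (CHROM, FILTER) pair.
# Reads a tab-separated VCF on stdin; prints "CHROM<TAB>FILTER<TAB>count", sorted.
summarize_rejects() {
  awk -F '\t' '
    /^#/ { next }                    # skip header lines
    { count[$1 "\t" $7]++ }          # column 1 is CHROM, column 7 is FILTER
    END { for (k in count) print k "\t" count[k] }
  ' | sort
}
```

Run it as `summarize_rejects < reference/rejected_variants-1.vcf`. For the excerpt above it would list `MismatchedRefAllele` on MT and `NoTarget` on the GL contigs.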

  • Genevieve Brandt

    SSH123 This is something we are working on but we are not able to provide you a timeline for when it will get done because our team has projects we are working on currently. I will keep you up to date when we have it available. 

  • yuan

    Hi, 

    This is exactly the error I have. I need the PON and germline resource for hg19. The resource files for b37 cannot be used because the contigs are different. Is there anywhere I can download those files directly? Thank you so much.

  • ISmolicz

    Genevieve Brandt - I would also like to add that a chain file to liftover from b37 to hg19, compatible with the hg19 version in the GATK Resource bundle, would be extremely useful. However, I understand this may take time to become available.

  • Genevieve Brandt

    Hi yuan, we do not have these resources prepared for the hg19 reference. What the users in this thread are trying to do is lift the files over from b37 to hg19 using a chain file. The issue, however, is that there is no up-to-date chain file.

    ISmolicz we will update this post when it becomes available, but it may take some time. If anyone builds one on their own, please let us know so we can share it with other users.
