No results from CollectVariantCallingMetrics
Hi GATK team,
We ran CollectVariantCallingMetrics (GATK 4.1.1.0) on VCF files generated by Mutect2 (GATK 4.1.1.0) or HaplotypeCaller (GATK 4.0.9.0). We did not get any output file, and there was a warning "Failed to detect whether we are running on Google Compute Engine" in the log.
The command we used is below:
date; java -jar /share/app/GATK/gatk-4.1.1.0/gatk-package-4.1.1.0-local.jar CollectVariantCallingMetrics \
--DBSNP dbsnp_138.b37.vcf \
--INPUT NA11316_Mutect2.vcf \
--OUTPUT NA11316_CollectvariantCallingMetrics.txt \
--REFERENCE_SEQUENCE hg19.fasta > NA11316_CollectvariantCallingMetrics.log 2>&1; date
I have pasted the full log below in case it provides useful information:
16:49:50.657 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/binfo/data3/share/app/GATK/gatk-4.1.1.0/gatk-package-4.1.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Nov 26 16:49:50 CST 2020] CollectVariantCallingMetrics --INPUT NA11316_M_F0x100_Mutect2_tumor_maxaf1_mrra0.vcf --OUTPUT NA11316_M_F0x100_Mutect2_tumor_maxaf1_mrra0_COllectvariantCallingMetrics.txt --DBSNP /binfo/data3/home/chiao/reference/dbSNP_151/dbsnp_138.b37.excluding_sites_after_129.vcf --REFERENCE_SEQUENCE /binfo/data3/home/chiao/reference/hg19.fasta --GVCF_INPUT false --THREAD_COUNT 1 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Nov 26, 2020 4:51:47 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Thu Nov 26 16:51:47 CST 2020] Executing as chiao@binfo-mgmt on Linux 2.6.32-431.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.1.0
INFO 2020-11-26 16:51:47 CollectVariantCallingMetrics Loading dbSNP file ...
[Thu Nov 26 16:51:47 CST 2020] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 1.94 minutes.
Runtime.totalMemory()=865075200
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:105)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
Please advise and thank you in advance!
-
Hi Yenan, could you try this with the newest GATK version (4.1.9.0) to see if this issue persists?
This message: "INFO: Failed to detect whether we are running on Google Compute Engine." is not an error message; you can ignore it.
However, this message indicates an issue: "java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)"
-
Dear GATK Team,
I am having a similar issue to Yenan's when using CollectVariantCallingMetrics. Was a cause/workaround ever found for this?
I have no problem when using this tool with the GRCh38 dbSNP (build 138) VCF file provided in the resource bundle; however, whenever I try a different dbSNP build, it throws this error:
[Fri Jul 23 13:25:03 CEST 2021] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 70.55 minutes.
Runtime.totalMemory()=1623195648
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException: Cannot invoke "htsjdk.samtools.SAMSequenceRecord.getSequenceLength()" because the return value of "htsjdk.samtools.SAMSequenceDictionary.getSequence(String)" is null
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:101)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
As a bit of background, I am trying to use the latest dbSNP release (build 155, GRCh38, GCF_000001405.39) and have tried GATK version 4.1.9.0 and the latest 4.2.0.0, both having the same problem.
To prepare the dbSNP file for use with the best practices workflow, I renamed the NCBI chromosome accession numbers to UCSC-style names using bcftools annotate, updated the VCF headers using UpdateVcfSequenceDictionary, and indexed the file using IndexFeatureFile.
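In outline, the preparation was along these lines (a sketch with placeholder file names rather than my exact commands):
```
# Map RefSeq accession numbers to UCSC-style names
# (chrom_map.txt: two columns, old name and new name)
bcftools annotate --rename-chrs chrom_map.txt -O z -o dbsnp155.renamed.vcf.gz GCF_000001405.39.gz

# Rewrite the contig lines in the VCF header from the reference sequence dictionary
gatk UpdateVcfSequenceDictionary \
    --INPUT dbsnp155.renamed.vcf.gz \
    --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict \
    --OUTPUT dbsnp155.renamed.updatedict.vcf.gz

# Index the result
gatk IndexFeatureFile -I dbsnp155.renamed.updatedict.vcf.gz
```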
The dbSNP file worked well with both HaplotypeCaller and GenotypeGVCFs, with the rsids overlapping perfectly with those obtained when using the dbSNP resource bundle version.
Any help with this would be greatly appreciated!
-
Hi FranBC,
The NullPointerException error likely means that there is a contig present in your VCF file that is not present in your reference dictionary. Could you please check that your VCF headers and reference dictionary have matching chromosomes, or follow these instructions to provide those files so the GATK team can look into it?
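One rough way to do that check (file names below are placeholders; adjust zcat/cat for your file) is to compare the contig names declared in the two files, for example:
```
# Contig names declared in the VCF header
zcat dbsnp.vcf.gz | grep "^##contig" | sed 's/.*ID=\([^,>]*\).*/\1/' | sort > vcf_contigs.txt

# Contig names in the reference sequence dictionary
grep "^@SQ" reference.dict | cut -f2 | sed 's/^SN://' | sort > dict_contigs.txt

# Contigs present in the VCF but missing from the dictionary
comm -23 vcf_contigs.txt dict_contigs.txt
```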
Kind regards,
Pamela
-
Hi Pamela,
The VCF contig headers do seem to match, so I've uploaded the files to the GATK FTP server as: CollectVariantCallingMetrics_dbsnp155_FranBC2.tar.gz
Thanks and best regards,
Francesca
-
Hi FranBC,
I was able to take a look at your bug report and found that there are contigs in the dbSNP file you are using that are not in the standard hg38 reference.
The issue starts at this point in your dbsnp file:
##contig=<ID=HLA-A*01:01:01:01,length=3503,assembly=38>
The contig HLA-A*01:01:01:01 is not present in the hg38 reference dictionary. You will need to update your files to match each other to use CollectVariantCallingMetrics.
Best,
Genevieve
-
Hi Genevieve,
Thank you for the prompt reply. Apologies, I thought I had supplied this information: I am actually using the hg38 + decoys fasta reference file (Homo_sapiens_assembly38.fasta) from the resource bundle, which does contain that contig (and all the HLA loci following it). That reference was also used to update the VCF headers in the dbSNP VCF file with Picard's UpdateVcfSequenceDictionary tool (the file supplied by dbSNP does not contain any contigs in the header).
So I'm afraid that I am still a bit confused, because those HLA contigs are also found in the dbSNP VCF file supplied in the resource bundle (Homo_sapiens_assembly38.dbsnp138.vcf), which doesn't generate any errors with CollectVariantCallingMetrics.
Thanks and best regards,
Francesca
-
FranBC I see, thanks for letting me know. Could you clarify which file you used? I used the hg38_v0_Homo_sapiens_assembly38.fasta file; I don't have the details about the specific path on Google, though.
-
Genevieve Brandt (she/her) The file I used is: Homo_sapiens_assembly38.fasta
-
FranBC I downloaded the Homo_sapiens_assembly38.dict from that specific link and did not find the contig I mentioned above:
##contig=<ID=HLA-A*01:01:01:01,length=3503,assembly=38>
-
Genevieve Brandt (she/her) Isn't it this line in the dict file:
franbc@MacBook-Pro bwa-gatk % grep "HLA" Homo_sapiens_assembly38.dict
@SQ SN:HLA-A*01:01:01:01 LN:3503 M5:01cd0df602495b044b2c214d69a60aa2 AS:38 UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta SP:Homo sapiens
-
Ah! You are right! I missed adding a backslash while searching and that's why I didn't find that contig. I'll let you know when I have more information.
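(For anyone searching the same way: in a grep pattern the `*` is a regex repetition operator, so the literal asterisk in the contig name needs escaping, or a fixed-string search, for example:)
```
# Regex search: '*' means "zero or more of the preceding character", so this misses the line
grep "HLA-A*01:01:01:01" Homo_sapiens_assembly38.dict

# Escape the asterisk, or search for a fixed string instead
grep "HLA-A\*01:01:01:01" Homo_sapiens_assembly38.dict
grep -F "HLA-A*01:01:01:01" Homo_sapiens_assembly38.dict
```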
-
FranBC I'm sorry for not being able to make more progress but it looks like we will have to wait for a fix for this bug to be able to figure out what is going on.
I did also want to point out that when running ValidateVariants with your files I encountered multiple errors that you may want to resolve:
gatk --java-options "-Xmx4G" ValidateVariants -V HTS009.filtered.vcf
A USER ERROR has occurred: Input HTS009.filtered.vcf fails strict validation of type ALL: the AC tag has the incorrect number of records at position chr1:54406036, 2 vs. 1
gatk --java-options "-Xmx4G" ValidateVariants -R hg38_v0_Homo_sapiens_assembly38.fasta -V HTS009.filtered.vcf --dbsnp GCF_000001405.39.dbSNP155.GRCh38p13_updatedict.vcf
A USER ERROR has occurred: Input HTS009.filtered.vcf fails strict validation of type ALL: the rsID rs55880629 for the record at position chr1:1704310 is not in dbSNP
Thank you for writing into the forum regarding this issue and I am sorry we do not have an easy fix!
-
Genevieve Brandt (she/her) No worries, thank you for flagging it as a potential bug!
Regarding the VCF errors - thank you for running ValidateVariants and highlighting them to me, much appreciated!
I have fixed my custom script, which splits multiallelic variants in a VCF into separate records (it correctly adjusted the sample genotypes and AD, but naively did not adjust the AC, AN, AF, MLEAC, MLEAF and PL fields), so that it follows the VCF specification more strictly and updates those fields accordingly. That has fixed all the ValidateVariants errors, though for some reason I could not reproduce the error you got when including the dbsnp VCF file; on my local laptop and server it runs fine.
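For anyone else with the same problem: an off-the-shelf alternative to a custom splitting script, assuming bcftools is available, is `bcftools norm`, which splits multiallelic records and carries the per-allele (Number=A/R/G) INFO and FORMAT tags over for each ALT allele. A minimal sketch with placeholder output names:
```
# Split multiallelic sites into separate biallelic records
bcftools norm -m-any -O z -o HTS009.split.vcf.gz HTS009.filtered.vcf
tabix -p vcf HTS009.split.vcf.gz
```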
Thanks for all the amazing help, I'll wait for the fix.
Best regards,
Francesca
-
Great! Glad you didn't see the dbSNP errors on your end!
-
Hi!
I'm actually getting the same error (using GATK 4.2.1.0), only I am running with a GVCF file and setting GVCF_INPUT to true, as below. A lot of time is spent processing, but no file is produced and the tool fails with the same NullPointerException message:
```
$ gatk --java-options "-Xms2000m" CollectVariantCallingMetrics --INPUT mysample.g.vcf --OUTPUT mysample.g.vcf.metrics.txt --DBSNP GCF_000001405.39_ucsc-genbank-chrs.gz --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict --THREAD_COUNT 8 --GVCF_INPUT true >> mysample.merge_gvcfs_gatk.manual.log 2>&1
$ tail -n13 mysample.merge_gvcfs_gatk.manual.log
INFO 2021-10-04 12:59:14 CollectVariantCallingMetrics Read 1,091,000,000 variants. Elapsed time: 01:20:07s. Time for last 100,000: 0s. Last read position: chr22_KB663609v1_alt:51,502
[Mon Oct 04 12:59:14 CEST 2021] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 80.18 minutes.
Runtime.totalMemory()=4211081216
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:101)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
```
The input file is valid, though, and no error/warning arose when it was checked as per the suggestion here:
```
$ gatk --java-options "-Xmx4G" ValidateVariants -V mysample.g.vcf -R Homo_sapiens_assembly38.fasta --validation-type-to-exclude ALLELES
```
Is it possible that the GVCF_INPUT flag is not being applied? Any kind of help is appreciated.
Thank you much.
Best,
Azza
-
Hi Azza,
I think your error message is different from the one posted by FranBC, so hopefully we can find a solution for you. Could you verify whether your issue persists in the latest version of GATK, 4.2.2.0?
Thank you,
Genevieve
-
Hi Genevieve Brandt (she/her),
Thank you for the prompt response! Yes, I confirm the issue persists in the latest GATK version:
```
[Wed Oct 06 18:56:06 CEST 2021] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 78.97 minutes.
Runtime.totalMemory()=3925868544
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:101)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /home/azzaea/software/gatk/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms2000m -jar /home/azzaea/software/gatk/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar CollectVariantCallingMetrics --INPUT mysample.g.vcf --OUTPUT mysample.g.vcf.metrics.txt --DBSNP GCF_000001405.39_ucsc-genbank-chrs.gz --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict --THREAD_COUNT 8 --GVCF_INPUT true
```
-
Thanks for the update Azza Ahmed! It looks like the issue is with your dbSNP file and not the input VCF (java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)). What format is your dbSNP file in? Is it a VCF?
You can try to solve this error by first validating the dbSNP file with the ValidateVariants command; then, if it is in VCF format, you should rename and re-index the file with a .vcf.gz filename extension.
-
You are right, my dbsnp file is problematic. I'm a bit surprised since it worked fine in the BQSR stage upstream. I'm putting the logs from validation and obtaining the dbsnp file below, and I'd really appreciate any pointers as to how to work around this:
```
$ ~/software/gatk/gatk-4.2.2.0/gatk --java-options "-Xmx4G" ValidateVariants -V GCF_000001405.39_ucsc-genbank-chrs.gz -R Homo_sapiens_assembly38.fasta
:
[October 8, 2021 at 3:19:08 PM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 21.56 minutes.
Runtime.totalMemory()=1059061760
***********************************************************************
A USER ERROR has occurred: Input GCF_000001405.39_ucsc-genbank-chrs.gz fails strict validation of type ALL: the REF allele is incorrect for the record at position chr5:47309449, fasta says N vs. VCF says T
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
$ zcat GCF_000001405.39_ucsc-genbank-chrs.gz | grep rs1580016009
chr5 47309449 rs1580016009 T A . . RS=1580016009;dbSNPBuildID=154;SSR=0;VC=SNV;GNO;FREQ=Korea1K:0.9677,0.03226
```
For completeness, here is how the dbSNP file was obtained:
```
# Download the global assembly report corresponding to the ref genome used
# Create the mapping schema: RefSeq --> UCSC (or GenBank when nothing available)
awk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.39_GRCh38.p13_assembly_report.txt > dbSNP_RefSeq-to-UCSC-or-GenBank_GRCh38.p13.map
# Rename the chromosomes as per the desired schema
bcftools annotate --rename-chrs dbSNP_RefSeq-to-UCSC-or-GenBank_GRCh38.p13.map -O z -o GCF_000001405.39_ucsc-genbank-chrs.gz GCF_000001405.39.gz
# Index the resulting vcf
gatk IndexFeatureFile -I GCF_000001405.39_ucsc-genbank-chrs.gz
```
Thank you much in advance,
Azza
-
Azza,
Since it worked with BQSR, it might work with CollectVariantCallingMetrics. First, I would recommend renaming the file to have the file extension .vcf.gz (and then make sure you re-create the index). The wrong file extension may be causing the issue with CollectVariantCallingMetrics.
Genevieve
-
Thank you Genevieve Brandt (she/her). I'm afraid changing the file extension does not really solve it. Below are my logs:
```
$ mv GCF_000001405.39_ucsc-genbank-chrs.gz GCF_000001405.39_ucsc-genbank-chrs.vcf.gz
$ gatk IndexFeatureFile -I GCF_000001405.39_ucsc-genbank-chrs.vcf.gz
$ /home/azzaea/software/gatk/gatk-4.2.2.0/gatk --java-options "-Xms2000m" CollectVariantCallingMetrics --INPUT mysample.g.vcf --OUTPUT mysample.metrics.txt --DBSNP GCF_000001405.39_ucsc-genbank-chrs.vcf.gz --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict --THREAD_COUNT 8 --GVCF_INPUT true
:
INFO 2021-10-08 19:37:47 CollectVariantCallingMetrics Read 1,091,000,000 variants. Elapsed time: 01:07:16s. Time for last 100,000: 0s. Last read position: chr22_KB663609v1_alt:51,502
[Fri Oct 08 19:37:47 CEST 2021] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 67.31 minutes.
Runtime.totalMemory()=3519021056
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:101)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /home/azzaea/software/gatk/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms2000m -jar /home/azzaea/software/gatk/gatk-4.2.2.0/gatk-package-4.2.2.0-local.jar CollectVariantCallingMetrics --INPUT mysampple.g.vcf --OUTPUT mysample.metrics.txt --DBSNP GCF_000001405.39_ucsc-genbank-chrs.vcf.gz --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict --THREAD_COUNT 8 --GVCF_INPUT true
```
-
I'm beginning to think it is actually something to do with the dbSNP file. I tried to use `CollectVariantCallingMetrics` on a VCF derived from the GVCF I used above (i.e., after passing the GVCF file through GATK's `GenomicsDBImport` and `GenotypeGVCFs`, both per chromosome, then `GatherVcfs`). It still fails with a similar message, after spending almost an equal amount of time processing:
```
/home/azzaea/software/gatk/gatk-4.2.2.0/gatk --java-options -Xms2000m CollectVariantCallingMetrics --INPUT mysample.vcf --OUTPUT mysample.collectvcfstats.txt --DBSNP GCF_000001405.39_ucsc-genbank-chrs.vcf.gz --SEQUENCE_DICTIONARY Homo_sapiens_assembly38.dict --THREAD_COUNT 8
:
:
INFO 2021-10-12 10:51:53 CollectVariantCallingMetrics Read 1,091,000,000 variants. Elapsed time: 01:10:19s. Time for last 100,000: 0s. Last read position: chr22_KB663609v1_alt:51,502
[Tue Oct 12 10:51:53 CEST 2021] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 70.34 minutes.
Runtime.totalMemory()=2105540608
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:101)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
```
Is there a different way to massage the dbSNP file so that it passes the REF validation? (It adheres to the standard VCF format, but fails the REF check when run through `ValidateVariants`, as shown previously.)
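One thing I could try, assuming dropping the handful of mismatching records is acceptable, is to let `bcftools norm` exclude sites whose REF disagrees with the fasta; a sketch only, not yet verified on this file (and note that norm will also left-align indels by default):
```
# Exclude records whose REF allele disagrees with the reference fasta, then re-index
bcftools norm --check-ref x -f Homo_sapiens_assembly38.fasta \
    -O z -o GCF_000001405.39_ucsc-genbank-chrs.refchecked.vcf.gz \
    GCF_000001405.39_ucsc-genbank-chrs.vcf.gz
gatk IndexFeatureFile -I GCF_000001405.39_ucsc-genbank-chrs.refchecked.vcf.gz
```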
Thank you much,
Azza
-
Hi Azza,
Thanks for trying those troubleshooting steps to see if the problem could be resolved. My colleague took a second look and identified that the program has a problem with the dbSNP file because there is a contig that is not in your provided reference. Unfortunately, the error message is not specific enough to tell which contig is causing the problem.
I have created an issue ticket on github so that the developers can add a better error message so that finding the problem is easier: https://github.com/broadinstitute/picard/issues/1732
For now, you can look more closely at your dbSNP file and resolve any issues where there are contigs not in your reference. I'm sorry there isn't an easier solution at this point, but we will use this case to continue to make GATK better!
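A rough way to do that (standard shell tools; output name below is a placeholder) would be to keep only the dbSNP records whose contig appears in the reference dictionary, for example:
```
# Contig names present in the reference dictionary
grep "^@SQ" Homo_sapiens_assembly38.dict | cut -f2 | sed 's/^SN://' > ref_contigs.txt

# Keep header lines plus records whose CHROM is in that list, then recompress and re-index
zcat GCF_000001405.39_ucsc-genbank-chrs.vcf.gz \
  | awk 'NR==FNR {keep[$1]=1; next} /^#/ || ($1 in keep)' ref_contigs.txt - \
  | bgzip > dbsnp155.refcontigs.vcf.gz
gatk IndexFeatureFile -I dbsnp155.refcontigs.vcf.gz
```
Depending on where the offending contig appears, the ##contig header lines may also need to be pruned to match.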
Best,
Genevieve
-
Great, thank you!
I took another detour and realized it is possible to get statistics on the VCF from the `VariantEval` command (with the same dbSNP file I have at hand now). I will leave it at that for now, but I am happy to try other workarounds with `CollectVariantCallingMetrics`.
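For reference, a minimal VariantEval invocation of that kind (using the file names from above; the exact options may differ from what I ran) looks like:
```
gatk VariantEval \
    -R Homo_sapiens_assembly38.fasta \
    --eval mysample.vcf \
    --dbsnp GCF_000001405.39_ucsc-genbank-chrs.vcf.gz \
    -O mysample.varianteval.grp
```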
-
Great! Thanks for providing an update. Once this issue gets resolved, I'll let you know. It will most likely take some time as there are many high priority issues right now.
-
Hi. I believe that I have a similar issue because I am also getting the NullPointerException, but I have to admit that the script produces an output:
...
INFO 2022-04-29 20:49:44 CollectVariantCallingMetrics Read 59,800,000 variants. Elapsed time: 00:06:54s. Time for last 100,000: 0s. Last read position: chrX:103,683,247
INFO 2022-04-29 20:49:45 CollectVariantCallingMetrics Read 59,900,000 variants. Elapsed time: 00:06:55s. Time for last 100,000: 0s. Last read position: chrX:111,439,932
INFO 2022-04-29 20:49:45 CollectVariantCallingMetrics Read 60,000,000 variants. Elapsed time: 00:06:55s. Time for last 100,000: 0s. Last read position: chrX:118,395,954
INFO 2022-04-29 20:49:46 CollectVariantCallingMetrics Read 60,100,000 variants. Elapsed time: 00:06:56s. Time for last 100,000: 0s. Last read position: chrX:125,304,088
INFO 2022-04-29 20:49:46 CollectVariantCallingMetrics Read 60,200,000 variants. Elapsed time: 00:06:56s. Time for last 100,000: 0s. Last read position: chrX:132,560,535
INFO 2022-04-29 20:49:47 CollectVariantCallingMetrics Read 60,300,000 variants. Elapsed time: 00:06:57s. Time for last 100,000: 0s. Last read position: chrX:140,517,710
INFO 2022-04-29 20:49:47 CollectVariantCallingMetrics Read 60,400,000 variants. Elapsed time: 00:06:57s. Time for last 100,000: 0s. Last read position: chrX:147,011,360
INFO 2022-04-29 20:49:48 CollectVariantCallingMetrics Read 60,500,000 variants. Elapsed time: 00:06:58s. Time for last 100,000: 0s. Last read position: chrX:154,009,624
INFO 2022-04-29 20:49:49 CollectVariantCallingMetrics Read 60,600,000 variants. Elapsed time: 00:06:58s. Time for last 100,000: 0s. Last read position: chrY:14,159,655
[Fri Apr 29 20:49:49 AEST 2022] picard.vcf.CollectVariantCallingMetrics done. Elapsed time: 7.00 minutes.
Runtime.totalMemory()=1580822528
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
java.lang.NullPointerException
at picard.util.DbSnpBitSetUtil.loadVcf(DbSnpBitSetUtil.java:163)
at picard.util.DbSnpBitSetUtil.createSnpAndIndelBitSets(DbSnpBitSetUtil.java:131)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:101)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Do you have an idea why the NullPointerException might be thrown in my case?
I do not get the error when I use the VCF files generated by the variant callers GATK and DeepVariant. The NullPointerException is only thrown when I use the VCF file generated by the variant caller HELLO.
This is what the header of the VCF file produced by the variant caller HELLO looks like. I also vcf-merged multiple samples into one VCF file, but I do not think that this is what causes the error.
##fileformat=VCFv4.2
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chrY,length=57227415>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr12,length=133275309>
##INFO=<ID=MixtureOfExpertPrediction,Number=1,Type=String,Description="Mean predictions from experts">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FILTER=<ID=FAIL,Description="Failed call">
##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 SAMPLE2 ...
Help or feedback is much appreciated.
-
I have just replaced the contig header lines of a working VCF file with those of the non-working VCF file. Now, a different error is thrown:
INFO 2022-04-29 22:15:45 CollectVariantCallingMetrics Read 3,400,000 variants. Elapsed time: 00:00:14s. Time for last 100,000: 0s. Last r
INFO 2022-04-29 22:15:46 CollectVariantCallingMetrics Read 3,500,000 variants. Elapsed time: 00:00:15s. Time for last 100,000: 0s. Last r
[rzei0002@m3-login1 compDeepVGATK]$ tail slurm-26135674_metrics_2.out
picard.vcf.processor.VariantAccumulatorExecutor$MultiThreadedChunkBased$MultiException: Children threads encountered exceptions:
The provided VCF file is malformed at approximately line number 1813791: unparsable vcf record with allele M, for input source: file:///fs02/xm41/robert/comp
at picard.vcf.processor.VariantAccumulatorExecutor$MultiThreadedChunkBased.awaitCompletion(VariantAccumulatorExecutor.java:109)
at picard.vcf.processor.VariantProcessor.process(VariantProcessor.java:94)
at picard.vcf.CollectVariantCallingMetrics.doWork(CollectVariantCallingMetrics.java:120)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:37)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
I also looked up why I am receiving the error:
chr1 248752514 . M C,A 77.67 PASS AC=95,1;AN=96;MixtureOfExpertPrediction=.;SF=0,1,2,3
There is an M in the REF column of my VCF file. The other two VCF files do not have that. Is there a fix for this issue? Should I remove this entry from my VCF file? There are only 8 entries with an M in the REF column, so removing those should not be an issue. I just removed them and found some more strange entries. I will keep trying and get back to you.
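In case it helps anyone, a quick way to list the remaining odd records (assuming a plain-text VCF; the file name is a placeholder) is something like:
```
# Print records whose REF or ALT contains characters other than A, C, G, T, N
# ('*' for spanning deletions, '.' and ',' between ALT alleles are also allowed)
awk '!/^#/ && ($4 !~ /^[ACGTN]+$/ || $5 !~ /^[ACGTN*,.]+$/)' merged_hello.vcf
```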
-
I removed the strange entries and the tool is working now.
-
Hi Robert Zeibich,
Thanks for the update. The M is an IUPAC ambiguity code indicating an ambiguous base, either A or C. I don't think this kind of ambiguous allele can be handled by GATK, so I think that's why you had to remove those entries.
Please let me know if you have any other questions.
Best,
Genevieve