Funcotator Error, incompatible contigs
I am trying to run Funcotator as follows and am getting an error that I think means my reference and source data are not compatible, even though both are hg38. Can you provide any insight into the issue? Thanks
! gatk Funcotator \
--variant ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz \
--reference Homo_sapiens_assembly38.fasta \
--ref-version hg38 \
--data-sources-path funcotator_dataSources.v1.7.20200521g \
--output variants090721.funcotated.vcf \
--output-file-format VCF
Using GATK jar /etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar Funcotator --variant ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz --reference Homo_sapiens_assembly38.fasta --ref-version hg38 --data-sources-path ../funcotator_dataSources.v1.7.20200521g --output variants090721.funcotated.vcf --output-file-format VCF
23:35:41.840 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
23:35:42.034 INFO Funcotator - ------------------------------------------------------------
23:35:42.034 INFO Funcotator - The Genome Analysis Toolkit (GATK) v4.2.0.0
23:35:42.034 INFO Funcotator - For support and documentation go to https://software.broadinstitute.org/gatk/
23:35:42.035 INFO Funcotator - Executing as jupyter@006276791b8d on Linux v5.4.104+ amd64
23:35:42.035 INFO Funcotator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-8u292-b10-0ubuntu1~18.04-b10
23:35:42.035 INFO Funcotator - Start Date/Time: September 28, 2021 11:35:41 PM UTC
23:35:42.035 INFO Funcotator - ------------------------------------------------------------
23:35:42.035 INFO Funcotator - ------------------------------------------------------------
23:35:42.036 INFO Funcotator - HTSJDK Version: 2.24.0
23:35:42.036 INFO Funcotator - Picard Version: 2.25.0
23:35:42.036 INFO Funcotator - Built for Spark Version: 2.4.5
23:35:42.036 INFO Funcotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
23:35:42.036 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
23:35:42.036 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
23:35:42.036 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
23:35:42.037 INFO Funcotator - Deflater: IntelDeflater
23:35:42.037 INFO Funcotator - Inflater: IntelInflater
23:35:42.037 INFO Funcotator - GCS max retries/reopens: 20
23:35:42.037 INFO Funcotator - Requester pays: disabled
23:35:42.037 INFO Funcotator - Initializing engine
23:35:42.639 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jupyter/notebooks/PROACTIVE-WGS_JL%20correct%20billing%20acct/edit/../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz
23:35:43.149 INFO Funcotator - Done initializing engine
23:35:43.149 INFO Funcotator - Validating sequence dictionaries...
23:35:43.225 INFO Funcotator - Shutting down engine
[September 28, 2021 11:35:43 PM UTC] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=414711808
***********************************************************************
A USER ERROR has occurred: Input files Reference and Driving Variants have incompatible contigs: Dictionary Reference is missing contigs found in dictionary Driving Variants. Missing contigs:
-
Hi J LoPiccolo,
Could you run this command with the java option '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' and share the program log? The command line will look like this:
gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" Funcotator [other tool program arguments]
You can find more information on java options in this article: https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax
Best,
Genevieve
-
Sure, thanks. 2 questions:
1) Would I run this command by itself or together with the previous one (below)?
! gatk Funcotator \
--variant ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz \
--reference Homo_sapiens_assembly38.fasta \
--ref-version hg38 \
--data-sources-path funcotator_dataSources.v1.7.20200521g \
--output variants090721.funcotated.vcf \
--output-file-format VCF
2) For the command itself, do I type the words "Funcotator [other tool program arguments]" literally, or not? Sorry, I don't have much coding background.
Thank you!
-
No problem! You would run the Funcotator command you shared in 1), adding --java-options to it. Here is what that would look like:
! gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" Funcotator \
--variant ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz \
--reference Homo_sapiens_assembly38.fasta \
--ref-version hg38 \
--data-sources-path funcotator_dataSources.v1.7.20200521g \
--output variants090721.funcotated.vcf \
--output-file-format VCF
-
Thank you! It still gives me the same error, as below:
Using GATK jar /etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar Funcotator --variant ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz --reference Homo_sapiens_assembly38.fasta --ref-version hg38 --data-sources-path funcotator_dataSources.v1.7.20200521g --output variants090721.funcotated.vcf --output-file-format VCF
00:28:04.718 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
00:28:05.570 INFO Funcotator - ------------------------------------------------------------
00:28:05.571 INFO Funcotator - The Genome Analysis Toolkit (GATK) v4.2.0.0
00:28:05.571 INFO Funcotator - For support and documentation go to https://software.broadinstitute.org/gatk/
00:28:05.571 INFO Funcotator - Executing as jupyter@006276791b8d on Linux v5.4.104+ amd64
00:28:05.571 INFO Funcotator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-8u292-b10-0ubuntu1~18.04-b10
00:28:05.572 INFO Funcotator - Start Date/Time: October 1, 2021 12:28:04 AM UTC
00:28:05.572 INFO Funcotator - ------------------------------------------------------------
00:28:05.572 INFO Funcotator - ------------------------------------------------------------
00:28:05.573 INFO Funcotator - HTSJDK Version: 2.24.0
00:28:05.573 INFO Funcotator - Picard Version: 2.25.0
00:28:05.573 INFO Funcotator - Built for Spark Version: 2.4.5
00:28:05.573 INFO Funcotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
00:28:05.573 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
00:28:05.573 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
00:28:05.573 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
00:28:05.573 INFO Funcotator - Deflater: IntelDeflater
00:28:05.573 INFO Funcotator - Inflater: IntelInflater
00:28:05.573 INFO Funcotator - GCS max retries/reopens: 20
00:28:05.573 INFO Funcotator - Requester pays: disabled
00:28:05.573 INFO Funcotator - Initializing engine
00:28:06.346 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jupyter/notebooks/PROACTIVE-WGS_JL%20correct%20billing%20acct/edit/../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz
00:28:06.940 INFO Funcotator - Done initializing engine
00:28:06.941 INFO Funcotator - Validating sequence dictionaries...
00:28:07.103 INFO Funcotator - Shutting down engine
[October 1, 2021 12:28:07 AM UTC] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 0.05 minutes.
Runtime.totalMemory()=411566080
***********************************************************************
A USER ERROR has occurred: Input files Reference and Driving Variants have incompatible contigs: Dictionary Reference is missing contigs found in dictionary Driving Variants. Missing contigs:
-
I am wondering if it has something to do with where I put my resource bundle data. I used gsutil to copy files into the notebook workspace, but I am not sure if I copied the right ones. Here they are listed below:
funcotator_dataSources.v1.7.20200521g
funcotator_dataSources.v1.7.20200521g.dir.long.md5sum
funcotator_dataSources.v1.7.20200521g.dir.md5sum
funcotator_dataSources.v1.7.20200521g.sha256
funcotator_dataSources.v1.7.20200521g.tar.gz
-
Ah, I see. Yes, it most likely has to do with how you are passing in the data sources: --data-sources-path should point to the data sources folder. Here is the usage example for Funcotator:
./gatk Funcotator \
-R reference.fasta \
-V input.vcf \
-O outputFile \
--output-file-format MAF \
--data-sources-path dataSourcesFolder/ \
--ref-version hg19
You can also find more information on the Funcotator page on how to set up your data sources folders in the section "Data Source Folders". https://gatk.broadinstitute.org/hc/en-us/articles/4405451170459-Funcotator
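Because --data-sources-path has to name the unpacked folder, not the .tar.gz archive or one of the checksum files listed earlier, a quick pre-flight check can catch the wrong kind of path before Funcotator starts. This is a minimal sketch: check_datasources_path is a hypothetical helper (not part of GATK), and it only assumes that a valid path is a directory whose top-level subfolders hold the individual data sources; see the "Data Source Folders" section linked above for the exact layout.

```shell
#!/usr/bin/env bash
# check_datasources_path PATH  (hypothetical helper, not part of GATK)
# Fails if PATH is an archive, a plain file, or missing. Otherwise prints the
# names of its top-level subfolders, assumed to be one per data source.
check_datasources_path() {
  local path="$1"
  if [ -f "$path" ]; then
    case "$path" in
      *.tar.gz) echo "error: $path is an archive; unpack it first with: tar -xzf $path" >&2 ;;
      *)        echo "error: $path is a file, not a folder" >&2 ;;
    esac
    return 1
  fi
  if [ ! -d "$path" ]; then
    echo "error: $path does not exist" >&2
    return 1
  fi
  # List immediate subdirectories only, sorted for stable output.
  find "$path" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | sort
}
```

For example, check_datasources_path funcotator_dataSources.v1.7.20200521g should list the data source subfolders. In a Jupyter notebook, put the definition and the call in the same %%bash cell: each ! line runs in its own shell, so a function defined on one ! line is not visible on the next.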
-
Thanks. So do I need to make a folder "dataSourcesFolder/" with the data sources and put it in my bucket? Or do I put the path to the public bucket with the sources? If possible, can you just walk me through the process? I am also not sure exactly which germline data sources the tool needs (in terms of files). Thanks!
-
Hi, sorry, I'm not sure whether this helps, but I think the issue is that my input VCF contains a number of viral contigs that are not present in the reference (see below). When I make a smaller input VCF with just a region of chromosome 1 (i.e., without the viral contigs), Funcotator runs successfully. So it works on a region, but not on the whole dataset. Have you seen this problem with viral contigs before?
A USER ERROR has occurred: Input files Reference and Driving Variants have incompatible contigs: Dictionary Reference is missing contigs found in dictionary Driving Variants. Missing contigs: CMV, HBV, HCV-1, HCV-2, HIV-1, HIV-2, KSHV, HTLV-1, MCV, SV40, HPV16, HPV18, HPV26, HPV31, HPV33, HPV35, HPV39, HPV45, HPV51, HPV52, HPV53, HPV56, HPV58, HPV59, HPV66, HPV68, HPV69, HPV73, HPV82, HPV1, HPV2, HPV3, HPV4, HPV5, HPV6, HPV7, HPV8, HPV9, HPV10, HPV11, HPV12, HPV13, HPV14, HPV15, HPV17, HPV19, HPV20, HPV21, HPV22, HPV23, HPV24, HPV25, HPV27, HPV28, HPV29, HPV30, HPV32, HPV34, HPV36, HPV37, HPV38, HPV40, HPV41, HPV42, HPV43, HPV44, HPV47, HPV48, HPV49, HPV50, HPV54, HPV57, HPV60, HPV61, HPV62, HPV63, HPV65, HPV67, HPV70, HPV71, HPV72, HPV74, HPV75, HPV76, HPV77, HPV78, HPV80, HPV81, HPV83, HPV84, HPV85, HPV86, HPV87, HPV88, HPV89, HPV90, HPV91, HPV92, HPV93, HPV94, HPV95, HPV96, HPV97, HPV98, HPV99, HPV100, HPV101, HPV102, HPV103, HPV104, HPV105, HPV106, HPV107, HPV108, HPV109, HPV110, HPV111, HPV112, HPV113, HPV114, HPV115, HPV116, HPV117, HPV118, HPV119, HPV120, HPV121, HPV122, HPV123, HPV124, HPV125, HPV126, HPV127, HPV128, HPV129, HPV130, HPV131, HPV132, HPV133, HPV134, HPV135, HPV136, HPV137, HPV138, HPV139, HPV140, HPV141, HPV142, HPV143, HPV144, HPV145, HPV146, HPV147, HPV148, HPV149, HPV150, HPV151, HPV152, HPV153, HPV154, HPV155, HPV156, HPV159, HPV160, HPV161, HPV162, HPV163, HPV164, HPV165, HPV166, HPV167, HPV168, HPV169, HPV170, HPV171, HPV172, HPV173, HPV174, HPV175, HPV178, HPV179, HPV180, HPV184, HPV197, HPV199, HPV-mCG2, HPV-mCG3, HPV-mCH2, HPV-mFD1, HPV-mFD2, HPV-mFS1, HPV-mFi864, HPV-mKC5, HPV-mKN1, HPV-mKN2, HPV-mKN3, HPV-mL55, HPV-mRTRX7, HPV-mSD2
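The error compares two sequence dictionaries: the reference's contig list (from its .dict file) and the dictionary of the Driving Variants, which comes from the ##contig lines in the input VCF header. The same comparison can be reproduced outside GATK with the minimal sketch below. It assumes a local copy of the reference's .dict file (for the Broad hg38 bundle this is typically Homo_sapiens_assembly38.dict next to the FASTA); missing_contigs is a hypothetical helper name, and it compares contig names only, not lengths.

```shell
#!/usr/bin/env bash
# missing_contigs VCF DICT  (hypothetical helper)
# Prints contig IDs declared in the VCF header (##contig=<ID=...>) that have
# no "@SQ ... SN:<name>" entry in the reference sequence dictionary.
missing_contigs() {
  local vcf="$1" dict="$2"
  # gzip -cdf decompresses .vcf.gz and passes a plain .vcf through unchanged;
  # sed quits at the #CHROM line, so only the header is scanned.
  comm -23 \
    <(gzip -cdf "$vcf" | sed -n '/^#CHROM/q; s/^##contig=<ID=\([^,>]*\).*/\1/p' | sort -u) \
    <(awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$dict" | sort -u)
}
```

Usage: missing_contigs ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz Homo_sapiens_assembly38.dict. An empty result would mean every contig declared in the VCF header also exists in the reference dictionary.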
-
What tool did you get this USER ERROR on? Could you share your command line and entire program log?
-
! gatk Funcotator \
--variant ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz \
--reference gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta \
--ref-version hg38 \
--data-sources-path funcotator_dataSources.v1.7.20200521g/ \
--output variants090721.funcotated.vcf \
--output-file-format VCF
Using GATK jar /etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar Funcotator --variant ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz --reference gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta --ref-version hg38 --data-sources-path funcotator_dataSources.v1.7.20200521z/ --output variants090721.funcotated.vcf --output-file-format VCF -L chr1:1-248956422
18:14:05.596 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/etc/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
18:14:05.793 INFO Funcotator - ------------------------------------------------------------
18:14:05.793 INFO Funcotator - The Genome Analysis Toolkit (GATK) v4.2.0.0
18:14:05.794 INFO Funcotator - For support and documentation go to https://software.broadinstitute.org/gatk/
18:14:05.794 INFO Funcotator - Executing as jupyter@006276791b8d on Linux v5.4.104+ amd64
18:14:05.794 INFO Funcotator - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_292-8u292-b10-0ubuntu1~18.04-b10
18:14:05.794 INFO Funcotator - Start Date/Time: October 1, 2021 6:14:05 PM UTC
18:14:05.794 INFO Funcotator - ------------------------------------------------------------
18:14:05.794 INFO Funcotator - ------------------------------------------------------------
18:14:05.795 INFO Funcotator - HTSJDK Version: 2.24.0
18:14:05.795 INFO Funcotator - Picard Version: 2.25.0
18:14:05.795 INFO Funcotator - Built for Spark Version: 2.4.5
18:14:05.795 INFO Funcotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
18:14:05.795 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
18:14:05.795 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
18:14:05.795 INFO Funcotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
18:14:05.796 INFO Funcotator - Deflater: IntelDeflater
18:14:05.796 INFO Funcotator - Inflater: IntelInflater
18:14:05.796 INFO Funcotator - GCS max retries/reopens: 20
18:14:05.796 INFO Funcotator - Requester pays: disabled
18:14:05.796 INFO Funcotator - Initializing engine
18:14:07.427 INFO FeatureManager - Using codec VCFCodec to read file file:///home/jupyter/notebooks/PROACTIVE-WGS_JL%20correct%20billing%20acct/edit/../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz
18:14:07.742 INFO IntervalArgumentCollection - Processing 248956422 bp from intervals
18:14:07.799 INFO Funcotator - Done initializing engine
18:14:07.799 INFO Funcotator - Validating sequence dictionaries...
18:14:07.851 INFO Funcotator - Shutting down engine
[October 1, 2021 6:14:07 PM UTC] org.broadinstitute.hellbender.tools.funcotator.Funcotator done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=401080320
***********************************************************************
A USER ERROR has occurred: Input files Reference and Driving Variants have incompatible contigs: Dictionary Reference is missing contigs found in dictionary Driving Variants. Missing contigs: CMV, HBV, HCV-1, HCV-2, HIV-1, HIV-2, KSHV, HTLV-1, MCV, SV40, HPV16, HPV18, HPV26, HPV31, HPV33, HPV35, HPV39, HPV45, HPV51, HPV52, HPV53, HPV56, HPV58, HPV59, HPV66, HPV68, HPV69, HPV73, HPV82, HPV1, HPV2, HPV3, HPV4, HPV5, HPV6, HPV7, HPV8, HPV9, HPV10, HPV11, HPV12, HPV13, HPV14, HPV15, HPV17, HPV19, HPV20, HPV21, HPV22, HPV23, HPV24, HPV25, HPV27, HPV28, HPV29, HPV30, HPV32, HPV34, HPV36, HPV37, HPV38, HPV40, HPV41, HPV42, HPV43, HPV44, HPV47, HPV48, HPV49, HPV50, HPV54, HPV57, HPV60, HPV61, HPV62, HPV63, HPV65, HPV67, HPV70, HPV71, HPV72, HPV74, HPV75, HPV76, HPV77, HPV78, HPV80, HPV81, HPV83, HPV84, HPV85, HPV86, HPV87, HPV88, HPV89, HPV90, HPV91, HPV92, HPV93, HPV94, HPV95, HPV96, HPV97, HPV98, HPV99, HPV100, HPV101, HPV102, HPV103, HPV104, HPV105, HPV106, HPV107, HPV108, HPV109, HPV110, HPV111, HPV112, HPV113, HPV114, HPV115, HPV116, HPV117, HPV118, HPV119, HPV120, HPV121, HPV122, HPV123, HPV124, HPV125, HPV126, HPV127, HPV128, HPV129, HPV130, HPV131, HPV132, HPV133, HPV134, HPV135, HPV136, HPV137, HPV138, HPV139, HPV140, HPV141, HPV142, HPV143, HPV144, HPV145, HPV146, HPV147, HPV148, HPV149, HPV150, HPV151, HPV152, HPV153, HPV154, HPV155, HPV156, HPV159, HPV160, HPV161, HPV162, HPV163, HPV164, HPV165, HPV166, HPV167, HPV168, HPV169, HPV170, HPV171, HPV172, HPV173, HPV174, HPV175, HPV178, HPV179, HPV180, HPV184, HPV197, HPV199, HPV-mCG2, HPV-mCG3, HPV-mCH2, HPV-mFD1, HPV-mFD2, HPV-mFS1, HPV-mFi864, HPV-mKC5, HPV-mKN1, HPV-mKN2, HPV-mKN3, HPV-mL55, HPV-mRTRX7, HPV-mSD2 .
Reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM, chr1_KI270706v1_random, chr1_KI270707v1_random, chr1_KI270708v1_random, chr1_KI270709v1_random, chr1_KI270710v1_random, chr1_KI270711v1_random, chr1_KI270712v1_random, chr1_KI270713v1_random, chr1_KI270714v1_random, chr2_KI270715v1_random, chr2_KI270716v1_random, chr3_GL000221v1_random, chr4_GL000008v2_random, chr5_GL000208v1_random, chr9_KI270717v1_random, chr9_KI270718v1_random, chr9_KI270719v1_random, chr9_KI270720v1_random, chr11_KI270721v1_random, chr14_GL000009v2_random, chr14_GL000225v1_random, chr14_KI270722v1_random, chr14_GL000194v1_random, chr14_KI270723v1_random, chr14_KI270724v1_random, chr14_KI270725v1_random, chr14_KI270726v1_random, chr15_KI270727v1_random, chr16_KI270728v1_random, chr17_GL000205v2_random, chr17_KI270729v1_random, chr17_KI270730v1_random, chr22_KI270731v1_random, chr22_KI270732v1_random, chr22_KI270733v1_random, chr22_KI270734v1_random, chr22_KI270735v1_random, chr22_KI270736v1_random, chr22_KI270737v1_random, chr22_KI270738v1_random, chr22_KI270739v1_random, chrY_KI270740v1_random, chrUn_KI270302v1, chrUn_KI270304v1, chrUn_KI270303v1, chrUn_KI270305v1, chrUn_KI270322v1, chrUn_KI270320v1, chrUn_KI270310v1, chrUn_KI270316v1, chrUn_KI270315v1, chrUn_KI270312v1, chrUn_KI270311v1, chrUn_KI270317v1, chrUn_KI270412v1, chrUn_KI270411v1, chrUn_KI270414v1, chrUn_KI270419v1, chrUn_KI270418v1, chrUn_KI270420v1, chrUn_KI270424v1, chrUn_KI270417v1, chrUn_KI270422v1, chrUn_KI270423v1, chrUn_KI270425v1, chrUn_KI270429v1, chrUn_KI270442v1, chrUn_KI270466v1, chrUn_KI270465v1, chrUn_KI270467v1, chrUn_KI270435v1, chrUn_KI270438v1, chrUn_KI270468v1, chrUn_KI270510v1, chrUn_KI270509v1, chrUn_KI270518v1, chrUn_KI270508v1, chrUn_KI270516v1, chrUn_KI270512v1, chrUn_KI270519v1, chrUn_KI270522v1, chrUn_KI270511v1, chrUn_KI270515v1, chrUn_KI270507v1, chrUn_KI270517v1, 
chrUn_KI270529v1, chrUn_KI270528v1, chrUn_KI270530v1, chrUn_KI270539v1, chrUn_KI270538v1, chrUn_KI270544v
….
-
I have used this reference file several times with other tools and it works. I think the problem is with the data sources. Is it possible that the data sources files are not built on the same reference (hg38)?
-
I just confirmed with a colleague what the "Driving Variants" are in Funcotator: they are the variants in your input VCF.
So the issue is that the contigs in your VCF file (1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz) do not match those in your reference file. With Funcotator, you should use the same reference that was used to create the VCF file.
-
Hi, thanks. I have used that reference file, and I am able to get other tools (for example, SelectVariants and Concordance) to work using exactly the same input VCF (1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz) and hg38 reference FASTA (gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta) as I am using here.
Is there a reason why viral contigs would show up in the input VCF (my VCF) but not in the reference, and why this would be an issue for Funcotator and not for other GATK tools?
-
How did the viral contigs get into your VCF if the VCF was created with HG38? Could you explain how you created the VCF file?
There shouldn't be viral contigs in your VCF that are not in the reference you used. Most likely the issue didn't come up with the other tools because they don't check for this mismatch, and it doesn't cause a problem for them.
-
Hi, yes, thanks! To make my VCF I used the GATK Best Practices pipeline: HaplotypeCaller (1-2) and then joint genotyping (1-4) with VQSR filtering, using the hg38 reference genome provided by GATK.
-
I also went back to my VCF (using the SelectVariants tool) and searched for the viral contigs (e.g., CMV), which produced no variants. These are the only contigs in my VCF:
1 chrEBV
1 chrM
2 chrY
4 chrX
5 chr10
5 chr20
7 chr13
8 chr16
8 chr21
10 chr7
10 chr9
11 chr15
11 chr18
13 chr14
13 chr4
14 chr11
14 chr12
15 chr5
17 chr6
17 chr8
18 chr2
18 chr3
19 chr22
22 chr1
22 chr17
44 chr19
606 chrUn
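Counting the contigs used by variant records and listing the contigs declared in the header are two different checks, and the error message refers to the header dictionaries. A minimal sketch that lists contigs declared in ##contig header lines but used by no variant record follows; contigs_without_records is a hypothetical helper name, and it reads the whole file, so it will be slow on a multi-GB VCF.

```shell
#!/usr/bin/env bash
# contigs_without_records VCF  (hypothetical helper)
# Prints contigs that appear in ##contig header lines but in no data line.
contigs_without_records() {
  gzip -cdf "$1" | awk -F'\t' '
    /^##contig=<ID=/ {
      id = $0
      sub(/^##contig=<ID=/, "", id)   # drop the prefix
      sub(/[,>].*/, "", id)           # drop everything after the ID
      declared[id] = 1
      next
    }
    /^#/ { next }                     # other header lines, including #CHROM
    { used[$1] = 1 }                  # column 1 of a data line is CHROM
    END { for (id in declared) if (!(id in used)) print id }
  ' | sort
}
```

If viral contig names show up in this output, they are declared in the header even though no variant lies on them, which is consistent with SelectVariants finding no records for them.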
-
Somehow it seems that those viral contigs got into your VCF dictionary. Could you update the VCF dictionary of your VCF file with the tool UpdateVCFSequenceDictionary?
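A sketch of what that call could look like, wrapped in a hypothetical helper with a dry-run switch so the command can be inspected before it runs. The argument names (-V, --source-dictionary, --output, --replace) are assumed from the UpdateVCFSequenceDictionary documentation and should be confirmed with gatk UpdateVCFSequenceDictionary --help for GATK 4.2.0.0; the .dict path and the output file name are placeholders.

```shell
#!/usr/bin/env bash
# update_vcf_dictionary IN_VCF SOURCE_DICT OUT_VCF  (hypothetical helper)
# Replaces the sequence dictionary (##contig lines) of IN_VCF with the one in
# SOURCE_DICT. Set DRY_RUN=1 to print the command instead of running it.
update_vcf_dictionary() {
  local -a cmd
  cmd=(gatk UpdateVCFSequenceDictionary
       -V "$1"
       --source-dictionary "$2"
       --output "$3"
       --replace true)   # assumed necessary because the VCF already has a dictionary
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 update_vcf_dictionary ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz Homo_sapiens_assembly38.dict updated_dictionary.vcf.gz prints the command; dropping DRY_RUN=1 runs it.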
-
Sure, is it possible for you to provide me with the exact command line for that tool? Thank you!!
-
The thing is that the VCF header contains the VCF dictionary, and these contigs are not in my VCF (nor in the VCF header/dictionary). I have looked several times :)
-
Hmmm, thanks for the extra info! I can't figure out why you would be getting this error message then. Could you share the VCF header contig lines?
You can print the VCF header with the following command:
bcftools view -h file.vcf
Then, just paste the lines describing the contigs.
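Because the full contig list of an hg38 VCF header can be long, a shorter summary may be easier to paste. The sketch below (summarize_contig_lines is a hypothetical helper name) reads header text on standard input, prints only contig IDs that do not start with "chr", and ends with the total number of ##contig lines. Treating non-"chr" names as the interesting ones is an assumption based on the contig names seen in this thread.

```shell
#!/usr/bin/env bash
# summarize_contig_lines  (hypothetical helper)
# Reads a VCF header on stdin and prints non-"chr" contig IDs, then a total.
summarize_contig_lines() {
  awk '/^##contig=<ID=/ {
         n++
         id = $0
         sub(/^##contig=<ID=/, "", id)   # drop the prefix
         sub(/[,>].*/, "", id)           # keep only the ID
         if (id !~ /^chr/) print id
       }
       END { print n + 0, "contig lines in total" }'
}
```

Usage: bcftools view -h ../1-3-Generate-Sample-Map-HG38_2021-09-07T19-59-00.vcf.gz | summarize_contig_lines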
-
It's giving me an error saying that the comment is too long when I paste it in. I can email it to you if that's easier. Thanks
-
Hi J LoPiccolo,
We haven't been able to figure out what the issue is. Could you submit your VCF file as a bug report following these instructions?
Let me know when you have uploaded the files and I will continue troubleshooting.
Best,
Genevieve
-
Hi, I am going to upload the bug report, but I'm afraid it's not possible to narrow down where in the VCF the problem is happening: when I use SelectVariants to make piecemeal VCFs, Funcotator runs on them; it only fails on the whole VCF. Is it OK if I upload a sites-only VCF (~4 GB) for you to work with? Thanks
-
J LoPiccolo, if the sites-only VCF reproduces the issue, then yes, go ahead and upload that file.
-
OK. If it does not, my entire VCF is available in my workspace/Google bucket. Would you be able to access it if I share the workspace and notebook with you?
-
Actually, it does reproduce the error. The issue I am having now is that http://ftp.broadinstitute.org/ does not load, so I cannot upload the files.
-
Here are instructions for how to upload the file to an FTP from the command line: https://tecadmin.net/download-upload-files-using-ftp-command-line/
-
OK, I do not have Broad access, so I will have to get that first. Is there a way for me to upload using Google Cloud? That would be easier. Thanks.
-
The username and password for access are given in the document I shared above: How do I submit a bug report?
We don't have a google bucket set up for this purpose.
-
Thanks, I am able to access the server but not upload any files to it.