Funcotator chromosome mismatch in somatic variant calling pipeline
REQUIRED for all errors and issues:
a) GATK version used:
v4.5.0.0
b) Exact command used:
gatk Funcotator -variant ../out/Filter_mutect/G1_2nM_BL_raw_variants.vcf \
--reference /export/home/bin/genomes/BWA_hg38/Homo_sapiens_assembly38.fasta \
--ref-version hg38 \
--data-sources-path /export/home/bin/variant_calling/funcotator_dataSources.v1.8.hg38.20230908s \
--output ../out/Annotate_variants/G1_2nM_BL_raw_variants.vcf \
--output-file-format VCF
16:21:32.762 INFO FuncotatorEngine - Using given VCF and Reference. No conversion required.
16:21:32.764 INFO FuncotatorUtils - Input VCF has been determined to not based on b37:
16:21:32.764 INFO FuncotatorUtils - The following contigs are present in b37 and missing in the input VCF sequence dictionary:
16:21:32.767 INFO FuncotatorUtils - 1 (len=249250621,assembly=GRCh37)
16:21:32.767 INFO FuncotatorUtils - 2 (len=243199373,assembly=GRCh37)
16:21:32.767 INFO FuncotatorUtils - 3 (len=198022430,assembly=GRCh37)
16:21:32.767 INFO FuncotatorUtils - 4 (len=191154276,assembly=GRCh37)
16:21:32.767 INFO FuncotatorUtils - 5 (len=180915260,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 6 (len=171115067,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 7 (len=159138663,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 8 (len=146364022,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 9 (len=141213431,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 10 (len=135534747,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 11 (len=135006516,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 12 (len=133851895,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 13 (len=115169878,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 14 (len=107349540,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 15 (len=102531392,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 16 (len=90354753,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 17 (len=81195210,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 18 (len=78077248,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 19 (len=59128983,assembly=GRCh37)
16:21:32.768 INFO FuncotatorUtils - 20 (len=63025520,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - 21 (len=48129895,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - 22 (len=51304566,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - X (len=155270560,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - Y (len=59373566,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - MT (len=16569,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000207.1 (len=4262,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000226.1 (len=15008,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000229.1 (len=19913,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000231.1 (len=27386,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000210.1 (len=27682,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000239.1 (len=33824,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000235.1 (len=34474,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000201.1 (len=36148,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000247.1 (len=36422,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000245.1 (len=36651,assembly=GRCh37)
16:21:32.769 INFO FuncotatorUtils - GL000197.1 (len=37175,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000203.1 (len=37498,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000246.1 (len=38154,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000249.1 (len=38502,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000196.1 (len=38914,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000248.1 (len=39786,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000244.1 (len=39929,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000238.1 (len=39939,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000202.1 (len=40103,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000234.1 (len=40531,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000232.1 (len=40652,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000206.1 (len=41001,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000240.1 (len=41933,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000236.1 (len=41934,assembly=GRCh37)
16:21:32.770 INFO FuncotatorUtils - GL000241.1 (len=42152,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000243.1 (len=43341,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000242.1 (len=43523,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000230.1 (len=43691,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000237.1 (len=45867,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000233.1 (len=45941,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000204.1 (len=81310,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000198.1 (len=90085,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000208.1 (len=92689,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000191.1 (len=106433,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000227.1 (len=128374,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000228.1 (len=129120,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000214.1 (len=137718,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000221.1 (len=155397,assembly=GRCh37)
16:21:32.771 INFO FuncotatorUtils - GL000209.1 (len=159169,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000218.1 (len=161147,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000220.1 (len=161802,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000213.1 (len=164239,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000211.1 (len=166566,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000199.1 (len=169874,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000217.1 (len=172149,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000216.1 (len=172294,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000215.1 (len=172545,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000205.1 (len=174588,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000219.1 (len=179198,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000224.1 (len=179693,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000223.1 (len=180455,assembly=GRCh37)
16:21:32.772 INFO FuncotatorUtils - GL000195.1 (len=182896,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - GL000212.1 (len=186858,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - GL000222.1 (len=186861,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - GL000200.1 (len=187035,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - GL000193.1 (len=189789,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - GL000194.1 (len=191469,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - GL000225.1 (len=211173,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - GL000192.1 (len=547496,assembly=GRCh37)
16:21:32.773 INFO FuncotatorUtils - NC_007605 (len=171823,assembly=NC_007605.1)
16:21:32.774 INFO Funcotator - Creating a VCF file for output: file:/export/bioinfpog2/Mhairi/Hager/Analysis/DNA/hg38/bin/../out/Annotate_variants/G1_2nM_BL_raw_variants.vcf
16:21:32.825 INFO ProgressMeter - Starting traversal
16:21:32.826 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
-
Looking into this further, it's more confusing. Both my variants and the data stored in funcotator_dataSources.v1.8.hg38.20230908s use the chr notation. Why is it searching for the non-chr format?
-
Hi Thomas Stevens,
The message about "The following contigs are present in b37 and missing in the input VCF sequence dictionary" is just Funcotator's confusing way of telling you that your input VCF does not appear to have a b37/GRCh37 sequence dictionary. The list of contigs that it prints (1, 2, 3, etc.) are just the b37/GRCh37 contigs that are not present in your VCF's hg38 sequence dictionary.
You say that you've confirmed that both your VCF and the Funcotator datasources are using the chr contig naming convention. Could you also confirm that your reference (Homo_sapiens_assembly38.fasta) has the same chr prefix in its contig names?
How long did you leave Funcotator running? Does the traversal eventually finish (after several hours, for example)? Does the GATK process ever abort/exit?
Regards,
David
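
One way to confirm the contig naming asked about above is to pull the contig names straight out of the reference FASTA and the VCF header and count how many start with chr. This is only a sketch: the function names are made up for illustration, it assumes an uncompressed FASTA and VCF, and it uses nothing beyond grep, sed and awk.

```shell
# Hypothetical helpers (not part of GATK) for comparing contig naming.

fasta_contigs() {
    # FASTA header lines start with '>'; the contig name is the first word.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}'
}

vcf_contigs() {
    # The VCF sequence dictionary is stored in ##contig=<ID=...> header lines.
    grep '^##contig=' "$1" | sed 's/^##contig=<ID=\([^,>]*\).*/\1/'
}

count_chr_prefixed() {
    # Reads contig names on stdin; prints "<names starting with chr> <total names>".
    awk '/^chr/ {c++} END {print c+0, NR}'
}

# Example calls (paths taken from the command in the original post):
# fasta_contigs /export/home/bin/genomes/BWA_hg38/Homo_sapiens_assembly38.fasta | count_chr_prefixed
# vcf_contigs ../out/Filter_mutect/G1_2nM_BL_raw_variants.vcf | count_chr_prefixed
```

If both counts show every contig starting with chr, the naming convention is consistent between the two files.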
-
Hi David,
The reference does have the chr prefix as well.
I left Funcotator running for around 5 days and it never progressed. GATK also never aborts; it just stays stuck right at the beginning.
From experimenting, I found that if I delete the other databases in the Funcotator folder, keeping just dbsnp and the folders required by Funcotator, it runs to the end in ~20 minutes. So it's either an issue with it trying to use all the databases or something within one of them. I'm not sure how to fix it so that I can use more of the resources.
Regards,
Tommy
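
The trial-and-error described above could be made less destructive by building a separate data-sources folder that only links to a chosen subset, then adding one folder at a time until the hang reappears. This is a rough sketch only: make_subset is a made-up helper, the folder names in the example call are assumptions to check against the actual directory listing, and whether Funcotator follows symbolic links is also an assumption (replace ln -s with cp -r if it does not).

```shell
# make_subset SRC DEST NAME...
# Hypothetical helper: creates DEST with symlinks to the named entries of SRC,
# leaving the original data-sources folder untouched.
make_subset() {
    src=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || return 1
    for name in "$@"; do
        if [ -e "$src/$name" ]; then
            # Link by absolute path so the link works from any working directory.
            ln -s "$(cd "$src" && pwd)/$name" "$dest/$name"
        else
            echo "skipping missing entry: $name" >&2
        fi
    done
}

# Example call (folder names are assumptions; list the source directory first):
# make_subset /export/home/bin/variant_calling/funcotator_dataSources.v1.8.hg38.20230908s \
#     ./ds_subset gencode dbsnp
# Then point --data-sources-path at ./ds_subset and rerun.
```

Adding one more folder per run narrows down whether a single data source causes the stall or whether it only appears with the full set.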
-
Hi Thomas Stevens,
Do you have the gnomAD datasource activated (see https://gatk.broadinstitute.org/hc/en-us/articles/360035889931-Funcotator-Information-and-Tutorial#1.1.2.2)? The gnomAD datasource is very large and involves remote network accesses, and so can slow down Funcotator quite a bit on a slow network connection.
Regards,
David
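
A quick way to answer the gnomAD question is to look at the data-sources folder itself. The sketch below assumes the layout described in the linked tutorial, where gnomAD ships as gnomAD_exome.tar.gz and gnomAD_genome.tar.gz and becomes active once an archive is extracted into a folder of the same name; gnomad_status is a made-up helper, and the names should be checked against the actual download.

```shell
# gnomad_status DATA_SOURCES_DIR
# Hypothetical helper: reports whether each gnomAD data source looks
# extracted (and so would be picked up) or is still only an archive.
gnomad_status() {
    for name in gnomAD_exome gnomAD_genome; do
        if [ -d "$1/$name" ]; then
            echo "$name: extracted (active)"
        elif [ -f "$1/$name.tar.gz" ]; then
            echo "$name: archived (inactive)"
        else
            echo "$name: not found"
        fi
    done
}

# Example call (path taken from the command in the original post):
# gnomad_status /export/home/bin/variant_calling/funcotator_dataSources.v1.8.hg38.20230908s
```

If either source shows as extracted, moving that folder out of the data-sources path for one test run would show whether gnomAD accounts for the stall.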