Hi, I am a beginner in bioinformatics and need help indexing my reference FASTA file so that GATK accepts it. I tried samtools faidx to create the .fai file, but that did not fix the problem.
REQUIRED for all errors and issues:
a) GATK version used: 4
b) Exact command used: gatk VariantFiltration -R Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.fa -V ./snps.vcf -O ./filtered_snps.vcf -filter-name "QD_filter" -filter "QD < 2.0" -filter-name "FS_filter" -filter "FS > 60.0" -filter-name "MQ_filter" -filter "MQ < 40.0" -filter-name "SOR_filter" -filter "SOR > 4.0" -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5" -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0" -genotype-filter-expression "DP < 10" -genotype-filter-name "DP_filter" -genotype-filter-expression "GQ < 10" -genotype-filter-name "GQ_filter"
c) Entire program log:
Using GATK jar /home/pgr-master/anaconda3/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/pgr-master/anaconda3/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar VariantFiltration -R Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.fa -V ./snps.vcf -O ./filtered_snps.vcf -filter-name QD_filter -filter QD < 2.0 -filter-name FS_filter -filter FS > 60.0 -filter-name MQ_filter -filter MQ < 40.0 -filter-name SOR_filter -filter SOR > 4.0 -filter-name MQRankSum_filter -filter MQRankSum < -12.5 -filter-name ReadPosRankSum_filter -filter ReadPosRankSum < -8.0 -genotype-filter-expression DP < 10 -genotype-filter-name DP_filter -genotype-filter-expression GQ < 10 -genotype-filter-name GQ_filter
10:39:53.645 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/pgr-master/anaconda3/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:39:53.982 INFO VariantFiltration - ------------------------------------------------------------
10:39:53.982 INFO VariantFiltration - The Genome Analysis Toolkit (GATK) v4.3.0.0
10:39:53.982 INFO VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
10:39:53.983 INFO VariantFiltration - Executing as pgr-master@pgrmaster-server on Linux v5.19.0-35-generic amd64
10:39:53.983 INFO VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v11.0.13+7-b1751.21
10:39:53.983 INFO VariantFiltration - Start Date/Time: March 26, 2023 at 10:39:53 AM EET
10:39:53.983 INFO VariantFiltration - ------------------------------------------------------------
10:39:53.983 INFO VariantFiltration - ------------------------------------------------------------
10:39:53.984 INFO VariantFiltration - HTSJDK Version: 3.0.1
10:39:53.984 INFO VariantFiltration - Picard Version: 2.27.5
10:39:53.984 INFO VariantFiltration - Built for Spark Version: 2.4.5
10:39:53.984 INFO VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 2
10:39:53.985 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:39:53.985 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:39:53.985 INFO VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:39:53.985 INFO VariantFiltration - Deflater: IntelDeflater
10:39:53.985 INFO VariantFiltration - Inflater: IntelInflater
10:39:53.985 INFO VariantFiltration - GCS max retries/reopens: 20
10:39:53.985 INFO VariantFiltration - Requester pays: disabled
10:39:53.985 INFO VariantFiltration - Initializing engine
10:39:53.989 INFO VariantFiltration - Shutting down engine
[March 26, 2023 at 10:39:53 AM EET] org.broadinstitute.hellbender.tools.walkers.filters.VariantFiltration done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2147483648
***********************************************************************
A USER ERROR has occurred: Fasta dict file file:///home/pgr-master/Downloads/NCBI/accession%2028%20mapping/annotation/Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.dict for reference file:///home/pgr-master/Downloads/NCBI/accession%2028%20mapping/annotation/Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.fa does not exist. Please see http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for help creating it.
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
-
Hello Mina Ashraf. To use a reference .fasta file, GATK typically needs two accompanying files: a .fai index file to support random access (which it sounds like you made by running samtools faidx), and a .dict file that lists which contigs are present and how long they are. Your error message says the .dict file does not exist, so the simplest solution I would recommend is to generate it with the Picard tool `CreateSequenceDictionary`, which you can read about here: https://gatk.broadinstitute.org/hc/en-us/articles/360041415732-CreateSequenceDictionary-Picard-.
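As a rough sketch of what that could look like for the reference named in your command: the snippet below assumes you run it from the directory that contains the FASTA and that both `gatk` and `samtools` are on your PATH. The variable names (`REF`, `DICT`, `FAI`) are only for illustration. The .dict name is taken from your error message, which expects the `.fa` extension to be replaced by `.dict`, and the .fai name is the one `samtools faidx` writes by default.

```shell
# Reference FASTA from the original command (assumed to be in the current directory).
REF="Oryza_sativa.IRGSP-1.0.dna_sm.toplevel.fa"

# GATK looks for the dictionary next to the FASTA with ".fa" replaced by ".dict",
# and for the index with ".fai" appended to the full FASTA name.
DICT="${REF%.*}.dict"
FAI="${REF}.fai"

if [ ! -f "$REF" ]; then
    echo "Reference $REF not found; run this from the directory that contains it." >&2
elif ! command -v gatk >/dev/null 2>&1 || ! command -v samtools >/dev/null 2>&1; then
    echo "gatk and samtools must both be on PATH." >&2
else
    # Create the sequence dictionary only if it is missing.
    [ -f "$DICT" ] || gatk CreateSequenceDictionary -R "$REF" -O "$DICT"
    # Create the FASTA index only if it is missing.
    [ -f "$FAI" ] || samtools faidx "$REF"
fi
```

Once both files sit next to the FASTA, rerunning the same VariantFiltration command should get past this particular error.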