Picard Liftover not using indexed reference genome fasta file
Hey everybody,
- Picard's LiftoverVcf tool doesn't seem to use "idx" indexed FASTA files for the liftover.
Use case:
- As the Picard liftover tool looks to be the most "complete" one out there, I want to wrap it in an API and make it available for "quick" lifting in case it's needed.
- Currently a "1 variant liftover" takes 6+ seconds.
- Taking a glance at the source code, a very simple change would let the tool look for the index file and use it if available, which would more than double the performance. GATK already supports indexed FASTA files and that support is already in the GATK source code; it's just not set up in the liftover code.
I can create a PR for this small fix. I built and tested the change, and it looks to be working perfectly.
But since this is my first time (trying to) contribute to an open source project, I need some help with how to do it. :D
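To illustrate why an index should help here: in the log further down, roughly 5 of the run's seconds pass between "Loading up the target reference genome" and the start of lifting. With a FASTA index, a tool can seek straight to the bases it needs instead of reading the whole file. The sketch below is not the actual Picard change and does not use htsjdk; it is a toy, standard-library-only reader that assumes the common samtools-style ".fai" layout (name, length, byte offset, bases per line, bytes per line). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a hypothetical class, not part of Picard or htsjdk. */
final class FaiLookupSketch {

    /** One row of a samtools-style .fai index (assumed layout). */
    static final class FaiEntry {
        final long length;    // number of bases in the contig
        final long offset;    // byte offset of the contig's first base
        final int lineBases;  // bases per FASTA line
        final int lineWidth;  // bytes per FASTA line, including the newline
        FaiEntry(long length, long offset, int lineBases, int lineWidth) {
            this.length = length;
            this.offset = offset;
            this.lineBases = lineBases;
            this.lineWidth = lineWidth;
        }
    }

    /** Parses the tab-separated index file into a contig -> entry map. */
    static Map<String, FaiEntry> readIndex(Path fai) throws IOException {
        Map<String, FaiEntry> index = new HashMap<>();
        for (String line : Files.readAllLines(fai, StandardCharsets.US_ASCII)) {
            if (line.isEmpty()) continue;
            String[] f = line.split("\t");
            index.put(f[0], new FaiEntry(Long.parseLong(f[1]), Long.parseLong(f[2]),
                    Integer.parseInt(f[3]), Integer.parseInt(f[4])));
        }
        return index;
    }

    /** Returns bases start..end (1-based, inclusive) by seeking; the rest of the FASTA is never read. */
    static String fetch(Path fasta, Map<String, FaiEntry> index, String contig, long start, long end)
            throws IOException {
        FaiEntry e = index.get(contig);
        if (e == null || start < 1 || end > e.length || start > end) {
            throw new IllegalArgumentException("Bad region: " + contig + ":" + start + "-" + end);
        }
        StringBuilder bases = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(fasta.toFile(), "r")) {
            long zeroBased = start - 1;
            // Skip whole lines (lineWidth bytes each), then step into the target line.
            raf.seek(e.offset + (zeroBased / e.lineBases) * e.lineWidth + zeroBased % e.lineBases);
            while (bases.length() < end - start + 1) {
                int b = raf.read();
                if (b < 0) throw new IOException("Unexpected end of file");
                if (b != '\n' && b != '\r') bases.append((char) b);
            }
        }
        return bases.toString();
    }

    public static void main(String[] args) throws IOException {
        // Toy reference and a hand-written index for it (offsets counted in bytes).
        Path fasta = Files.createTempFile("toy", ".fasta");
        Files.write(fasta, ">chr1\nACGT\nACGT\nAC\n>chr2\nTTGG\nCC\n".getBytes(StandardCharsets.US_ASCII));
        Path fai = Files.createTempFile("toy", ".fai");
        Files.write(fai, "chr1\t10\t6\t4\t5\nchr2\t6\t25\t4\t5\n".getBytes(StandardCharsets.US_ASCII));

        Map<String, FaiEntry> index = readIndex(fai);
        System.out.println(fetch(fasta, index, "chr1", 3, 6)); // prints GTAC
        System.out.println(fetch(fasta, index, "chr2", 4, 6)); // prints GCC
    }
}
```

In htsjdk itself, the classes to look at are ReferenceSequenceFileFactory and IndexedFastaSequenceFile; how LiftoverVcf would best be wired to them is for the PR to settle.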
REQUIRED for all errors and issues:
a) GATK version used: Picard 3.1.1
b) Exact command used:
java -jar picard.jar LiftoverVcf I=input.vcf O=output.vcf CHAIN=hg19ToHg38.over.chain R=ref_GRCh38_Homo_sapiens_assembly38.fasta REJECT=rejected.vcf RECOVER_SWAPPED_REF_ALT=true
c) Entire program log:
java -jar picard.jar LiftoverVcf I=input.vcf O=output.vcf CHAIN=hg19ToHg38.over.chain R=ref_GRCh38_Homo_sapiens_assembly38.fasta REJECT=rejected.vcf RECOVER_SWAPPED_REF_ALT=true
INFO 2024-02-20 13:47:55 LiftoverVcf
********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** LiftoverVcf -I input.vcf -O output.vcf -CHAIN hg19ToHg38.over.chain -R ref_GRCh38_Homo_sapiens_assembly38.fasta -REJECT rejected.vcf -RECOVER_SWAPPED_REF_ALT true
**********
Feb 20, 2024 1:47:55 PM com.intel.gkl.NativeLibraryLoader load
INFO: Loading libgkl_compression.dylib from jar:file:/Users/nantic/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
Feb 20, 2024 1:47:55 PM com.intel.gkl.NativeLibraryLoader load
WARNING: Unable to load libgkl_compression.dylib from native/libgkl_compression.dylib (Can't load library: /var/folders/bz/m4dqd70d2yg6zdh5dk5qqpz80000gn/T/nantic/libgkl_compression15680425776625712486.dylib)
Feb 20, 2024 1:47:55 PM com.intel.gkl.NativeLibraryLoader load
INFO: Loading libgkl_compression.dylib from jar:file:/Users/nantic/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
Feb 20, 2024 1:47:55 PM com.intel.gkl.NativeLibraryLoader load
WARNING: Unable to load libgkl_compression.dylib from native/libgkl_compression.dylib (Can't load library: /var/folders/bz/m4dqd70d2yg6zdh5dk5qqpz80000gn/T/nantic/libgkl_compression7756468538825473314.dylib)
[Tue Feb 20 13:47:55 CET 2024] LiftoverVcf INPUT=input.vcf OUTPUT=output.vcf CHAIN=hg19ToHg38.over.chain REJECT=rejected.vcf RECOVER_SWAPPED_REF_ALT=true REFERENCE_SEQUENCE=ref_GRCh38_Homo_sapiens_assembly38.fasta WARN_ON_MISSING_CONTIG=false LOG_FAILED_INTERVALS=true WRITE_ORIGINAL_POSITION=false WRITE_ORIGINAL_ALLELES=false LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] DISABLE_SORT=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Feb 20 13:47:55 CET 2024] Executing as nantic on Mac OS X 14.2.1 aarch64; OpenJDK 64-Bit Server VM 17.0.5+8; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: 3.1.1
INFO 2024-02-20 13:47:55 LiftoverVcf Loading up the target reference genome.
INFO 2024-02-20 13:48:00 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
INFO 2024-02-20 13:48:00 LiftoverVcf Processed 2 variants.
INFO 2024-02-20 13:48:00 LiftoverVcf 0 variants failed to liftover.
INFO 2024-02-20 13:48:00 LiftoverVcf 1 variants lifted over but had mismatching reference alleles after lift over.
INFO 2024-02-20 13:48:00 LiftoverVcf 50.0000% of variants were not successfully lifted over and written to the output.
INFO 2024-02-20 13:48:00 LiftoverVcf liftover success by source contig:
INFO 2024-02-20 13:48:00 LiftoverVcf chr1: 1 / 1 (100.0000%)
INFO 2024-02-20 13:48:00 LiftoverVcf chr2: 0 / 1 (0.0000%)
INFO 2024-02-20 13:48:00 LiftoverVcf lifted variants by target contig:
INFO 2024-02-20 13:48:00 LiftoverVcf chr1: 1
INFO 2024-02-20 13:48:00 LiftoverVcf 0 variants were lifted by swapping REF/ALT alleles.
INFO 2024-02-20 13:48:00 LiftoverVcf Writing out sorted records to final VCF.
[Tue Feb 20 13:48:00 CET 2024] picard.vcf.LiftoverVcf done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=4294967296
Hi Nikola Antić, essentially you'll want to:
- Fork the repo
- Clone the fork to your system
- Create a new branch for your feature
- Make your changes
- Push your changes back to your fork
- Create a PR against the main branch
You might find the detailed instructions here useful (they are for the GATK repo, but the process is pretty much the same for Picard): https://github.com/broadinstitute/gatk?tab=readme-ov-file#how-to-contribute-to-gatk
The main Picard repo can be found at: https://github.com/broadinstitute/picard
Can Kockan, this README was what I was looking for, thank you so much!