How to use WGA data in Picard
Answered
Additional info:
a) The reference genome and the SRA are different strains of the same subspecies
b) Reference genome used
https://www.ncbi.nlm.nih.gov/assembly/GCA_900231445.1/
c) SRA File used (Illumina paired end reads)
https://www.ncbi.nlm.nih.gov/sra/SRX688079[accn]
REQUIRED for all errors and issues:
a) GATK version used:
Not sure, I think the most recent?
b) Exact command used:
java -jar /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/picard/2.9.0/lib/picard.jar SortSam I=SRR1559585.extrapreprocessed O=SRR1559585.sorted SORT_ORDER=coordinate
c) Entire program log:
[Tue May 03 16:00:00 PDT 2022] picard.sam.SortSam INPUT=SRR1559585.extrapreprocessed OUTPUT=SRR1559585.sorted SORT_ORDER=coordinate VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue May 03 16:00:00 PDT 2022] Executing as marcusvarni@ln001.brc on Linux 3.10.0-1160.31.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 2.9.0-SNAPSHOT
[Tue May 03 16:00:00 PDT 2022] picard.sam.SortSam done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=1011351552
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 25875, Read name SRR1559585.12924, Insert size out of range
at htsjdk.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:796)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:781)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:751)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:543)
at picard.sam.SortSam.doWork(SortSam.java:99)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
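Note: one possible workaround here, not necessarily the right fix, would be to relax validation stringency so that out-of-range insert sizes are downgraded from errors to warnings while sorting. A sketch only, reusing the same paths as above:
java -jar /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/picard/2.9.0/lib/picard.jar SortSam \
    I=SRR1559585.extrapreprocessed \
    O=SRR1559585.sorted \
    SORT_ORDER=coordinate \
    VALIDATION_STRINGENCY=LENIENT
This only suppresses the validation failure; the TLEN values in the offending records are left as they are.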
d) What I tried:
- I first tried cleaning the file up after checking its validity using ValidateSamFile:
java -jar /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/picard/2.9.0/lib/picard.jar CleanSam \
    I=SRR1559585.preprocessed \
    O=SRR1559585.extrapreprocessed
- I then ran ValidateSamFile (I got the same errors before cleaning, so I have not included that first ValidateSamFile output):
java -jar /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/picard/2.9.0/lib/picard.jar ValidateSamFile \
    I=SRR1559585.extrapreprocessed \
    MODE=SUMMARY
- Here is the result:
(base) [marcusvarni@ln001 SvevoSVCBackup]$ java -jar /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/picard/2.9.0/lib/picard.jar ValidateSamFile \
> I=SRR1559585.extrapreprocessed \
> MODE=SUMMARY
[Tue May 03 16:02:27 PDT 2022] picard.sam.ValidateSamFile INPUT=SRR1559585.extrapreprocessed MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Tue May 03 16:02:27 PDT 2022] Executing as marcusvarni@ln001.brc on Linux 3.10.0-1160.31.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_121-b13; Picard version: 2.9.0-SNAPSHOT
INFO 2022-05-03 16:03:22 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:54s. Time for last 10,000,000: 54s. Last read position: LT934114.1:752,073,531
INFO 2022-05-03 16:04:15 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:01:47s. Time for last 10,000,000: 53s. Last read position: LT934117.1:577,404,702
INFO 2022-05-03 16:05:08 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:02:40s. Time for last 10,000,000: 53s. Last read position: LT934116.1:302,764,743
INFO 2022-05-03 16:06:01 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:03:33s. Time for last 10,000,000: 52s. Last read position: LT934114.1:26,117,261
INFO 2022-05-03 16:06:57 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:04:29s. Time for last 10,000,000: 56s. Last read position: LT934112.1:95,106,473
INFO 2022-05-03 16:07:51 SamFileValidator Validated Read 60,000,000 records. Elapsed time: 00:05:23s. Time for last 10,000,000: 54s. Last read position: LT934119.1:576,450,723
INFO 2022-05-03 16:08:44 SamFileValidator Validated Read 70,000,000 records. Elapsed time: 00:06:16s. Time for last 10,000,000: 52s. Last read position: LT934123.1:415,133,202
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_INSERT_SIZE 7166
ERROR:MISSING_PLATFORM_VALUE 1
[Tue May 03 16:09:12 PDT 2022] picard.sam.ValidateSamFile done. Elapsed time: 6.74 minutes.
Runtime.totalMemory()=824180736
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
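For more detail on exactly which records trip INVALID_INSERT_SIZE, a verbose run should list each offending record and its read name. A sketch, using the standard ValidateSamFile options MODE=VERBOSE and MAX_OUTPUT (the latter just caps how many errors are printed):
java -jar /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/picard/2.9.0/lib/picard.jar ValidateSamFile \
    I=SRR1559585.extrapreprocessed \
    MODE=VERBOSE \
    MAX_OUTPUT=20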
-
Thank you for your post, Marcus Varni! I want to let you know we have received your question. We'll get back to you if we have any updates or follow up questions.
Please see our Support Policy for more details about how we prioritize responding to questions.
-
Awesome, thank you!
-
Hi Marcus Varni,
Could you clarify what version of Picard you are using and the tlen of the record with the error (Record 25875, Read name SRR1559585.12924)?
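For example, one way to pull the TLEN for that read name (a sketch, assuming samtools is available on your cluster and the file is a BAM) would be:
samtools view SRR1559585.extrapreprocessed | awk '$1 == "SRR1559585.12924" {print $1, $9}'
Column 9 of a SAM record is the TLEN; for a paired read you should see two lines, one per mate, typically with TLEN values of opposite sign.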
We've seen some previous issues come up with other users who are also analyzing wheat, but those issues were with creating an output index in HaplotypeCaller, not Picard. So, this might have something to do with the size of your contigs, but I'm not sure just yet; a quick way to check your contig lengths is sketched after the links below. Here are the other issues:
- https://gatk.broadinstitute.org/hc/en-us/community/posts/360075181171-HaplotypeCaller-Shutting-down-engine-Encountering-a-large-genome
- https://gatk.broadinstitute.org/hc/en-us/community/posts/4407400443803-GenomicsDBimport-and-CombineGVCF-does-not-show-variants-at-500-Mbp-onwards-although-gvcf-files-from-HapolypeCaller-report-variants
- https://gatk.broadinstitute.org/hc/en-us/community/posts/360075391631-SplitNCigarReads-generating-truncated-bam-files
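To check the contig-size angle, one quick look (again a sketch, assuming samtools and the usual SN/LN tag order in the header) is to print the name and length of each @SQ line in the BAM header:
samtools view -H SRR1559585.extrapreprocessed | awk -F'\t' '/^@SQ/ {print $2, $3}'
For context, the standard .bai index format cannot represent positions beyond 2^29-1 (about 537 Mbp), which is often why very long contigs such as wheat chromosomes run into indexing problems like the ones in the threads above.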
Let us know what you find.
Best,
Genevieve