Picard CrosscheckFingerprints not identifying samples from the same individual as a MATCH
I am trying to run CrosscheckFingerprints on multiple BAM files from the same individual to confirm that they all come from the same subject. I'm following the documentation example with this description: "Check that all the readgroups match as expected when providing reads from two samples from the same individual."
When I run the command with two samples I know are from different individuals, the output correctly identifies them as a MISMATCH. However, when I run the command with samples from the same individual, the comparison is also identified as a MISMATCH, but the LOD scores are closer to 0. The LOD scores I get for known mismatches are roughly between -1400 and -1600. The LOD scores I get for comparisons between different samples from the same individual are between -500 and -100.
I am using bulk RNAseq data and the pre-computed haplotype map from https://github.com/naumanjaved/fingerprint_maps.
Can someone help troubleshoot why expected matches score higher than known mismatches but are still not reported as a MATCH, or even as INCONCLUSIVE?
Required information is below.
Thank you in advance,
Chris
REQUIRED for all errors and issues:
a) GATK version used: Picard version 2.26.11-SNAPSHOT
b) Exact command used:
java -jar build/libs/picard.jar CrosscheckFingerprints \
-INPUT ../bam/F4196374_1001.bam \
-INPUT ../bam/F4196593_1001.bam \
-HAPLOTYPE_MAP hg38_nochr.map \
-LOD_THRESHOLD -5 \
-CROSSCHECK_BY FILE \
-EXPECT_ALL_GROUPS_TO_MATCH true \
-OUTPUT all_files.crosscheck_metrics
c) Entire program log:
### log to stdout ###############
14:35:42.550 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/domino/datasets/local/RSEM_Index/picard/build/libs/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Mar 23 14:35:42 UTC 2022] CrosscheckFingerprints --INPUT ../bam/F4196374_1001.bam --INPUT ../bam/F4196593_1001.bam --OUTPUT all_files.crosscheck_metrics --HAPLOTYPE_MAP hg38_nochr.map --LOD_THRESHOLD -5.0 --CROSSCHECK_BY FILE --EXPECT_ALL_GROUPS_TO_MATCH true --CROSSCHECK_MODE CHECK_SAME_SAMPLE --NUM_THREADS 1 --CALCULATE_TUMOR_AWARE_RESULTS true --ALLOW_DUPLICATE_READS false --GENOTYPING_ERROR_RATE 0.01 --OUTPUT_ERRORS_ONLY false --LOSS_OF_HET_RATE 0.5 --EXIT_CODE_WHEN_MISMATCH 1 --EXIT_CODE_WHEN_NO_VALID_CHECKS 1 --MAX_EFFECT_OF_EACH_HAPLOTYPE_BLOCK 3.0 --TEST_INPUT_READABILITY true --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Wed Mar 23 14:35:42 UTC 2022] Executing as ubuntu@run-623b1e4d8755b21bf46d1694-vspqd on Linux 5.4.162-86.275.amzn2.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.26.11-SNAPSHOT
INFO 2022-03-23 14:35:42 CrosscheckFingerprints Fingerprinting 2 INPUT files.
INFO 2022-03-23 14:35:42 FingerprintChecker Reading an indexed file (file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam)
INFO 2022-03-23 14:38:15 FingerprintChecker Reading an indexed file (file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam)
INFO 2022-03-23 14:40:58 FingerprintChecker Processed files. 2 fingerprints found in map.
INFO 2022-03-23 14:40:59 CrosscheckFingerprints Cross-checking all FILE against each other
WARNING 2022-03-23 14:40:59 CrosscheckFingerprints 2 FILEs did not relate as expected.
[Wed Mar 23 14:40:59 UTC 2022] picard.fingerprint.CrosscheckFingerprints done. Elapsed time: 5.29 minutes.
Runtime.totalMemory()=11362893824
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
##### output file #################
## htsjdk.samtools.metrics.StringHeader
# CrosscheckFingerprints --INPUT ../bam/F4196374_1001.bam --INPUT ../bam/F4196593_1001.bam --OUTPUT all_files.crosscheck_metrics --HAPLOTYPE_MAP hg38_nochr.map --LOD_THRESHOLD -5.0 --CROSSCHECK_BY FILE --EXPECT_ALL_GROUPS_TO_MATCH true --CROSSCHECK_MODE CHECK_SAME_SAMPLE --NUM_THREADS 1 --CALCULATE_TUMOR_AWARE_RESULTS true --ALLOW_DUPLICATE_READS false --GENOTYPING_ERROR_RATE 0.01 --OUTPUT_ERRORS_ONLY false --LOSS_OF_HET_RATE 0.5 --EXIT_CODE_WHEN_MISMATCH 1 --EXIT_CODE_WHEN_NO_VALID_CHECKS 1 --MAX_EFFECT_OF_EACH_HAPLOTYPE_BLOCK 3.0 --TEST_INPUT_READABILITY true --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
## htsjdk.samtools.metrics.StringHeader
# Started on: Wed Mar 23 14:35:42 UTC 2022
## METRICS CLASS picard.fingerprint.CrosscheckMetric
LEFT_GROUP_VALUE RIGHT_GROUP_VALUE RESULT DATA_TYPE LOD_SCORE LOD_SCORE_TUMOR_NORMAL LOD_SCORE_NORMAL_TUMOR LEFT_RUN_BARCODE LEFT_LANE LEFT_MOLECULAR_BARCODE_SEQUENCE LEFT_LIBRARY LEFT_SAMPLE LEFT_FILE RIGHT_RUN_BARCODE RIGHT_LANE RIGHT_MOLECULAR_BARCODE_SEQUENCE RIGHT_LIBRARY RIGHT_SAMPLE RIGHT_FILE
file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam::Sample1 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam::Sample1 EXPECTED_MATCH FILE 2657.792003 2091.069794 2091.069794 ? -1 ? 1 Sample1 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam ? -1 ? 1 Sample1 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam
file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam::Sample1 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam::Sample2 UNEXPECTED_MISMATCH FILE -562.074522 -194.252635 -257.223569 ? -1 ? 1 Sample1 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam ? -1 ? 1 Sample2 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam
file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam::Sample2 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam::Sample1 UNEXPECTED_MISMATCH FILE -562.074522 -257.223569 -194.252635 ? -1 ? 1 Sample2 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam ? -1 ? 1 Sample1 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196374_1001.bam
file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam::Sample2 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam::Sample2 EXPECTED_MATCH FILE 3187.94952 2504.653897 2504.653897 ? -1 ? 1 Sample2 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam ? -1 ? 1 Sample2 file:///domino/datasets/local/RSEM_Index/picard/../bam/F4196593_1001.bam
-
Hi Christopher Sisk,
Thanks for writing in about this question! Hopefully we can help to figure out why you are seeing this.
One thing to know first: contaminated samples can be erroneously labeled as mismatching, so one of your samples may be contaminated. We recommend running VerifyBamID to check. If there is contamination, you might be able to extract a cleaner fingerprint with ExtractFingerprint. This might not work as well if both samples are contaminated, though, because the local relative coverage could differ between the two original (uncontaminated) samples.
You can also manually examine the SNPs to determine which ones are driving the negative LOD score. One way to do that is to run ExtractFingerprint on one of the samples to get a VCF, then run CheckFingerprint on the other sample using that VCF as the expected genotypes, which gives a metrics file.
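As a rough sketch of that two-step approach, reusing the jar, haplotype map, and BAM paths from your command: the output file names are made up, and the flag names are worth confirming against each tool's --help for your Picard version. In particular, I am assuming --EXPECTED_SAMPLE_ALIAS is needed because your two BAMs carry different sample names (Sample1 and Sample2). The script only prints the commands unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch only: input paths come from the command in the question;
# output names and the sample alias handling are assumptions.
set -euo pipefail

PICARD_JAR="build/libs/picard.jar"   # jar path from the original command
HAP_MAP="hg38_nochr.map"             # haplotype map from the original command
BAM_A="../bam/F4196374_1001.bam"     # sample name in the metrics: Sample1
BAM_B="../bam/F4196593_1001.bam"     # sample name in the metrics: Sample2
FP_VCF="F4196374_1001.fingerprint.vcf"   # hypothetical output name
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands only, 0 = run them

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step 1: extract a fingerprint VCF from the first BAM.
extract_fingerprint() {
  run java -jar "$PICARD_JAR" ExtractFingerprint \
    --INPUT "$BAM_A" \
    --HAPLOTYPE_MAP "$HAP_MAP" \
    --OUTPUT "$FP_VCF"
}

# Step 2: check the second BAM against that VCF.
# The VCF holds genotypes for Sample1, so point CheckFingerprint at that name.
check_fingerprint() {
  run java -jar "$PICARD_JAR" CheckFingerprint \
    --INPUT "$BAM_B" \
    --GENOTYPES "$FP_VCF" \
    --EXPECTED_SAMPLE_ALIAS Sample1 \
    --HAPLOTYPE_MAP "$HAP_MAP" \
    --OUTPUT F4196593_vs_F4196374
}

extract_fingerprint
check_fingerprint
```

If this runs as intended, CheckFingerprint should write a summary metrics file and a per-site detail metrics file under the F4196593_vs_F4196374 prefix. The detail file is the one to look at for the individual SNPs.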
You can also open both samples in IGV and walk through the fingerprinting sites to see where the genotypes agree and where they differ.
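To make stepping through the sites in IGV easier, something like the following could turn the haplotype map into a BED file that you can load as a track. This assumes the map is in Picard's tab-delimited text format: header lines starting with @ or #, then rows with chromosome, 1-based position, and SNP name in the first three columns. The function name map_to_bed and the output file name are just placeholders.

```shell
# Convert a Picard-style haplotype map into a BED file of fingerprinting sites.
# Assumes a tab-delimited map with columns: chromosome, 1-based position, SNP name, ...
map_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^[@#]/ { next }                        # skip sequence dictionary and column header
       NF >= 3 { print $1, $2 - 1, $2, $3 }    # BED uses 0-based start, 1-based end
  ' "$1"
}

# Example (not run here):
#   map_to_bed hg38_nochr.map > fingerprint_sites.bed
```

With the BED file loaded alongside both BAMs, you can jump from feature to feature and compare the two samples at each site.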
Let me know if you have any further questions and if this ends up working for you.
Best,
Genevieve