Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Picard CheckFingerprint output for mismatches

4 comments

  • Ricky Magner

    Hi, thanks for posting. I tried to recreate your issue with some of my own data but couldn't reproduce it. Specifically, I used the same flags as you with a BAM that I knew did not match my VCF sample, and the run produced a large negative LOD (i.e. a fingerprint mismatch), as expected.

    Can you share a bit more information about your data and environment? In particular:

    • Version of Picard/GATK
    • The output of `bcftools query -l` on your VCF
    • The RG info from `samtools view -H <bam> | grep "@RG"`

    Thanks!

     

  • Hans-Ulrich Klein

    Hi Ricky,

    Thank you for your reply. Initially, I used an older version of Picard, but I observed the exact same result with Picard 3.1.1.

    The given EXPECTED_SAMPLE_ALIAS is a valid name in my VCF file. There are 1200 samples in the VCF file, so I won't list all of them here, but everything appears fine to me:

    > bcftools query -l rosmapWgsFingerprints.vcf.gz | grep SM-CJGLP
    SM-CJGLP

    The bam file header might be the problem since it does not have the SM tag:

    samtools view -H $bamfull | grep "@RG"
    @RG    ID:76755449_SMA    CN:BI    DT:2024-03-11T18:21:48:-0400
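
For BAMs with many read groups, the same check can be done by filtering the header text for `@RG` lines that lack an SM field. The following is a minimal sketch, assuming the header is standard tab-separated SAM header text on standard input; the function name `rg_missing_sm` is hypothetical and not part of samtools or Picard.

```shell
# Hypothetical helper (not part of samtools or Picard): prints the ID of every
# @RG header line that has no SM field. Reads SAM header text on stdin and
# assumes fields are tab-separated, as in a real SAM header.
rg_missing_sm() {
  awk -F '\t' '
    $1 == "@RG" {
      id = "(no ID)"; has_sm = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)   # strip the "ID:" prefix
        if ($i ~ /^SM:/) has_sm = 1
      }
      if (!has_sm) print id
    }
  '
}

# Usage, assuming samtools is installed and $bamfull points at the BAM:
#   samtools view -H "$bamfull" | rg_missing_sm
```

An empty result would mean every read group already carries an SM tag.
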

    However, it works when I specify the correct sample in the VCF file (high LOD). It also works when I specify the correct bam file belonging to the sample "SM-CJGLP" in the VCF file.

    Best,
    Hans

  • Ricky Magner

    Hi,

    I'm still trying to test this a bit, but can you confirm:

    • If you add a "SM" field to the RG line, does it produce the same error?
    • What do you mean when you say it "works" when specifying the correct bam for the sample in the VCF file? You mean it runs properly when using a different bam but the same VCF? If so, can you also print the RG lines from that bam so I can see?

    Thanks

  • Ricky Magner

    I did take a look at the code, and there does seem to be a step early on where the SM tag value is extracted from the RG and passed to a number of functions later on. That is most likely what leads to your `null / null` output; by contrast, I saw `Read Group: null / SM-MOKH1 vs. HG002` in my log for my files. In that sense I'm optimistic that adding an SM value to your BAM RGs would "fix" this issue, but if it does work, it would certainly be a bug in the tool that we'd have to look into.
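
One way to try the suggestion above is to rewrite the BAM header so that each `@RG` line carries an SM field. The following is a rough sketch rather than a confirmed fix: the function name `add_sm_to_rg`, the file names, and the sample value `MY_SAMPLE` are placeholders, and it assumes a tab-separated SAM header and an installed samtools for the final `reheader` step. Picard's AddOrReplaceReadGroups is another possible route.

```shell
# Hypothetical helper: appends SM:<sample> to every @RG header line that has
# no SM field. All other lines pass through unchanged.
# Argument 1: the sample name to insert. Reads SAM header text on stdin.
add_sm_to_rg() {
  awk -F '\t' -v OFS='\t' -v sm="$1" '
    $1 == "@RG" {
      has_sm = 0
      for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) has_sm = 1
      if (!has_sm) $0 = $0 OFS "SM:" sm
    }
    { print }
  '
}

# Usage sketch (placeholders: in.bam, new_header.sam, with_sm.bam, MY_SAMPLE):
#   samtools view -H in.bam | add_sm_to_rg MY_SAMPLE > new_header.sam
#   samtools reheader new_header.sam in.bam > with_sm.bam
```

The SM value should be the sample the BAM actually came from, not necessarily the EXPECTED_SAMPLE_ALIAS being tested against.
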
