Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

CalculateContamination "there is no such column: contig"

8 comments

  • Genevieve Brandt (she/her)

    Hi Jensen Richardson, what was the GetPileupSummaries command that you used? Here is the documentation. In that document, there is an example of the expected output, which is a table of 6 columns, starting with contig. Is that file as expected? If so, could you send us the command you used and the first 5 lines of that file?

  • Jensen Richardson

    Hi Genevieve,

    The GetPileupSummaries (or rather just Pileup) command that I used was:

    ./req-files/gatk/gatk Pileup -R ./req-files/refGenome/GRCh38.p7.genome.fa -I ./07-ApplyBaseRecalibration/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.recalibrated.bam -O ./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.table 1>&2 2>./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.gatk.log

    I see that instead of GetPileupSummaries I actually used Pileup. I will rerun it with the correct tool (that might help...). While investigating how I made this mistake, I noticed that on the main GATK tool documentation index, the link for GetPileupSummaries actually points to the Pileup page, not the GetPileupSummaries documentation page.

  • Genevieve Brandt (she/her)

    Hi Jensen Richardson, yes, I think it will work if you use GetPileupSummaries, because the two tools produce different outputs. And thank you for pointing out that link problem; we will get that fixed. Let me know if using GetPileupSummaries works!

  • Jensen Richardson

    I am now trying to use GetPileupSummaries, but I keep getting this error:

    A USER ERROR has occurred: Badly formed genome unclippedLoc: Contig 1 given as location, but this contig isn't present in the Fasta sequence dictionary

    From what I've read online it seems like this often comes from not correctly matching the reference genome across all steps of the pipeline, but I have been very careful to do that.

    I am using dbSNP common variants from here, and I am using build 151 of dbSNP. The header of the file says that it is for GRCh38.p7, but I am still getting this problem even though I have been using GRCh38.p7 for the whole pipeline. I cannot fathom why it would be unable to find the contig if I have been using the same reference the whole time. My current theory is that it comes from the VCF file being formatted like this (I removed most of the INFO column because it doesn't seem necessary):

    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
    1       10177   rs367896724     A       AC      .       .       RS=367896724;RSPOS=10177;dbSNPBuildID=138
    1       10352   rs555500075     T       TA      .       .       RS=555500075;RSPOS=10352;dbSNPBuildID=142
    1       10616   rs376342519     CCGCCGTTGCAAAGGCGCGCCG  C       .       .       RS=376342519;RSPOS=10617;dbSNPBuildID=142

    You can see that the CHROM column contains just "1" instead of "chr1", as is common in GRCh38. Do you have any ideas about what could be causing this?

  • Jensen Richardson

    While working on something unrelated, I stumbled upon this issue on GitHub, which seems to describe nearly the same problem. It seems that dbSNP does not use the chr prefix, even though that prefix is considered (at least by Broad) the standard for GRCh38.

  • Genevieve Brandt (she/her)

    Hi Jensen Richardson, thank you for the update. Here is the link to the issue you pointed out with the broken hyperlinks, so you can follow along: https://github.com/broadinstitute/gatk/issues/6699. Could you post your entire GetPileupSummaries command?

  • Jensen Richardson

    Thank you for the link to the issue. Here is the GetPileupSummaries command that I was using:

    ./req-files/gatk/gatk GetPileupSummaries \
    -I ./07-ApplyBaseRecalibration/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.recalibrated.bam \
    -V ./req-files/common_var/00-common_all.vcf \
    -L ./req-files/common_var/00-common_all.vcf \
    -O ./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.table \
    1>&2 2>./09-PileupSummary/LP1_DSMZ_p9_CL_Whole_T1_A1S4U_K02371_D0RWMACXX_CAGATC.pileup.gatk.log

  • Genevieve Brandt (she/her)

    Thank you for the quick response. I confirmed with our developers that this is unfortunately an issue with dbSNP and not on our end. As noted in the issue you linked to earlier, we are working on a fix that can be used with Funcotator, so stay tuned for that. In the meantime, you may have to find a workaround to make the dbSNP contig names match your reference. Just be careful with the alternative contig names, and check that everything still corresponds correctly.
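
For anyone landing on this thread with the same error, here is a minimal sketch of the kind of workaround discussed above. It is not an official GATK or dbSNP tool. It assumes the only mismatch is the missing "chr" prefix (with "MT" corresponding to "chrM"), and that a FASTA index (.fai) exists for the reference. All function names are hypothetical. Contigs with no "chr" counterpart in the reference are deliberately left unchanged, so they can be reviewed by hand.

```python
import re
from typing import Dict, Iterable, Iterator, Set

# Captures the ID value of a "##contig=<ID=...>" VCF header line.
_CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)")


def vcf_contigs(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct CHROM values from the data lines of a VCF."""
    return {
        line.split("\t", 1)[0].rstrip("\n")
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def fai_contigs(lines: Iterable[str]) -> Set[str]:
    """Collect contig names from a FASTA index: the first tab-separated column."""
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def build_chr_map(vcf_names: Set[str], ref_names: Set[str]) -> Dict[str, str]:
    """Map VCF contig names to 'chr'-prefixed names found in the reference.

    Assumption: "MT" corresponds to "chrM"; every other name just gains "chr".
    A name is mapped only if the prefixed form really exists in the reference.
    """
    mapping: Dict[str, str] = {}
    for name in vcf_names:
        if name in ref_names:
            continue  # already consistent with the reference
        candidate = "chrM" if name == "MT" else "chr" + name
        if candidate in ref_names:
            mapping[name] = candidate
    return mapping


def rename_contigs(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield VCF lines with ##contig headers and CHROM values renamed."""
    for line in lines:
        if line.startswith("##contig="):
            match = _CONTIG_HEADER.match(line)
            if match and match.group(2) in mapping:
                line = match.group(1) + mapping[match.group(2)] + line[match.end():]
        elif line.strip() and not line.startswith("#"):
            chrom, sep, rest = line.partition("\t")
            line = mapping.get(chrom, chrom) + sep + rest
        yield line
```

To use it, read the VCF once with vcf_contigs, read the reference .fai with fai_contigs, pass both sets to build_chr_map, and then stream the VCF through rename_contigs into a new file. Any VCF contig that ends up neither in the mapping nor in the reference is worth checking manually before rerunning GetPileupSummaries. The renamed VCF would presumably also need a fresh index.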
