Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

ASEReadCounter outputs only header

10 comments

  • Genevieve Brandt (she/her)

    Hi Chunyang,

    Could you confirm if your BAM file contains proper read groups?

    Best,

    Genevieve

  • Chunyang Bao

    Yes! I checked the RG info in my BAM files; an example is shown below. Furthermore, my BAM files worked well with other GATK tools, e.g. Mutect2 and HaplotypeCaller.

    @RG ID:H55KF.2 SM:XXX LB:NormPond-613230 PL:illumina PU:H55KFALXX161210.2.AGGATCTA-CGAAGTTC CN:BI DT:2016-12-12T00:00:00-0501
    @RG ID:H7LM3.1 SM:XXX LB:NormPond-613230 PL:illumina PU:H7LM3ALXX170110.1.AGGATCTA-CGAAGTTC CN:BI DT:2017-01-14T00:00:00-0501
    @RG ID:H7LM3.2 SM:XXX 1 LB:NormPond-613230 PL:illumina PU:H7LM3ALXX170110.2.AGGATCTA-CGAAGTTC CN:BI DT:2017-01-14T00:00:00-0502
    ...
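
    For readers who want to repeat this check, here is a minimal Python sketch that parses an @RG header line and reports which tags are missing. The set of tags treated as required (ID, SM, PL, LB, PU) is an assumption based on what GATK guidance commonly asks for, not something stated in this thread. Splitting on whitespace is also an assumption made to match the lines as pasted above; real SAM headers are tab-delimited.

```python
# Tags assumed to be needed by GATK tools (an assumption, see the note above).
REQUIRED_TAGS = ("ID", "SM", "PL", "LB", "PU")


def parse_rg(line):
    """Turn one @RG header line into a {tag: value} dict."""
    fields = line.split()[1:]  # drop the leading "@RG" record type
    # Split on the first colon only, so values such as DT timestamps survive.
    return dict(field.split(":", 1) for field in fields if ":" in field)


def missing_tags(line):
    """Return the required tags that are absent from an @RG line."""
    tags = parse_rg(line)
    return [tag for tag in REQUIRED_TAGS if tag not in tags]


# First @RG line from the post above.
rg_line = (
    "@RG ID:H55KF.2 SM:XXX LB:NormPond-613230 PL:illumina "
    "PU:H55KFALXX161210.2.AGGATCTA-CGAAGTTC CN:BI DT:2016-12-12T00:00:00-0501"
)
print(missing_tags(rg_line))
```

    An empty list means every tag in REQUIRED_TAGS is present on that line.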

  • Genevieve Brandt (she/her)

    Thanks for confirming that, Chunyang Bao!

    Are you working with a normal VCF? I see this warning:

    19:13:27.118 WARN ASEReadCounter - Ignoring site: variant is not het at postion: 1:835133

    Could you share some example variants from the VCF? Please also check that the interval file, BAM, and VCF overlap as expected. I noticed you have both hg19 and b37 resources; do these files use consistent contig naming?

    Best,

    Genevieve

  • Chunyang Bao

    I ran it on a tumor BAM and a 1000g VCF as follows ...

    gs://gcp-public-data--broad-references/hg19/v0/1000G_phase1.snps.high_confidence.b37.vcf.gz

    Thanks,

    Chunyang

  • Genevieve Brandt (she/her)

    Could you paste the example variants here? Please also let me know whether the contig naming is consistent across the files, and whether you have confirmed that the intervals, BAM, and VCF overlap.

  • Chunyang Bao

    Sure!

    Example variants from "gs://gcp-public-data--broad-references/hg19/v0/1000G_phase1.snps.high_confidence.b37.vcf.gz"

    1 51479 rs116400033 T A 11726.81 PASS AC=229;AF=0.3253;AN=704;BaseQRankSum=-6.949;DB;DP=1570;Dels=0.00;FS=3.130;HRun=0;HaplotypeScore=0.1377;InbreedingCoeff=0.2907;MQ=34.37;MQ0=174;MQRankSum=1.476;QD=16.08;ReadPosRankSum=-0.202;SB=-4317.78;VQSLOD=5.1635;pop=EUR.admix
    1 55367 . G A 207.20 PASS AC=2;AF=0.00117;AN=1714;BaseQRankSum=2.243;DP=4926;Dels=0.00;FS=3.005;HRun=0;HaplotypeScore=0.1382;InbreedingCoeff=-0.0188;MQ=45.57;MQ0=365;MQRankSum=0.185;QD=21.22;ReadPosRankSum=0.136;SB=-111.01;VQSLOD=6.3979;pop=ALL
    1 55388 . C T 95.61 PASS AC=1;AF=0.00056;AN=1792;BaseQRankSum=-0.038;DP=5282;Dels=0.00;FS=0.000;HRun=2;HaplotypeScore=0.1980;InbreedingCoeff=-0.0278;MQ=48.19;MQ0=20;MQRankSum=-0.397;QD=18.13;ReadPosRankSum=-0.945;SB=-59.46;VQSLOD=5.7297;pop=ALL

    Intervals:
    1 10001 177417 + interval-1
    1 227418 267719 + interval-2
    1 317720 471368 + interval-3
    1 521369 2634220 + interval-4
    1 2684221 3845268 + interval-5
    1 3995269 13053050 + interval-6
    1 13102999 13219912 + interval-7
    1 13319913 13557162 + interval-8
    1 13607163 17125658 + interval-9
    1 17175659 29878082 + interval-10

    BAM file:

    H7LM3ALXX170114:5:2210:2940:23952 1107 1 51479 40 151M =51404 -226 TTTATGCTACTGTACCTCTGGGATTAATTGCTCTTTCCCTCATTGGCCAGTCACTCTTAGTGTGTGATTAATGCCTGAGACTGTGTGAAGTAAGAGATGGATCAGAGGCCGGGCGCGGGGGCTCGCGCCTGTCATCCCAGCACTTTGGGAG ;579;<=::=<;;9;;;=<9*;9:.99<:<=;;<::<<=:;98:<<;;<7;:-=:=<;<;<;<;<;;<;99<<<=<:7:9=:;<;9;:;:8:;:9:8;;:89;;:::::5:::5:5:::::;95:5:9:96/96899999:9;::::::;;MC:Z:151M MD:Z:151 PG:Z:MarkDuplicates.Y.8P RG:Z:H7LM3.5 NM:i:0 MQ:i:55 OQ:Z:A-<<<JJAFJJJJFFFAJJA-JAA-FFJFJJJFJF<JJJAFF<AJJFFJAJA-JFJJJJJJJJJJJJJJFFJJJJJF7FFJFJJJFJJJJFJJJAJFJJJAFJJJJJJJJJJJJJJJJJJJJJJJJJJJJF-JFJJJJJJJJJJJJFFFAA UQ:i:0 AS:i:151
    H7LM5ALXX170114:4:1221:7923:23337 83 1 51479 40 151M =51404 -226 TTTATGCTACTGTACCTCTGGGATTAATTGCTCTTTCCCTCATTGGCCAGTCACTCTTAGTGTGTGATTAATGCCTGAGACTGTGTGAAGTAAGAGATGGATCAGAGGCCGGGCGCGGGGGCTCGCGCCTGTCATCCCAGCACTTTGGGAG <989;<<;;<<;;;<<;<<<<::;;;;<;<<;<<<;<<<;<;<<<<<<<;;<;<;<<;<;<;<;<;;<;;;<<<<<;<;;<<;<;<;:;:::;:;:8;;::::;:::::5:::5:5::::::95:5:9998898899999:9:::::::;;MC:Z:151M MD:Z:151 PG:Z:MarkDuplicates.R.7Q RG:Z:H7LM5.4 NM:i:0 MQ:i:55 OQ:Z:JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA UQ:i:0 AS:i:151

  • Genevieve Brandt (she/her)

    Hi Chunyang Bao, it's still not clear to me whether you have verified that your variants, intervals, and BAM file overlap, and that chromosome naming is consistent across the files. What did you find?
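
    One way to run this kind of check is sketched below in Python, using the first few intervals, the three variant sites, and the read contig posted earlier in this thread. The helper names are hypothetical, the "chr" prefix test is only a rough heuristic for hg19-style versus b37-style naming, and the lookup assumes the intervals on each contig do not overlap one another.

```python
from bisect import bisect_right

# Values copied from the posts above (1-based, inclusive coordinates).
intervals = [
    ("1", 10001, 177417),
    ("1", 227418, 267719),
    ("1", 317720, 471368),
    ("1", 521369, 2634220),
]
sites = [("1", 51479), ("1", 55367), ("1", 55388)]
bam_contigs = {"1"}  # contig of the two reads shown in the thread


def build_index(intervals):
    """Group intervals by contig, sorted by start position."""
    index = {}
    for contig, start, end in sorted(intervals):
        index.setdefault(contig, []).append((start, end))
    return index


def covered(index, contig, pos):
    """Return True if pos lies inside one of the contig's intervals."""
    spans = index.get(contig, [])
    # Last interval whose start is <= pos (assumes non-overlapping intervals).
    i = bisect_right(spans, (pos, float("inf"))) - 1
    return i >= 0 and spans[i][0] <= pos <= spans[i][1]


def mixed_naming(*contig_sets):
    """Return True if some contig names use a 'chr' prefix and others do not."""
    styles = {name.startswith("chr") for names in contig_sets for name in names}
    return len(styles) > 1


index = build_index(intervals)
hits = [covered(index, contig, pos) for contig, pos in sites]
print(hits)
print(mixed_naming({c for c, _, _ in intervals}, {c for c, _ in sites}, bam_contigs))
```

    With the values posted in this thread, all three sites fall inside interval-1 and the contig names agree, so neither overlap nor naming explains the empty output.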

  • Genevieve Brandt (she/her)

    Hi Chunyang Bao, there's no need to supply that information; I believe we have located the issue that causes the tool to output only the header. Your VCF is a sites-only VCF and has no genotype information, and this tool needs genotype fields in the VCF for its calculation. You can see the caveat in the tool docs:

    • This tool will only process biallelic het SNP sites. If your callset contains multiallelic sites, they will be ignored. Optionally, you can subset your callset to just biallelic variants using e.g. SelectVariants with the option "-restrictAllelesTo BIALLELIC".
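
    A sites-only VCF can be recognized from its #CHROM header line: it stops at the eighth column (INFO) and has no FORMAT or sample columns. The Python sketch below shows one way to test for that before running the tool. It is an illustration under stated assumptions, not the check that was added to GATK; the header lines are made up for the example and SAMPLE1 is a hypothetical sample name.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def sample_names(lines):
    """Return the sample names listed on the #CHROM header line.

    An empty list means the VCF is sites-only (no genotype columns).
    """
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # Columns 1-8 are fixed; column 9 is FORMAT, 10 onward are samples.
            return columns[9:]
        if not line.startswith("#"):
            break  # reached the records without seeing a #CHROM line
    raise ValueError("no #CHROM header line found")


# Illustrative headers: the first is sites-only, the second has one sample.
SITES_ONLY = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
GENOTYPED = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n",
]

print(sample_names(SITES_ONLY))
print(sample_names(GENOTYPED))
```

    For a file on disk, the same function can be called as sample_names(open_vcf(path)); an empty result would indicate that the VCF carries no genotypes for the tool to use.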

    I have created an issue ticket to add in a check for the tool so the issue is easier to spot next time: https://github.com/broadinstitute/gatk/issues/7327

    Best,

    Genevieve

  • Chunyang Bao

    Yes, you are right! Genotype information is required. Thank you!

  • Genevieve Brandt (she/her)

    Chunyang Bao, the developers have added the check so that the tool gives a clearer warning message in this situation. It will be available in the next GATK release: https://github.com/broadinstitute/gatk/pull/7326
