Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Calling variants on chromosome Y

7 comments

  • Derek Caetano-Anolles

    Hi Damian, I don't believe that HaplotypeCaller plays very nicely with chrY, so it may be most expedient to just exclude the chrY calls from the female analysis (if you're certain the female samples don't actually have Y chromosomes...).

    Would you mind checking the female samples to see whether any reads map to chrY, how many variant calls you get, and which regions they are in? You need to make sure the reads are not in the pseudo-autosomal regions. If your female samples aren't getting hits, then we don't really have an explanation aside from possible contamination.

  • Damian Fermin

    Hi Derek

    Thanks for getting back to me.

    I've checked the BAM files for each patient, looking at the average read depth over the SRY gene. It's on the Y chromosome, so the females should have no reads there.

    I used this command:

    samtools depth -r chrY:2786840-2787751 -q 0 -Q 0 --reference hg38.ref/Homo_sapiens_assembly38.fasta sample_id.recal.bam > sample_id.SRY.readdepth.txt

    This was run on the "analysis-ready" BAM files produced using best practices.

    The male samples had an average read depth of about 20X in that region. 

    The female samples had zero reads. 

    I checked the VCF files and can see that coordinates encompassed by the pseudo-autosomal regions are not represented in them. 
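
The SRY depth check described above can be summarized per sample with a few lines of Python. This is a minimal sketch, not part of the original workflow: the function name `mean_depth` is hypothetical, and it assumes the three-column tab-separated output that `samtools depth` writes for a single BAM (chromosome, position, depth). Because `samtools depth` skips zero-coverage positions unless `-a` is given, the sketch divides by the region length rather than by the number of output lines, so an empty file yields a mean of zero.

```python
def mean_depth(depth_text, start, end):
    """Mean depth over the 1-based inclusive region [start, end].

    depth_text is the text output of `samtools depth` for one BAM:
    tab-separated chrom, position, depth. Positions that are missing
    from the output are counted as zero depth.
    """
    total = 0
    for line in depth_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _chrom, pos, depth = line.split("\t")[:3]
        if start <= int(pos) <= end:
            total += int(depth)
    return total / (end - start + 1)


if __name__ == "__main__":
    # Region taken from the command above; the file name follows its pattern.
    with open("sample_id.SRY.readdepth.txt") as handle:
        print(mean_depth(handle.read(), 2786840, 2787751))
```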

     

     

  • Derek Caetano-Anolles

  • Damian Fermin

    So I looked at the BAM files for the sites with the stray variants. They are definitely getting a lot of reads supporting their call (some in excess of 100). 

    Extracting about 20 of these aligned reads at random, I BLASTed them against the genome. Their single best alignment is on the Y chromosome, so BWA is doing its job.

    I looked up the regions where many of the reads are aligning on the Y chromosome and found that most of them fall in pseudo genes. Many of these pseudo genes are also near the centromere.

    I was thinking: should I exclude reads aligned to the pseudo genes of chromosome Y from the BAM files and then run the genotyping part of the pipeline to see if this makes a difference? 
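
One way to picture the proposed exclusion is a filter over SAM-format text that drops every read whose aligned span overlaps a listed region. The sketch below is illustrative only: the function names and the example region are hypothetical, it works on plain SAM lines (for example from `samtools view -h`) rather than on BAM directly, and a real pipeline would more likely use a BAM-aware tool. The read span is derived from the CIGAR string, counting only operations that consume reference bases.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) a read covers on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + max(length, 1) - 1


def drop_reads_in_regions(sam_lines, regions):
    """Yield header lines and every read that overlaps none of the regions.

    regions is a list of (chrom, start, end) tuples, 1-based inclusive.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header untouched
            continue
        fields = line.split("\t")
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        start, end = reference_span(pos, cigar)
        if any(c == chrom and start <= e and end >= s for c, s, e in regions):
            continue  # read overlaps an excluded region
        yield line


if __name__ == "__main__":
    # Hypothetical region and reads, for illustration only.
    regions = [("chrY", 120, 200)]
    reads = [
        "@HD\tVN:1.6",
        "r1\t0\tchrY\t100\t60\t50M\t*\t0\t0\t*\t*",
        "r2\t0\tchrY\t500\t60\t50M\t*\t0\t0\t*\t*",
    ]
    for kept in drop_reads_in_regions(reads, regions):
        print(kept)
```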

     

     

  • Derek Caetano-Anolles

    "I was thinking: should I exclude reads aligned to the pseudo genes of chromosome Y from the BAM files and then run the genotyping part of the pipeline to see if this makes a difference?"

    I don't think it would hurt to try. This seems like one of those cases where, if you're positive that your female samples don't have a Y chromosome (and thus, the reads you're getting are confirmed as false positives), it is acceptable to exclude them from your analysis.

    I would be more concerned that the similar hits you are getting from your male samples may be false positives as well.

  • Layne Sadler

    This is what I see for the chrX genotypes of a randomly selected male

    GT:count
    1/1:1702
    0/1:371
    1/2:14

    Consider that GQ>=15 means roughly 97% confidence (GQ is Phred-scaled; 95% corresponds to about GQ 13)

    ---

    chrY is worse because the majority are labeled het


    GT:count
    0/1:141
    1/1:29

    ---

    This makes me question everything. It's disappointing that no warning is issued about ploidy options. Perhaps some of these hets are in the PAR.

    GATK v4.4.0.0 w default HaplotypeCaller options
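
Tallies like the ones above can be reproduced from an uncompressed single-sample VCF with a short script. This is a sketch under assumptions, not the poster's actual method: the function names are hypothetical, the VCF is read as plain text lines, and the GQ filter is optional. The helper `gq_to_confidence` applies the usual Phred conversion, 1 - 10^(-GQ/10), which gives about 0.968 for GQ 15.

```python
from collections import Counter


def gq_to_confidence(gq):
    """Convert a Phred-scaled genotype quality to a confidence in [0, 1)."""
    return 1.0 - 10 ** (-gq / 10.0)


def genotype_counts(vcf_lines, chrom, sample_index=0, min_gq=None):
    """Count GT values on one chromosome for one sample column.

    Phased separators are normalized so 0|1 and 0/1 are tallied together.
    Records with a missing GQ or a GQ below min_gq are skipped when
    min_gq is set.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        keys = fields[8].split(":")  # FORMAT column
        values = fields[9 + sample_index].split(":")
        call = dict(zip(keys, values))
        if min_gq is not None:
            gq = call.get("GQ", ".")
            if gq == "." or int(gq) < min_gq:
                continue
        counts[call.get("GT", ".").replace("|", "/")] += 1
    return counts


if __name__ == "__main__":
    # "sample.vcf" is a placeholder path.
    with open("sample.vcf") as handle:
        for gt, n in genotype_counts(handle, "chrY", min_gq=15).most_common():
            print(f"{gt}:{n}")
```

On a male sample called with the default diploid model, any `0/1` genotype outside the PARs is a sign that the ploidy setting does not match the biology.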

  • Gökalp Çelik

    Hi Layne Sadler

    The GATK HaplotypeCaller engine currently has no direct options for handling allosomal (sex) chromosomes, and not many genotyping tools offer such options directly. If you wish to get hemizygous calls from non-PAR regions, our recommendation is to use an interval list to restrict calling to those regions and to call them with ploidy 1. This call should be made separately from the rest of the chromosomes and merged later. Also keep in mind that this configuration may require additional handling on the joint genotyping side: you may need to handle male and female subjects separately, and combining the whole data set may need additional tinkering.

    Regards. 
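
The recommendation above might be sketched as follows: derive the non-PAR intervals of chrY, then assemble a separate HaplotypeCaller invocation restricted to them with ploidy 1. The helper names and file names are hypothetical, and the GRCh38 chrY length and PAR coordinates in the code are assumptions that should be verified against the reference in use. The command is only built as an argument list, not executed; `-L` and `--sample-ploidy` are the HaplotypeCaller arguments for intervals and ploidy.

```python
# ASSUMED GRCh38 values (1-based, inclusive); verify against your reference.
CHRY_LENGTH = 57_227_415
CHRY_PARS = [(10_001, 2_781_479), (56_887_903, 57_217_415)]


def non_par_intervals(chrom, length, pars):
    """Return GATK-style interval strings covering chrom minus the PARs."""
    intervals = []
    cursor = 1
    for start, end in sorted(pars):
        if start > cursor:
            intervals.append(f"{chrom}:{cursor}-{start - 1}")
        cursor = max(cursor, end + 1)
    if cursor <= length:
        intervals.append(f"{chrom}:{cursor}-{length}")
    return intervals


def haploid_call_command(reference, bam, intervals_file, output):
    """Build (but do not run) a ploidy-1 HaplotypeCaller command line."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", intervals_file,
        "--sample-ploidy", "1",
        "-O", output,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("chrY_nonPAR.list", "w") as handle:
        handle.write("\n".join(non_par_intervals("chrY", CHRY_LENGTH, CHRY_PARS)) + "\n")
    print(" ".join(haploid_call_command(
        "Homo_sapiens_assembly38.fasta", "male_sample.bam",
        "chrY_nonPAR.list", "male_sample.chrY_nonPAR.vcf.gz")))
```

The resulting haploid calls would then be merged with the diploid calls from the remaining chromosomes, with male and female samples handled separately where needed.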


Post is closed for comments.
